timtadh / zhang-shasha

Tree edit distance using the Zhang Shasha algorithm
Other
433 stars 63 forks source link

Recording path #24

Closed bkj closed 9 years ago

bkj commented 9 years ago

Is it possible to record the path of edits from one tree to another? I'd like to be able to have a list of the operations that transforms tree A into tree B. Is this included in the module at the moment, and if not could you point me towards a resource that might describe how to implement it?

Thanks! Ben

timtadh commented 9 years ago

Hi Ben,

I have not had time to implement this feature as I have not needed it. It should be relatively straight forward to add path reconstruction, as is done for string edit distance, by recording the parent state on each modification. I have no current plans to implement path reconstruction as I have no need for it myself. However, I will gladly merge a PR.

-Tim

timtadh commented 9 years ago

you may find http://www.inf.unibz.it/dis/projects/tree-edit-distance/ useful.

bkj commented 9 years ago

Thanks for the quick reply. On second thought, the algorithm might not be appropriate for my application. I was trying to come up with distance metric on HTML DOM pages, but I see now that this algorithm only applies to ordered labeled trees. In a DOM tree, you can't assume that the names of the nodes are unique, as you might have something like

...
...

~ Ben

On Wed, Sep 30, 2015 at 4:35 PM, Tim Henderson notifications@github.com wrote:

Hi Ben,

I have not had time to implement this feature as I have not needed it. It should be relatively straight forward to add path reconstruction, as is done for string edit distance, by recording the parent state on each modification. I have no current plans to implement path reconstruction as I have no need for it myself. However, I will gladly merge a PR.

-Tim

— Reply to this email directly or view it on GitHub https://github.com/timtadh/zhang-shasha/issues/24#issuecomment-144535416 .

timtadh commented 9 years ago

Hi Ben,

Tree Edit Distance still applies. The labels do not need to be uniquely labelled, indeed they can all have the same label. The ordering means that there is an assumed order for the children of the node. This is usually easily achieved even in unordered trees.

-Tim

bkj commented 9 years ago

You mean by constructing some arbitrary order as long as it's consistent? (alphabetic by name, number of children, etc)?

timtadh commented 9 years ago

Yes. But that is not an issue for you as DOM trees are ordered labeled trees. The ordering is by appearance in the html document.

Cheers Tim Henderson Sent from my phone On Sep 30, 2015 4:53 PM, "bkj" notifications@github.com wrote:

You mean by constructing some arbitrary order as long as it's consistent? (alphabetic by name, number of children, etc)?

— Reply to this email directly or view it on GitHub https://github.com/timtadh/zhang-shasha/issues/24#issuecomment-144539605 .

bkj commented 9 years ago

Gotcha -- thanks for your comments!

timtadh commented 9 years ago

ok. I am going to close this issue but feel free to open a PR or another issue if you have further problems.