mastodon-sc / mastodon-deep-lineage

A collection of plugins to analyse lineages of tracked objects in Mastodon
BSD 2-Clause "Simplified" License
5 stars 1 forks source link

Implementation of Zhang Tree Edit Distance for unordered trees #24

Closed stefanhahmann closed 1 year ago

stefanhahmann commented 1 year ago

Story

Users would like to be able to determine the similarity of a sub-tree of a lineage tree with other sub-trees of the same or a different lineage tree. For this purpose, it is needed to have an algorithm implemented that computes the similarity between parts of lineage trees based on the tree similarity algorithm between unordered labeled trees.

Acceptance criteria

  1. An function exists that creates a view on a ModelBranchGraph based on a (root-)branchSpot and an endtimepoint
    • This view should itself be a ReadOnlyGraph<BranchSpot, BranchLink>
  2. A function that can take the follwing inputs:
    • ReadOnlyGraph<BranchSpot, BranchLink>
    • ReadOnlyGraph<BranchSpot, BranchLink>
    • Cost function
  3. The function returns the resulting tree similarity value
  4. The tree similarity function can take a cost function
  5. The cost function can take a List of attributes into account
  6. The cost function for the lifespan exists as an example cost function

Tasks

Algorithm:

Context

The similarity measurement shall be used to perform hierarchical clustering for all sub-trees of the root-nodes in a lineage tree. User may enter the starting timepoint and the ending timepoint and the number of classes that should be produced. The result should be available as tag set.

Findings from Tests

Additional Information

maarzt commented 1 year ago

Here is an attempt to visualize the edit operations that are allowed by the Zhang unordered tree edit distance. As ascii art.

1. Change label

      A         A'
     / \  ->   / \
    TB TC     TB TC                            

2a: Remove subtree

      A         A                                               
     / \   ->   |                                                    
    TB TC       TB

2b: Add new subtree

      A          A
      |    ->   / \
      TB       TB TC

3a: Remove subtree but keep one child

      A          A
     / \   ->   / \
    B  TC      TD TC
   / \
  TD TE        (remove B and TE, keep TD)

3b: Convert existing subtree into child of a newly inserted subtree
      A             A
     / \    ->     / \
    TB TC         D  TC
                 / \
        TB TE       (insert D and TE, keep TD)

4a: Remove subtree (and siblings) but keep all children
      A               A
     / \             / \
    B  TC   ->      TD TE
   / \
  TD TE            (Subtree B and it's sibling TE are removed, but the children
                    of B namely TD and TE are kept)

4b: Opposite of 4a

Maybe this could become part of the javadoc?