softdevteam / diffract

A Rust implementation of the GumTree differ
Other
1 stars 1 forks source link

Reconsider isomorphic tests with hashing #63

Open snim2 opened 6 years ago

snim2 commented 6 years ago

The isomorphism tests based on hashes are a transliteration of the GT code (based on Chilowicz et al. 2011). Some hashes (based on tree structure) are stored within AST nodes, but these need to be recomputed whenever the AST changes. This could be made more efficient, for example by storing version numbers in each node, and whenever a node changes incrementing its version and that of its ancestors. This would indicate which hashes need to be recomputed and which do not.

ltratt commented 6 years ago

I should probably add one thing: my suggestion of version numbers is based on an assumption that versions won't change very often. In other words, if you're continually comparing N_1 (object N, version 1) against N'_1 then caching makes sense; but if you're comparing N_1 successively against N'_1, N'_2, ... N'_m then the extra memory and machinery will be a net slowdown (since each new N' won't be in the comparison cache and will require a complete check). So it's probably going to require a bit of benchmarking to work out whether this is even worth considering.