whitequark / ast

A library for working with Abstract Syntax Trees.
MIT License
195 stars 25 forks source link

Add Node#eql? and precompute compatible #hash #3

Closed mbj closed 10 years ago

mbj commented 10 years ago

This branch adds #eql? and #hash to AST::Node. Before this change only identical objects would take the same slot in a Hash or a Set. This change makes it easier for me to deduplicate mutations in mutant. I expect other consumers of this library will also benefit from value object semantics of AST::Node.

I do not need to map to source for deduplication anymore. This is not only significantly faster, it also allows me to spot cases where mutant emitted invalid ASTs before. Unparser does not validate the AST and will silently do whatever the AST contains, this could before result in the situartion where the AST itself was invalid but unparsed to valid source, that got itself parsed into valid AST again.

I decided to precompute the #hash in the initializer before the object gets frozen. I can measure an additional nice speed effect in my use cases. The mutant corpus tests with cached #hash against rubyspec runs around 20% faster.

Please note I only include @type and @children from instance state to calculate the #hash and to test #eql?. This is in-line with your implementation of #== that ignores the additional metadata properties in ivars. And exactly how I need it for my use-case.

Possible controversial topic:

The definition of #hash and #eql? include a test for the exact instances class. I prefer to not consider subclasses as equilvalent in my code. I'm happy to drop the inclusion of the instances class into #eql? and #hash to get this PR accepted and released.

def eql?(other)
  self.class.eql?(other.class)     && # controversial step I'd be fine to replace with self.class.kind_of?(other.class)
  @type.eql?(other.type)           &&
  @children.eql?(other.children)
end

def initialize
   # ...
  @hash = @type.hash ^ @children.hash ^ self.class.hash # Lat XORed component I'm fine to drop.
   # ...

I have a backup plan in case this PR intentions are totally unacceptable. I'll wrap each AST into a container that precomputes #hash. But I first try to push it down with this PR.

whitequark commented 10 years ago

This PR is a perfectly fine solution and in fact I was going to do it myself at some point (but I guess I forgot). Note that it would incur a major version bump.

mbj commented 10 years ago

@whitequark I'm okay with a major version bump in case you release a new parser with that new dependency. ast itself is for most of the people I know, only a transitive dependency.