make hashing alpha-equivalence-respecting

paulstansifer commented 9 years ago

Robby says: "One thing to also look into for Redex: in addition to an alpha-sensitive equality, we also need an alpha-sensitive hashing. I think it's straightforward: just normalize and then call the equal-based hash function that's built into Redex.

We'll need this for traces (the way it collapses different paths to the same term is based on hashing)."

paulstansifer commented 9 years ago

Implement an alpha-respecting hash function; figuring out where to use it comes later. Look at equal-hash-code(and look at the second one)

sznurek commented 8 years ago

@paulstansifer: Have there been any progress on this issue? I need this for my own work and I am happy to fix this bug if there isn't a fix already.

It seems that alpha-respecting hash function already exists in binding-forms.rkt: the α-equal-hash-code function. Replacing snip-cache hash with custom hash based on the mentioned function should do the trick, I think.

paulstansifer commented 8 years ago

Please do! Making α-equal-hash-code was easy, but I never figured out what to do with it.

sznurek commented 8 years ago

I am slightly confused: I have defined a language L that contains lambda calculus. I have tested α-equal? function:

> (α-equal? (compiled-lang-binding-table L) match-pattern (term (λ x x)) (term (λ y y)))
#t

On the other hand, α-equal-hash-code returns different values for alpha-equivalent terms:

> (α-equal-hash-code (compiled-lang-binding-table L) match-pattern (term (λ x x)))
122630835892628
> (α-equal-hash-code (compiled-lang-binding-table L) match-pattern (term (λ y y)))
122635447880559

Because of that, custom hash is not working as expected:

> (define h (make-custom-hash (lambda (x y) (α-equal? (compiled-lang-binding-table L) match-pattern x y))))
> (dict-count h)
0
> (dict-set! h (term (λ x x)) 1)
> (dict-count h)
1
> (dict-ref h (term (λ x x)))
1
> (dict-set! h (term (λ y y)) 2)
> (dict-count h)
1
> (dict-ref h (term (λ x x)))
2
> (define h (make-custom-hash (lambda (x y) (α-equal? (compiled-lang-binding-table L) match-pattern x y)) (lambda (x) (α-equal-hash-code (compiled-lang-binding-table L) match-pattern x))))
> (dict-set! h (term (λ x x)) 1)
> (dict-count h)
1
> (dict-set! h (term (λ y y)) 2)
> (dict-count h)
2 ; should not happen!

Is this intended? Am I doing something wrong? For the reference, definition of the L language can be found in my public repository: https://bitbucket.org/sznurek/lambda-mu-redex/src/e8d14810119e147199947cfc9cab34015b505c08/lambda-mu.rkt?fileviewer=file-view-default#lambda-mu.rkt-4

paulstansifer commented 8 years ago

That sounds like a bug in my hashing code. I'll take a look at it.

paulstansifer commented 8 years ago

I tried to reproduce your situation exactly, but I got the same hash code for both: https://gist.github.com/paulstansifer/00c1a0e3de08a16dbe93

When I run it, I get

#t
117428840964850
117428840964850

Could you send me a complete file to look at?

Another thing that might be helpful would be to modify binding-forms.rkt to export canonicalize and report what the result of calling canonicalize instead of α-equal-hash-code is (it takes the same arguments in the same order).

sznurek commented 8 years ago

I have played with this code inside DrRacket's REPL. I have run (CTRL-R) the code with language definition, then executed the following:

> (require redex/private/matcher)
> (require redex/private/binding-forms-compiler)
> (require redex/private/term-fn)
> (require redex/private/lang-struct)
> (require redex/private/binding-forms)
> (α-equal-hash-code (compiled-lang-binding-table L) match-pattern (term (λ x x)))
119312564214743
> (α-equal-hash-code (compiled-lang-binding-table L) match-pattern (term (λ y y)))
119365168470767

Anyway, when I run the code you have attached from the shell (racket ~/Code/reproduction-attempt.rkt) I get correct results. Thank you!

paulstansifer commented 8 years ago

This is so weird. In DrRacket (with binding-forms.rkt modified to export canonicalize and to print the canonicalization when calculating hash codes):

> (α-equal-hash-code (compiled-lang-binding-table L) match-pattern (term (λ x x)))
==(λ |1| |1|)==
120218973210905
> (α-equal-hash-code (compiled-lang-binding-table L) match-pattern (term (λ y y)))
==(λ |1| |1|)==
120242422939934

...so it looks like equal-hash-code is looking at something "invisible" in those canonicalizations...

> (define cx (canonicalize (compiled-lang-binding-table L) match-pattern (term (λ x x))))
> (define cy (canonicalize (compiled-lang-binding-table L) match-pattern (term (λ y y))))
> cx
'(λ |1| |1|)
> cy
'(λ |1| |1|)
> (equal? cx cy)
#t
> (equal-hash-code cx)
120248800697966
> (equal-hash-code cy)
120248800697966

...but storing the canonicalization seems to make the problem go away (!?). This makes no sense, given that α-equal-hash-code does nothing but call equal-hash-code on the output of canonicalize.

Maybe there's a bug in equal-hash-code?

paulstansifer commented 8 years ago

Okay, this is definitely a problem in equal-hash-code:

> (equal-hash-code (string->symbol "1"))
1165559
> (equal-hash-code (string->symbol "1"))
1165693

paulstansifer commented 8 years ago

Reported a bug against Racket: http://bugs.racket-lang.org/query/?cmd=view&pr=15262

paulstansifer commented 8 years ago

Looks like whether this is a bug in Racket is debatable. I found an unrelated flaw in canonicalization, so I fixed it and this problem with one stroke; should be visible in the latest master (156bbd1).

sznurek commented 8 years ago

I have created pull request with my proposed solution - happy to get feedback on this!

paulstansifer / redex

make hashing alpha-equivalence-respecting #13