pnathan / generic-comparability

implementation of cdr-8
http://cdr.eurolisp.org/document/8/cleqcmp.html
16 stars 2 forks source link

On hash tables, key insertion order matters, unlike equalp #5

Open anticrisis opened 6 years ago

anticrisis commented 6 years ago

Unlike the equalp operator, the order in which keys are inserted seems to affect equals. Is this desired behavior or a bug?

CL-USER> (defvar a (make-hash-table))
A
CL-USER> (defvar b (make-hash-table))
B
CL-USER> (setf (gethash :a a) 1)
1
CL-USER> (setf (gethash :b a) 2)
2
CL-USER> (setf (gethash :b b) 2)
2
CL-USER> (setf (gethash :a b) 1)
1
CL-USER> (equalp a b)
T
CL-USER> (generic-comparability:equals a b)
NIL

This is on SBCL 1.3.21.

pnathan commented 6 years ago

Off the top of my head, BUG.

On Sep 4, 2017 1:10 AM, "anticrisis" notifications@github.com wrote:

Unlike the equalp operator, the order in which keys are inserted seems to affect equals. Is this desired behavior or a bug?

CL-USER> (defvar a (make-hash-table)) A CL-USER> (defvar b (make-hash-table)) B CL-USER> (setf (gethash :a a) 1) 1 CL-USER> (setf (gethash :b a) 2) 2 CL-USER> (setf (gethash :b b) 2) 2 CL-USER> (setf (gethash :a b) 1) 1 CL-USER> (equalp a b) T CL-USER> (generic-comparability:equals a b) NIL

This is on SBCL 1.3.21.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pnathan/generic-comparability/issues/5, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuc4Eqeg9KjAgy95aUVV1yNucr8BIxoks5se7CIgaJpZM4PLlkV .

anticrisis commented 6 years ago

Seems the problem is here: https://github.com/pnathan/generic-comparability/blob/9ad75d4be22de86efabcd8522d9f5232033aad8b/generic-comparability.lisp#L177-L184

(hash-table-keys) will return keys in an unspecified order. I haven't looked at equalp, but I imagine a correct implementation would iterate through the keys of the first hash table and query the second hash table on those keys.

pnathan commented 6 years ago

Yes, you're right; the code assumes hash-table-keys is returning in a consistent order.

I can't promise anything in terms of a patch, I have a heap of things to do at home. But I will definitely try to get to it soon....

N.b., an ineffecient way to do this would be to sort the keys as they come back from both, then compare. That would be some 2 * n lg n, as opposed to your conception, which would be n.

On Mon, Sep 4, 2017 at 4:07 PM, anticrisis notifications@github.com wrote:

Seems the problem is here: https://github.com/pnathan/ generic-comparability/blob/9ad75d4be22de86efabcd8522d9f52 32033aad8b/generic-comparability.lisp#L177-L184

(hash-table-keys) will return keys in an unspecified order. I haven't looked at equalp, but I imagine a correct implementation would iterate through the keys of the first hash table and query the second hash table on those keys.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/pnathan/generic-comparability/issues/5#issuecomment-327037339, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuc4D1YFflFxqgKYz54PjZ9JtcnOf2Xks5sfIKhgaJpZM4PLlkV .

anticrisis commented 6 years ago

In reading CDR-8, I think there's a conflict between its description of EQUALS for hash tables, and hash-tables's use of an equivalence test function. If by-key is T, CDR-8 overrules the hash table's test function for comparing keys.

For example, if the hash tables use EQUALP as their equivalence test, the keys "a" and "A" would be equivalent. But under EQUALS, "a" and "A" are not equivalent. Therefore, EQUALS would return NIL, but an iteration of every key and value in the first hash table, and a GETHASH test of those keys in the second table, would return T.

This inconsistency makes my proposed implementation incorrect, since it would rely on GETHASH to query the second hash table for the presence of each key in the first hash table. GETHASH, of course, cannot use EQUALS, because user-defined test hash table test functions are not part of the standard.

So it seems that if by-key is T, the only correct implementation of CDR-8 is to sort the keys in the first table (with COMPARE), and then check for their existence in the second table.

If by-key is NIL, I assume we should use the hash-table's equivalence test? CDR-8 is silent on this point.

The algorithm in CDR-8 seems incomplete, in that it neglects to confirm that both hash tables use the same equivalence test, unless check-properties is T. But if the hash tables don't use the same equivalence test, then isn't it meaningless to try to assert by-value equality?