moremore0812 / cqengine

Automatically exported from code.google.com/p/cqengine
0 stars 0 forks source link

Set.add semantics is not perfect. #26

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

Collections.newSetFromMap() has a small problem in the way add() is handled,
if the element is already present by "equals()" semantics it will overwrite
it. Set.add() clearly says update only if not already present. This cannot
be done without extra cost unless the underlying map is ConcurrentMap,
because in ConcurrentMap we have putIfAbsent() operation to make this work.

I'm attaching a patch to fix this, feel free to take this. 

Original issue reported on code.google.com by anita.v...@inmobi.com on 8 Nov 2013 at 10:05

Attachments:

GoogleCodeExporter commented 9 years ago
As to why this is may be important to some people. It is possible that one of 
the Index works on a field that is not included in equals(). And this will 
cause Index to be not updated but set to be updated, which is an internally 
inconsistent state. We should try to minimize such issues.

Original comment by anita.v...@inmobi.com on 8 Nov 2013 at 10:08

GoogleCodeExporter commented 9 years ago
Hi Atul,

This is very interesting.

I've done some digging and I'm not sure that this is an issue. But it looks 
like java.util.Collections.newSetFromMap() relies on some very specific 
implementation details of Maps.

Based on JDK 8 sources anyway:
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/f82b730c798b/src/share/classes/jav
a/util/Collections.java
Collections.newSetFromMap(), the add() method is as follows:
    public boolean add(E e) { return m.put(e, Boolean.TRUE) == null; }

..it delegates to the map.put(element, Boolean.TRUE) method of the underlying 
Map that it wraps. But the element being stored in the Set, is supplied to the 
Map as the key, not as the value.

http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/f82b730c798b/src/share/classes/jav
a/util/concurrent/ConcurrentHashMap.java
It looks like the internal putVal() method, is implemented such that even if 
put() was called multiple times with several key objects all being equal to 
each other, the first key object would be stored in a Node, and subsequent key 
objects would never replace it.

At that point I was thinking that these findings were just anecdotal, but then 
I also remembered that Map provides methods:
Set<K> keySet()
Collection<V> values()

...so the set of keys in a map might actually be required to implement the Set 
contract, which would indeed imply that adding a key a second time should not 
replace the existing key, making all Maps compatible with 
Collections.newSetFromMap().

It seems that all of this might hinge on a very subtle aspect of the Map 
contract, which isn't spelled out explicitly anywhere I can find.

Original comment by ni...@npgall.com on 9 Nov 2013 at 7:49

GoogleCodeExporter commented 9 years ago
Yes, you are right, I think it is better to throw away my patch than having 
more complex code. 

The other optimization it brings is addAll() is faster, because size() is not 
considered while checking it. It is possible size() is not O(1) operation, and 
sometimes even as bad as O(n) depending on ConcurrentMap implementation we are 
talking about.

There is another bug related to remove() and removeAll() which I'll raise on 
another issue. 

Original comment by anita.v...@inmobi.com on 9 Nov 2013 at 9:15

GoogleCodeExporter commented 9 years ago
Closing this as discussed. Thanks for reporting anyway, they more eyes on the 
code the better.

Original comment by ni...@npgall.com on 13 Nov 2013 at 11:33