vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0
1.76k stars 196 forks source link

AbstractMap#putAll() can cause issues if the source map is modified concurrently #257

Closed DaMatrix closed 3 years ago

DaMatrix commented 3 years ago

the implementation of AbstractType2TypeMap#putAll() does not work properly if the source map can be modified concurrently.

current code (taken from AbstractObject2ObjectMap:

public void putAll(Map<? extends K, ? extends V> m) {
    int n = m.size();
    final Iterator<? extends Map.Entry<? extends K, ? extends V>> i = m.entrySet().iterator();
    if (m instanceof Object2ObjectMap) {
        Object2ObjectMap.Entry<? extends K, ? extends V> e;
        while (n-- != 0) {
            e = (Object2ObjectMap.Entry<? extends K, ? extends V>) i.next();
            put(e.getKey(), e.getValue());
        }
    } else {
        Map.Entry<? extends K, ? extends V> e;
        while (n-- != 0) {
            e = i.next();
            put(e.getKey(), e.getValue());
        }
    }
}

if entries are added to m during iteration, we could end up with some entries being skipped. the major problem is that if entries are removed, iteration will continue past the end of the map, throwing a NoSuchElementException, as the value of n no longer reflects the current size of the map.

this could be resolved by adding a special case for when m instanceof ConcurrentMap, which would check Iterator#hasNext() rather than counting the number of entries processed so far.

incaseoftrouble commented 3 years ago

Things to consider:

Officially, it is working as intended. Quoting Java doc on putAll: "The behavior of this operation is undefined if the specified map is modified while the operation is in progress."

If m is concurrent, m is indeed thread safe. But not others accessing it (such as the map which currently is putting m). If you would remove from m while putAll is called, you have no guarantees whether the element you removed has been added or not, you are already bound for huge problems.

IIRC, you would typically address this by synchronizing on m before putAll, i.e.

synchronized (m) {
  map.putAll(m);
}

Contrary to usual best-practice, collections typically synchronize on themselves.

If this is what you intend, it might be better to manually put the contents of m into map via iteration. The contract on putAll is understood as "atomic" I think, i.e. "after this operation, all objects which are in m at the time of calling will be in map" which is not true if the method is changed as you propose.

So, this is a debate on whether you want fail-fast (as usual maps do it) or fail-safe (as concurrent map does it). I think fast-util is "fail-fast when it is cheap to detect".

DaMatrix commented 3 years ago

ah, i hadn't seen the docs on Map#putAll (only AbstractObject2ObjectMap#putAll, which doesn't inherit docs from Map). i guess i'll close this issue since it's working as intended.

that said, your suggestion of synchronizing on the map itself would have no effect unless every other modification to the map were also synchronized (which would negate the entire purpose of using a concurrent map in the first place). my solution has been to do

m.forEach(map::put);

as i don't need any atomicity guarantees in my case.

incaseoftrouble commented 3 years ago

that said, your suggestion of synchronizing on the map itself would have no effect unless every other modification to the map were also synchronized (which would negate the entire purpose of using a concurrent map in the first place).

Jeah, oops 🤦 I somehow mentally replaced ConcurrentMap with synchronized map, my bad. Didn't have my coffee yet.

m.forEach(map::put);

This should also have practically the same runtime in this case :-)