scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

Scala 2.12 map.keySet leaks reference to map #11727

Closed fanf closed 9 months ago

fanf commented 5 years ago

Hello, we discovered with prod pain and crashes that Map.keySet leaks a reference to the Map object so that it is not GC'ed and leaks memory - huge memory if objects in the map are big and you wanted to only return light identifiers to them.

It can be seen for ex here: https://gist.github.com/fanf/ff39065f7f9738467e40929ad7490da1

Is this known/wanted and I'm just 13 years late in that knowledge, or is it a bug (at least a bad documentation)?

The problem is likely the references to self in DefaultKeySet (contains, size...) https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/MapLike.scala#L176

plokhotnyuk commented 5 years ago

@fanf it looks that DefaultKeySet was implemented like a view (not a separated collection of keys)...

szeiger commented 5 years ago

I think it makes sense that keySet references the source map because it is usually the fastest and most straight-forward implementation.. But you wouldn't expect so from documentation such as

Collects all keys of this map in an iterable collection.

fanf commented 5 years ago

Yes, it would be very appreciated to make obvious the fact that it's a view and not a new object, AND what is the best (most efficient) method to get a fresh, independant Set. For reference, the naive Set()++map.keySet() we tested first does not work, most likelly because ++ is smart and see that one of the set is empty and just returns the second one.

fanf commented 5 years ago

(I knew about values which is never what you want it to be, keySet is a new one :)

SethTisue commented 5 years ago

best (most efficient) method to get a fresh, independant Set.

.keysIterator.toSet?

fanf commented 5 years ago

@SethTisue most likely from my user point of view, but I don't know if there's better way :)

plokhotnyuk commented 5 years ago

Both keySet.toSeq.toSet and keysIterator.toSet methods for getting a separated set are vulnerable for hash collision based DoS attacks even if the original map (like scala.collection.mutable.CollisionProofHashMap or java.collection.LinkedHashMap) is safe: https://github.com/scala/bug/issues/11203

SethTisue commented 1 year ago

@scala/collections is 2.13 also affected? (if it's 2.12-only we should close the ticket)

som-snytt commented 1 year ago

2.13 is similarly a view onto the enclosing Map:

  /** A generic trait that is reused by keyset implementations */
  protected trait GenKeySet { this: Set[K] =>
    def iterator: Iterator[K] = MapOps.this.keysIterator
    def contains(key: K): Boolean = MapOps.this.contains(key)
    override def size: Int = MapOps.this.size
    override def knownSize: Int = MapOps.this.knownSize
    override def isEmpty: Boolean = MapOps.this.isEmpty
  }

The 2.13 nomenclature dictates that keySet should be a view and keySetted should be a copy. Good joke alert!

Agree that a tweak to the Scaladoc is deemed advisable.

som-snytt commented 1 year ago

Today's complaint in chat was that keys is an Iterable of no known properties.

Generally, keysIterator uses the map's iterator, keySet is a special set that uses keysIterator, and keys is just your keySet.

ListMap has a test expecting keys.map(f) to preserve order (insertion order in this case). However, I would expect to require keys.iterator.map(f). (It looks like sorted maps try to preserve the sort order of their keys.)

fommil commented 1 year ago

heh, I hit this problem this week. The side effect was that I thought I'd discovered a new way to add 4 bit numbers together that used a lot less diodes and transistors than the usual encoding. Suffice to say, my excitement has been suitably squashed upon discovery that I was accidentally deduplicating before applying my cost function... le sigh.

som-snytt commented 1 year ago

To quote Jesus from "The Chosen" tv show, "In this world, bones will break, and hearts will break." He goes on to say that in Scala 4, matters are much improved.