scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
230 stars 21 forks source link

hashCode for NumericRange is computed by evaluating all values #12922

Closed AvaPL closed 5 months ago

AvaPL commented 6 months ago

Reproduction steps

Scala version: 2.13.10, but the problem is also present in other versions

val largeRangeHashCode = (Long.MinValue to Long.MaxValue).hashCode

Problem

java.lang.IllegalArgumentException: More than Int.MaxValue elements.
  at scala.collection.immutable.NumericRange$.check$1(NumericRange.scala:330)
  at scala.collection.immutable.NumericRange$.count(NumericRange.scala:361)
  at scala.collection.immutable.NumericRange.length$lzycompute(NumericRange.scala:75)
  at scala.collection.immutable.NumericRange.length(NumericRange.scala:75)
  at scala.util.hashing.MurmurHash3.indexedSeqHash(MurmurHash3.scala:239)
  at scala.util.hashing.MurmurHash3$.seqHash(MurmurHash3.scala:354)
  at scala.collection.immutable.NumericRange.hashCode$lzycompute(NumericRange.scala:246)
  at scala.collection.immutable.NumericRange.hashCode(NumericRange.scala:246)
  ... 32 elided

The hash code of a numeric range should be calculated properly. It seems that NumericRange uses MurmurHash3.indexedSeqHash which evaluates all values in the range to compute the hash code.

Additionally, even when the range is small enough, the hash code computation is ineffective and may affect the code performance e.g. when a Set[NumericRange[Long]] is used.

SethTisue commented 6 months ago

@scala/collections

jxnu-liguobin commented 6 months ago

Scala3 is also the same

som-snytt commented 6 months ago

Personally, I see nothing terrible about a library ticket on the dotty side. https://github.com/lampepfl/dotty/issues/19217

Since this repo is not scala/scala, the ticket has no natural home on github.

Also, is it not possible to transfer tickets?

And what would be the consequence of opening scala/scala issues for future maintenance tickets?

The template for scala/scala tickets would have a title of the form ... and nobody noticed until now.

Ichoran commented 6 months ago

I'm not sure this is feasible. Collections who are computing their hashcodes actually check whether they're traversing a range, and if they are, they throw away the sequence-computed hashcode and pick the one that matches Range instead.

But to do that for every type supported by NumericRange is not possible; it might not even be the same depending on what Numerics are in scope.

So there isn't any shortcut that will ensure that NumericRange has a hashcode match when equals does, and because it agrees with sequences with equality (it is part of Scala collections), it has to agree with their hashcodes, too.

You could do a shortcut to Range if and only if the start, end, and step match. That would help some, if it's not already there. And I guess if you can't match a sequence because you're too big, you could compute the hash some other way. But for big, in-range stuff like 1000000000000L to 2000000000000L by 1000L, you're obligated to take all billion and one steps.

And since you are so obligated, it's not totally clear to me that it's worth fixing the other stuff.

GerretS commented 6 months ago

Please note this PR on NumericRange, where I attempt to prevent the need of iterating over the whole range for the indexOf call.

https://github.com/scala/scala/pull/10627

What is especially interesting in regard to this issue is my finding that NumericRange allows you to make a range of more than Int.MaxValue elements but almost every method in the class throws that same IllegalArgument exception if you try to call it on a range that large.

A better longterm solution might be to simply accept that this class really doesn't support more than Int.MaxValue elements and evaluate this on instantiation by making length non-lazy.

SethTisue commented 5 months ago

I'm going to provisionally close this as "not planned", since @Ichoran's argument that we can't shortcut this seems persuasive to me. Happy to reopen if someone can convincingly argue otherwise.