scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

.toMap seems to be significantly slower than .to(mutable.Map) #12400

Closed pshirshov closed 3 years ago

pshirshov commented 3 years ago

I have some chunk of code which looks like:

            val r =
              (mv.offsets(row) until mv.offsets(row) + mv.lengths(
                row
              ))
                .map { i =>
                  val ii = i.intValue()
                  val name =
                    new String(kv.vector(ii), kv.start(ii), kv.length(ii))
                  val value = vv.vector(ii)
                  (name, value)

                }
                .toMap

           evaluateJsPolyglot(r, expr1)
           evaluateJsPolyglot(r, expr2)

This snippet is being invoked about 40 million times and my test takes about 8 minutes on my machine. When I replace .toMap with .to(mutable.Map) the test takes just 4 minutes.

So, I'm building one such map per two javascript expression evalutations (with graal polyglot). I believe js evaluation should be orders of magnitude slower than map creation but in fact map creations account for HALF of the CPU time.

This is counterintuitive and, I believe that shouldn't be the case at all. Why does it happen?.. Can this behavior be improved?..

I didn't make a repro yet nor tried to actually profile this, but I hope it shouldn't be necessary.

From what I can see in Map#addOne, Growable#addAll and Map#from, toMap uses a mutable buffer, so something seems to be odd here.

pshirshov commented 3 years ago

Well, .toMap uses HAMT, not a simple hashtable, in order to optimize .update on immutable collections. I've quickly checked several hundred thousands of Scala LoCs and seems like .update and ++= are rare operations. Like 98/100 immutable maps are being built once and never updated. So in fact current design inflicts enormous performance penalty for the sake of cornercase optimization. I believe default behavior should be different.

SethTisue commented 3 years ago

thoughts @scala/collections ?

RubenVerg commented 3 years ago

I've quickly checked several hundred thousands of Scala LoCs and seems like .update and ++= are rare operations.

Same here (although less than 100k LoCs)

martijnhoekstra commented 3 years ago

For 2.13, I think it's a bad idea to change any specific performance characteristic to something much worse, even if the trade-offs would have been the right choice at the time.

Whether it is the right choice for a later version or not is debatable. If you want a Map[A, B] that you're never going to update, maybe a PartialFunction[A, B] would be better. That could be a separate operation toLookup. The straightforward implementation could just be the upcast mutable map. I fear somewhat for the discoverability of that though.

johnynek commented 3 years ago

I have often made an immutable Map wrapper on a mutable map for exactly this reason.

Maybe toMap should return something like that.

NthPortal commented 3 years ago

the update method used in toMap is on HashMapBuilder, not HashMap, and in fact mutates the structure that's being built. the builder was heavily optimised in scala/scala#7118. my intuition says that it is unlikely the performance can be improved, and that the slowdown is just due to inherent differences in the design and structure.

that said, I will summon @joshlemer @mkeskells and @retronym, who worked on optimising the builder in 2.13 and 2.12

joshlemer commented 3 years ago

@NthPortal I think what @pshirshov is suggesting is not that immutable.Map can be optimized further, but that its benefit over a hashtable implementation is only that it supports faster non-destructive updated/removed/++/-- which they are claiming to be niche operations.

I think what they're suggesting is that the default immutable.Map implementation should be one that is an immutable wrapper on the hashtable that mutable.HashMap uses.

pshirshov commented 3 years ago

@NthPortal I think what @pshirshov is suggesting is not that immutable.Map can be optimized further, but that its benefit over a hashtable implementation is only that it supports faster non-destructive updated/removed/++/-- which they are claiming to be niche operations.

Yes, I mean exactly this.

I think what they're suggesting is that the default immutable.Map implementation should be one that is an immutable wrapper on the hashtable that mutable.HashMap uses.

I believe that:

  1. Current implementation depresses the scenario which seems to be the most common one (create once, never update)
  2. This implementation should be altered to avoid harsh and unneccessary performance penalty OR there should be another method (well, .toLookup/.toIndex/.indexed) specifically for that purpose.
NthPortal commented 3 years ago

if updating it isn't a concern for you, then just do to(mutable.Map) and upcast to collection.Map?

NthPortal commented 3 years ago

also, what about stats for +/+= on Map, which calls updated? also

I've quickly checked several hundred thousands of Scala LoCs and seems like .update and ++= are rare operations.

the name of the method on immutable.Map is actually updated; would your check have caught that as well? what methodology was used for the search? if it's just github, does application code use it more than library code?

pshirshov commented 3 years ago

if updating it isn't a concern for you, then just do to(mutable.Map) and upcast to collection.Map?

toMap is short and convenient, that's what people always type.

mutable map+upcast is an arcane an inconvenient ritual everyone needs to memoize.

pshirshov commented 3 years ago

also, what about stats for +/+= on Map, which calls updated? also

I've quickly checked several hundred thousands of Scala LoCs and seems like .update and ++= are rare operations.

+=/++=/updated. Didn't check for removals though.

the name of the method on immutable.Map is actually updated; would your check have caught that as well?

Yes, I mistyped here.

what methodology was used for the search?

Jetbrains IDEA + find usages for toMap and other methods, rgrep for .to(Map). Plus some manual digging around the code. Nothing smart.

if it's just github, does application code use it more than library code?

Several private codebases.

mkeskells commented 3 years ago

Hi,

Its not the frequency of the code occurrence that matters, it the frequency of execution in the codebase

You also need to look for ‘+’ and ‘++’ in the patterns, as ++= is just syntax sugar for ++ when dealing with immutable collections, as is +/update

Similarly --/--= and the remove

Also you need to consider merge (it’s a shame this isn’t in the Map API, only HashMap)

The main advantage of the trie is structural sharing. This reduces the memory pressure, and enables some of these bulk operation to be performed in better that linear time/memory

Running benchmarks on real usecases on a several million LOC private codebase there operation consume a significant budget of CPU and memory

That’s not to say that a flat hashtable approach isn’t a good alternative, indeed there are several other alternatives that could be explored. All have advantages & drawbacks

Mike

From: Ruben Vergani @.> Sent: 23 May 2021 15:48 To: scala/bug @.> Cc: mkeskells @.>; Team mention @.> Subject: Re: [scala/bug] .toMap seems to be significantly slower than .to(mutable.Map) (#12400)

I've quickly checked several hundred thousands of Scala LoCs and seems like .update and ++= are rare operations.

Same here

— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub https://github.com/scala/bug/issues/12400#issuecomment-846574849 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AFJ6LZLBVROBHN5CGU2XRZTTPEIT3ANCNFSM45JZPGYA . https://github.com/notifications/beacon/AFJ6LZKTEUW4RIV54NRLY6LTPEIT3A5CNFSM45JZPGYKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOGJ23KAI.gif

pshirshov commented 3 years ago

Its not the frequency of the code occurrence that matters, it the frequency of execution in the codebase

Yes, I made rough estimations. Better than nothing.

The main advantage of the trie is structural sharing.

Yes, sure. Though it seems like the defaults aren't that nice. A performace-concerned engineer should memoize an arcane rule ("in case I want immutable updates I have to use .toMap, otherwise I need to write things like .to(mutable.Map):scala.collection.Map") or take the performance penalty (moreover, it's not even mentioned anywhere, even the class name scala.collection.immutable .HashMap is misleading). I think that there should be an easy way to quickly build indexes which are not intended to be updated.

Ichoran commented 3 years ago

@pshirshov - Immutable collections often have a performance penalty of factors of 2 and such. If you care about that, you need to be mindful of the collection. That map updates are relatively rare does not mean that we can absolutely clobber default performance of what is supposedly a supported operation. The libraries should contain only vanishingly few instances of booby-traps like that, where you have an innocent-looking operation that takes unexpectedly forever to run (as opposed to merely 2x slower).

If you want something that may or may not be immutable, use collection.Map. It's already there for this exact purpose.

In one large project I work on, I'm aware of a couple of places that extensively update immutable maps of non-negligible size. Losing 10000x performance on those would vastly outweigh the benefits of gaining 2x performance on all the non-updates. Indeed, the code would just be broken at that point.

So the change cannot happen and should not happen, and the library already has constructs that enable use of mutable maps. It might be nice to add in an external library an immutable.WrappedMap that defers everything to a wrapped map (defaulting to mutable.HashMap) for people who want somewhat faster construction and know that they won't be doing updates.

pshirshov commented 3 years ago

@Ichoran

Immutable collections often have a performance penalty of factors of 2 and such.

Well, in this particular case it's not a factor of two, it is orders of magnitude slower. I have two javascript expression evaluations per one map creation. And map creations account for half of the CPU time (wall).

I've just spent some time profiling my example.

JProfiler in instrumentation mode shows that map creation is accountable for 35% of the CPU time. When I switch to mutable collection JProfiler says that map creation is accountable for less than 1% of the time. So, roughly it seems to be about 30 times slower... Well, I'm aware that profiler numbers are different from wall time numbers, though relative profiling numbers should be more or less correct. And that is in line with my observations. We may write a JMH benchmark in case we need precise numbers, though I doubt there would be any significant difference.

Maybe it's important to note that every map in my test consists of 100 elements, keys are strings (4-6 characters) and values are longs. Numbers may be better in other configurations (less elements, different key types), but my configurations seems to be generic enough.

In one large project I work on, I'm aware of a couple of places that extensively update immutable maps of non-negligible size.

Yes, a couple of places. Current defaults seem to favour uncommon scenario while inflicting enormous performance penalty for more common scenarios. I believe you may easily use mutable structures at the places where you update heavily (and that's intuitive and logical). I may put it in other words - in order to use a map you'll always have to create it, but not every created map will be modified.

So the change cannot happen and should not happen

I disagree here, but even if there are reasons to keep things as is, a new shortcut shortcut, e.g. .indexed, would be completely harmless and really useful.

neko-kai commented 3 years ago

This isn't about mutability per se, only about the cost of building a HashTrie (using .toMap) vs cost building a Hashtable (to(mutable.Map)). There may be a practical reason that we only have a mutable Hashtable in collections library, but it's not about mutability. It's about Hashtable being a largely contiguous chunk of memory that's fast to populate, we now have an immutable ArraySeq for largely the same reasons – sometimes it's faster to copy everything instead of chasing pointers, or sometimes you want to build a collection once – fast – and use it as read-only.

I think the issue is that a HashTrie is built one element at a time – every new element grows insertion cost slowly until it's unbearable:

one such map per two javascript expression evalutations (with graal polyglot). I believe js evaluation should be orders of magnitude slower than map creation but in fact map creations account for HALF of the CPU time.

That is, the entire GraalVM JavaScript interpreter runs twice before a single HashTrie is built from another existing collection.

What I think of as a compromise solution:

By the way, I find the most common operation for maps in my codebase to be variants of .iterator.map(...).toMap – these do not benefit from ++ being cheap, but do benefit from Map construction being cheap, since they rebuild the Map.

Ichoran commented 3 years ago

@pshirshov - Rare disasters are worse than common annoyances, yes.

However, the 30x is very surprising. Profilers often don't do a great job at assigning blame for lowish-level computations. Can you create a runnable example that shows a comparable performance discrepancy? There may be something that can be done in such cases.

In my hands, microbenchmarks show that the immutable version takes 2-4x longer, depending on the map length. When I've refactored real applications to use mutable maps, I've observed speedups roughly consistent with this.

pshirshov commented 3 years ago

Can you create a runnable example that shows a comparable performance discrepancy?

Yes, I may write a reusable repro but that'll take some time.

Profilers often don't do a great job at assigning blame for lowish-level computations.

Yes, I'm aware. Though it's more or less in line with what I observe - TWO javascript evaluations take as much time as ONE map creation.

takes 2-4x longer, depending on the map length.

Each map in my case consists of 100 keys. That may be more than in your benchmarks and probably that's the cause.

Ichoran commented 3 years ago

Each map in my case consists of 100 keys. That may be more than in your benchmarks and probably that's the cause.

No, I ran from two to 10,000.

Just ran 100 and got an even smaller discrepancy (less than 2x). I don't think that is the issue.

pshirshov commented 3 years ago

Are you running your benchmarks on 2.13? I doubt it's important, but I'm using 2.13.5 on GraalVM CE 21.1.0 (J11). GC times aren't significant (typical "saw pattern", 0-10%, G1 GC), so probably that's not relevant.

Ichoran commented 3 years ago

I'm not using GraalVM.

Ichoran commented 3 years ago

Actually, there might be a regression in 2.13. I ran most of it on 2.12, but I explored 2.13 more and I'm seeing more like a 4.5x slowdown. That's worse than I expected.

mkeskells commented 3 years ago

What is the code that causes this. It would be good to see what the actual code path is, as a concrete example

Always easier to optimise that 😊

I would also not trust instrumented code to run for these types of benchmark, but would trust sampling, particularly JFR

Instrumented code favours larger methods, as the entry and exit costs

From: Paul S. @.> Sent: 24 May 2021 14:42 To: scala/bug @.> Cc: mkeskells @.>; Mention @.> Subject: Re: [scala/bug] .toMap seems to be significantly slower than .to(mutable.Map) (#12400)

@Ichoran https://github.com/Ichoran

Immutable collections often have a performance penalty of factors of 2 and such.

Well, in this particular case it's not a factor of two, it is orders of magnitude slower. I have two javascript expression evaluations per one map creation. And map creations account for half of the CPU time (wall). JProfiler in instrumentation mode shows that map creation is accountable for 35% of the CPU time. When I switch to mutable collection JProfiler that map creation is accountable for less than 1% of the time. So, roughly it seems to be about 30 times slower...

In one large project I work on, I'm aware of a couple of places that extensively update immutable maps of non-negligible size.

Yes, a couple of places. Current defaults seems to to favour uncommon scenario while inflicting enormous performance penalty for more common scenarios. I believe you may easily use mutable structures at the places where you update heavily (and that's intuitive and logical).

So the change cannot happen and should not happen

I disagree here, but even if there are reasons to keep things as is, a new shortcut shortcut, e.g. .indexed, would be completely harmless and really useful.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/scala/bug/issues/12400#issuecomment-847050908 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AFJ6LZO5T7HXBOUAWQJMZSTTPJJTNANCNFSM45JZPGYA . https://github.com/notifications/beacon/AFJ6LZIGFAHJKOIY2HIABU3TPJJTNA5CNFSM45JZPGYKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOGJ6PRHA.gif

mkeskells commented 3 years ago

The main advantage of the trie is structural sharing.

Yes, sure. Though it seems like the defaults aren't that nice. A performace-concerned engineer should memoize an arcane rule ("in case I want immutable updates I have to use .toMap, otherwise I need to write things like .to(mutable.Map)...") or take the performance penalty (moreover, it's not even mentioned anywhere, even the class name scala.collection.immutable .HashMap is misleading). I think that there should be an easy way to quickly build indexes which are not intended to be updated.

The model of the collections is “immutable by default”, so toMap is to immutable map

why do you find name scala.collection.immutable.HashMap misleading. It’s a hash based immutable Map

Ichoran commented 3 years ago

Oh, no, on further investigation it's not a regression of immutable maps, it's a progression of mutable maps...the precise test I was using is faster on 2.13.

This shows about 3.5x slower under 2.12 and 4.5x slower under 2.13:

  val N = 10000
  val M = 100
  val b = Array.tabulate(N){ j =>
    Array.tabulate(M){ i =>
      ((i*M + j).toString, (i*j).toString)
    }
  }
  val m = new Array[collection.Map[String,String]](N)

  var k = 0

  def foo1a(): collection.Map[String, String] = {
    var j = 0
    while (j < b.length) {
      m(j) = b(j).toMap
      j += 1
    }

    m(k)
  }

  def foo1b(): collection.Map[String, String] = {
    var j = 0
    while (j < b.length) {
      m(j) = b(j).to(collection.mutable.Map)
      j += 1
    }

    m(k)
  }

You can't use foo1b form in 2.12 because you can't to a map in 2.12. If you use ++ then 2.12 is 2x slower than 2.13's to(...). Element-by-element is also slower, enough to make the difference between ~3.5x and ~4.5x.

(I randomly change k during the benchmark to ensure that no stored values can be skipped.)

pshirshov commented 3 years ago

What is the code that causes this

Give me some time, I'll bring it here...

I would also not trust instrumented code to run for these types of benchmark, but would trust sampling

As I said before, I'm not instrumenting, I'm sampling. Anyway that's just an additional observation, my original issue is that just population of 40 million of 100-element immutable maps takes 4 minutes on my machine, while it's just seconds in case of mutable maps.

pshirshov commented 3 years ago

The model of the collections is “immutable by default”, so toMap is to immutable map why do you find name scala.collection.immutable.HashMap misleading.

It's very comon for people to expect simple hashtable under the hood when they see "HashMap". Also, a hash-based mutable map is named HashMap and hash-based immutable map with totally different implementation/guarantees is named the same. That's... a bit ambigious.

It’s a hash based immutable Map

Almost all the practically useful maps are hash based (plus, maybe, require some additional contracts to be fulfilled).

Ichoran commented 3 years ago

40 million of 100-element immutable maps

Are you just running out of memory and having to use swap space?

Almost all the practically useful maps are hash based

Yes, which is why it's handy to know that it's a hash-based map. There are also ordering-based maps (SortedMap) and linear maps (ListMap, useless except for VERY short maps).

I don't think it's misnamed. It's not called HashTableMap. Every immutable collection should be presumed to have a different implementation than a mutable collection with similar use-cases. Maps are no exception. (For instance, Vector is very different from the mutable equivalent, which is called ArrayBuffer not Vector in Scala...but C++'s Vector is the mutable one.)

RubenVerg commented 3 years ago

but C++'s Vector is the mutable one

C++'s std::vector or vector, since there's many Vector-named immutble implementations

pshirshov commented 3 years ago

Are you just running out of memory and having to use swap space?

Not at all. Each map is used once then its reference gets released. My memory footprint is constant and very low. The system isn't swapping either (actually I have 128Gb RAM).

Yes, which is why it's handy to know that it's a hash-based map. There are also ordering-based maps (SortedMap) and linear maps (ListMap, useless except for VERY short maps).

I would say that in case you treat "Hash" so broad this knowledge is mostly useless. It just informs user that this collection expects one contract to be fulfilled.

I don't think it's misnamed. It's not called HashTableMap. Every immutable collection should be presumed to have a different implementation

I guess the only way to find the truth here is to make a poll. Anyway, I shared my thoughts, in case you disagree - that's fine.

johnynek commented 3 years ago

You could image an implementation of immutable.Map like:

case class WithUpdate[K, V](fallback: collection.Map[K, V], overwrites: immutable.Map[K, V]) extends immutable.Map[K, V] {
  def update(k: K, v: V) = WithUpdate(fallback, overwrites.update(k, v))
  def get(k: K) = overwrites.get(k).orElse(fallback.get(k))
  ...
}

Especially if you know the size of map you are building (builder has the sizeHint set), you could allocate an Array to hold the hash table for the fallback Map.

But there is a memory tradeoff here and the cost to consult the overwrite map.

mkeskells commented 3 years ago

Yes you could do that But it's a memory leak and a cpu hog when you have lots of updates And it would be worse with removes as well

I don't think it's a good idea for the common implementation but may be useful is niche cases

Get Outlook for Androidhttps://aka.ms/ghei36


From: P. Oscar Boykin @.> Sent: Monday, May 24, 2021 8:23:55 PM To: scala/bug @.> Cc: mkeskells @.>; Mention @.> Subject: Re: [scala/bug] .toMap seems to be significantly (~30 times) slower than .to(mutable.Map) (#12400)

You could image an implementation of immutable.Map like:

case class WithUpdate[K, V](fallback: collection.Map[K, V], overwrites: immutable.Map[K, V]) extends immutable.Map[K, V] { def update(k: K, v: V) = WithUpdate(fallback, overwrites.update(k, v)) def get(k: K) = overwrites.get(k).orElse(fallback.get(k)) ... }

Especially if you know the size of map you are building (builder has the sizeHint set), you could allocate an Array to hold the hash table for the fallback Map.

But there is a memory tradeoff here and the cost to consult the overwrite map.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/scala/bug/issues/12400#issuecomment-847277981, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AFJ6LZJIH4VRLWHE3RIZDETTPKRUXANCNFSM45JZPGYA.

NthPortal commented 3 years ago

because you can't to a map in 2.12

what about with collection-compat?

Ichoran commented 3 years ago

@NthPortal - Yes, that ought to work; I just meant out of the box (zero setup). In retrospect, zero setup was probably not the right choice for a cross-version test.

SethTisue commented 3 years ago

This discussion has grown rather long.

The following suggestions were made:

but there seems to be consensus against both changes, and also, if I understand correctly, we don't have definite information that there is actually a 30x performance difference, as opposed to something much more reasonable like 4.5x.

So then the question is, what else came out of this discussion that might be actionable? It sounds like there is some interest in adding something comparable to scala.collection.immutable.ArraySeq.unsafeWrapArray that would represent a mutable map as immutable, even though there is no way to prevent the user from writing unsafe code using it, hence the "unsafe" in the method name.

I suggest we close this ticket because there are too many issues here. Discussion can continue here, but if any concrete changes are to be made, let's open new, more specific tickets about them.

pshirshov commented 3 years ago

I think that a good solution would be to add an extension method (e.g. .indexed) equivalent to .to(mutable.HashMap) + an upcast.

NthPortal commented 3 years ago

It sounds like there is some interest in adding something comparable to scala.collection.immutable.ArraySeq.unsafeWrapArray that would represent a mutable map as immutable, even though there is no way to prevent the user from writing unsafe code using it, hence the "unsafe" in the method name.

That sounds like a decent idea. collection wrappers shouldn't be too hard to make. after seeing Dale's comment, I retract

dwijnand commented 3 years ago

I guess I'll register what I said to Seth while triaging this: I don't like immutable.ArraySeq.unsafeWrapArray because it's making the "immutable" part a lie. "Unmodifiable" is the name I've seen used for data types that aren't modifiable by the user, but could be changed by their creator, leaving "immutable" to imply things can't ever be changed. Now all those neat traits in scala.collection.immutable aren't sealed or anything, so it's not iron-claded or anything (plus sealed is Scala only, plus runtime reflection...) but personally I would prefer not adding another liar.

martijnhoekstra commented 3 years ago

For ArraySeq a wrapper make more sense for varargs and array interop, but those use cases don't apply here. I find it difficult to come up with a use for a wrapper around a mutable map to provide an immutable interface instead of just upcasting to scala.collection.Map or even to Function1.

NthPortal commented 3 years ago

fairly simple solution to avoid ugly looking boilerplate but not actually change anything at all:

scala> def UnmodifiableMap[K, V]: collection.Factory[(K, V), collection.Map[K, V]] = collection.mutable.Map
def UnmodifiableMap[K, V]: scala.collection.Factory[(K, V),scala.collection.Map[K,V]]

scala> (1 to 5) zip (1 to 5) to UnmodifiableMap
val res1: scala.collection.Map[Int,Int] = HashMap(1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5)

scala> res1.getClass
val res2: Class[_ <: scala.collection.Map[Int,Int]] = class scala.collection.mutable.HashMap

Edit: an alternative definition of UnmodifiableMap that also works is

scala> def UnmodifiableMap: collection.MapFactory[collection.Map] = collection.mutable.Map
def UnmodifiableMap: scala.collection.MapFactory[scala.collection.Map]
pshirshov commented 3 years ago

Yeah, it would be great to have it in the Predef or collection-compat...

NthPortal commented 3 years ago

personally, I think it's overly specified and not general enough to be in the standard library, and collection-compat is just for compatibility between 2.11/2.12/2.13, not for new features.

as a 1-liner, you can add your own predef and put it there, import it, or even just paste the single line into any file that needs it. there are so many possible variants you could make that I don't think there's a meaningful way for the standard library to include them.

martijnhoekstra commented 3 years ago

There also is collection-contrib where it could graduate from if it proofs popular.

eed3si9n commented 3 years ago

I wonder if we should ask what a MapView is. If it is a wrapper around HashMap (I'm guessing here), maybe we just need to keep MapView as is and not try to .toMap in haste just to keep the code compatibility with Scala 2.12?

johnynek commented 3 years ago

Yeah, I wonder if part of the issue is that we don't have the concept of a read-only Map. If we had a read-only Map type, you could pass that in many cases, and giving a mutable map there is totally safe if you know you "release" it.

Side note: it would be really cool if scala could have "owned" parameters. Then we could safely write:

def toReadOnly[A, B](mmap: owned mutable.Map[A, B]): ReadOnlyMap[A, B] = ....

where the compiler verifies that a single owned reference is passed into the map (owned here means it was allocated locally, and after the call no reference remains).

dwijnand commented 3 years ago

If we had a read-only Map type

Is that not scala.collection.Map?

eed3si9n commented 3 years ago

Upcasting to collection.Map doesn't statically guarantee that all its residents remain referentially transparent:

scala> val m1 = collection.mutable.Map(1 -> "a")
val m1: scala.collection.mutable.Map[Int,String] = HashMap(1 -> a)

scala> val m2 = m1: collection.Map[Int, String]
val m2: scala.collection.Map[Int,String] = HashMap(1 -> a)

scala> m2(1)
val res0: String = a

scala> m1(1) = "b"

scala> m2(1)
val res2: String = b

If there were readonly.Map it would need to be opaque type or sealed trait or something where only the collection library can create it via toReadOnly, and internally call .clone or something.

RubenVerg commented 3 years ago

Wouldn't that be collection.immutable.Map, or am I missing something? A mutable.Map can be up cast to a Map, but not sibling-cast (?) to a immutable.Map

johnynek commented 3 years ago

immutable.Map also has + and -. Requiring supporting those is what makes wrapping a mutable.Map as an immutable potentially costly because a modification will either have to be dealt with by copying or keeping track of which keys have been changed.