Closed scabug closed 1 year ago
Imported From: https://issues.scala-lang.org/browse/SI-4442?orig=1 Reporter: @SethTisue See #3724
@TiarkRompf said: I've been having this on my todo list for quite a while now. The problem is that it requires pervasive changes to the current internal structure and those are hard to get right without slowing down other uses of Vector. Performance will most likely be amortized log n, not worst case. It's good to have a ticket for this, though.
@SethTisue said: Is Phil working on this? (the latest Scala meeting notes seem to say so)
@TiarkRompf said (edited on Mar 2, 2012 9:10:30 PM UTC): Here's a link to the paper: http://infoscience.epfl.ch/record/169879/files/RMTrees.pdf?version=1
Still need to resolve integration with the current Vector implementation. Maybe it's better to have a separate CatenableVector as a first step.
@pchiusano said:
Related to this, (1 to N).foldLeft(Vector[Int]())(_ :+ _)
runs in linear time, but (1 to N).foldLeft(Vector[Int]())((acc,a) => acc ++ List(a))
is quadratic. This is extremely surprising (I spent several hours tracking down a performance bug in scalaz-stream that ended up being caused by this). Since Vector has a constant time snoc operation, it seems like the default implementation of ++ should be repeated calls to :+, which doesn't require any internal changes to Vector.
@SethTisue said: Paul: I'm extremely glad you noticed this. I don't think anyone realized the situation was that bad! I opened a separate ticket on it at #7725.
@adriaanm said: was this fixed along with SI-7725?
@Ichoran said: This was not fixed--this is about preserving structural sharing across the entire tree, while the other was to at least start from the bigger of two collections to get at least some sharing.
@SethTisue said: reassigning to "Backlog" on the grounds that the research on how it would even work hasn't been done yet
@Blaisorblade said: Shouldn't RRB-Vector be a potential fix for this, according to the ICFP'15 paper describing it?
https://github.com/nicolasstucki/scala-rrb-vector https://infoscience.epfl.ch/record/213452/files/rrbvector.pdf http://dl.acm.org/citation.cfm?id=2784739
These are very relevant! paper describing a potentially new data structure for Vectors: https://infoscience.epfl.ch/record/213452/files/rrbvector.pdf implementation: https://github.com/nicolasstucki/scala-rrb-vector
Is this addressed by any chance in the new collections in 2.13? 🙂
@V-Lamp the Vectors in 2.13 will not be the new data structure (RRB Vectors), but their concatenation may be improved enormously by https://github.com/scala/scala/pull/7588 which ensures that during concatenation of vectors, the structure of the left Vector is used rather than copying it into a new Vector. This essentially means that Vector concatenation will be constant-time on the size of the left vector (but still O(n) where n = size of right vector).
https://github.com/scala/scala/pull/7588 was reverted due to https://github.com/scala/bug/issues/11636 but the new Vector implementation that I am working on for 2.14 also has this optimization (appending n elements to a Vector of size m is O(n + log(m+n))). I don't know if I'll find the time to implement it but the same is possible for prepending. The combination of the two would allow concatenation depending on the size of the shorter slice.
Note that Vector
was completely redone for 2.13.2: https://github.com/scala/scala/pull/8534
Has anyone studied what the current situation is w/r/t to this ticket?
I see that on https://github.com/scala/scala/pull/8534 @szeiger wrote,
I did not get around to implementing an optimized concatenation where the right-hand side vector determines the alignment. Like most other algorithms for this data structure it is quite easy to visualize how it works and assure yourself that it is possible but actually implementing it correctly is much harder. This would allow concatenation of two vectors of sizes m and n in O(min(n, m) + log(n+m)) time.
So that sounds like we're still where we before: doing concatenation as efficiently as possible remains future work.
For the record: I've implemented concatenation in O(min(n, m) + log(n+m)) time. (If you're really lucky and the Vector structures are perfectly aligned, it's even only O(log(n+m)).) It's not yet PR-ready, but I hope I'll be able to polish it in a month or so when I'm back from vacation.
currently scala.collection.immutable.Vector.++ simply inherits its implementation from a supertrait (through the stubs added in r24227), so if you concatenate two vectors, it doesn't take advantage that that both are the same structure, but just treats the second vector like any traversable.
in theory this could be O(log n), not O(n), yes? Tiark seems to confirm it in this comment on #3724: "Unfortunately, for implementing a fast concat operation (we hope to do that for 2.8.1 or 2.9) heterogenous Arrays seem to be necessary (we'll be storing Int arrays next to the sub nodes). We might rethink this however, and try to stick to homogeneous Arrays."
I just thought there should be enhancement ticket on this as it would still be nice to have someday.