scala / collection-strawman

Implementation of the new Scala 2.13 Collections
Apache License 2.0
200 stars 72 forks source link

Views #13

Closed szeiger closed 6 years ago

szeiger commented 7 years ago

Let's use this ticket for discussions about views. Some comments were already made in https://github.com/scala/collection-strawman/pull/7:

My own 2ct on the topic so far: I never cared for mutable views (too confusing) but I like the ideas of views as reified iterator operations, as currently implemented here. These could be useful in the context of both, mutable and immutable collections. I would like to propose to take them one step further and make the evaluation order of view operations unspecified. If views are used as a lightweight abstraction for composing complex transformations, you don't generally care about side effects, but you always care about performance (otherwise you'd transform the underlying collections in multiple steps). Views could be a place where we allow operations to be fused for better performance.

julienrf commented 7 years ago

For reference, here is an example given by @Ichoran on gitter about fusion:

Just did a quick test and

def jmap(i: Iterator[Int], f: Int => Int, g: Int => Int) = i.map(f andThen g)

takes only 60% as long to run as does

def imap(i: Iterator[Int], f: Int => Int, g: Int => Int) = i.map(f).map(g)

because boxing is a lot of the overhead, and functions are specialized.

DarkDimius commented 7 years ago

@julienrf, it's not only because of boxing, it's also because of reduction total memory footprint by 2x. If you try functions on non-primitives(which eliminates difference with boxing), you'll still see a speedup.

julienrf commented 7 years ago

Still quoting @Ichoran:

If I use objects and just switch references around to which object I have, the jmap variant takes 75% as long as imap, so still a nice little win. But using List instead of iterator slows things down by 5x, so you're right that avoiding intermediate collections can be the big win.

Ichoran commented 7 years ago

Mutable views are nice to have for a variety of divide-and-conquer in-place algorithms. For instance, an in-place sort is easier to express if you don't have to worry about indices.

I don't think that dropping the order of evaluation is particularly useful for either mutable or immutable views. That is a different useful thing, but "I want to act on part of my collection without rebuilding it" is a different desire than "I want to parallelize my operations on this collection".

If we were to support only one of the two, we at least shouldn't call it a "view" because right now (and in most other languages I've seen) "view" means the former.

SethTisue commented 7 years ago

Mutable views are nice to have for a variety of divide-and-conquer in-place algorithms. For instance, an in-place sort is easier to express if you don't have to worry about indices

I think this is a niche, 0.1% use case and we should drop it and not worry about it.

I don't think that dropping the order of evaluation is particularly useful for either mutable or immutable views. That is a different useful thing, but "I want to act on part of my collection without rebuilding it" is a different desire than "I want to parallelize my operations on this collection".

Not sure, but I think you might be misunderstanding Stefan's suggestion? It isn't about parallelization. He isn't suggesting leaving iteration order unspecified — allowing element 4 to be operated on before element 3, as can happen with parallel collections. Rather, he's suggesting to allow fusion. If you do .map(a).filter(b), it would be unspecified whether all the elements get mapped before any of them get filtered.

szeiger commented 7 years ago

Also note that evaluation order of views in the current strawman is already different from evaluation order of operations on the underlying collections. If you wanted the same evaluation order you would have to build intermediate results but the whole point of views is to avoid that. I am only suggesting that we do not guarantee the same evaluation order as that of Iterators, either.

Ichoran commented 7 years ago

Ah, yes, I did misunderstand. I agree that views should be maximally flexible in that regard; unlike iterators which should promise to be as lazy as possible, views can do whatever is "best" with regards to evaluation of sequential operations.

And I don't think evaluation order should be the same save on Seqs and SortedWhatevers where there is already a natural evaluation order.

But I think the "niche, 0.1% use case" evaluation is only because we're making it so by providing no support for this way of working. Views are a key aspect of, for instance, image processing operations in ImageJ; the conceptual equivalent of views with indexed ranges are used all over the place both in Java (e.g. java.util.Arrays.sort(xs, i0, iN)) and elsewhere (e.g. it's exceedingly common in Matlab).

One of the weaknesses of the current collections library is that it has relatively poor support for mutating operations. We needn't necessarily improve this greatly (people are used to it not being there), but it doesn't mean that the potential use cases are rare.

For instance, if one has a list of files and one wants to replace all the missing ones by default file names, you would normally do this with immutable collections:

xs.map(f => if (f.exists) f else default)

But although you could reassign everything in place mutably:

xs.mapInPlace(f => if (f.exists) f else default)

in some ways it's more natural to define a filtered view and fill it:

xs.viewWhere(! _.exists).fill(default)

Again, I'm not saying it's really important for us to provide this kind of functionality, but I don't think we should view this as for niche applications even if we have immutable alternatives as the standard now. (Except inasmuch as anything mutable is a niche application.)

SethTisue commented 7 years ago

I think the "niche, 0.1% use case" evaluation is only because we're making it so by providing no support for this way of working

It isn't so much that I'm opposed to including support for it, the only thing I'm strongly opposed to is letting this class of use cases be something that drives top-level design decisions. If it can be accommodated without distorting the hierarchy+API, great. (Can it?)

Ichoran commented 7 years ago

I think that mutable views would just be a completely different thing than immutable views, so I don't think they'd impact the hierarchy. We could call them something else also.

odersky commented 7 years ago

I would like to propose to take them one step further and make the evaluation order of view operations unspecified.

I don't think that's necessary. The specified evaluation order of views already demands fusing. An unspecified evaluation order could give you some extra facility, for instance if you want to batch some ops into segments before fusing. But I don't think views are the right place for that.

[EDIT] I might be convinced otherwise by benchmarks that show that there's a big win to be had by automatic batching.

I am also betting that we will have much better techniques to reason about effects in the future, which would let us optimize without compromising semantics.

odersky commented 7 years ago

Regarding mutable views, one question is what operations would produce a view? If we do not want to re-index (and I think we should not), then it seems the only sensical operations are to change the start, the end, or the stride. So, instead of views we could use a general notion of slices. E.g., on mutable (indexed?) Seq[A]:

def slice(r: Range): Slice[A]

val xs: Array[Int] = ...
xs.slice(3 until xs.length)
xs.slice(5 to 1 by -2)

Normal, strict slices are collection transformers and would not be available on mutable collections. There's no danger of silently changing semantics because the new non-strict slice takes a range as argument instead of two bounds.

WDYT?

odersky commented 7 years ago

I just noted that there is a source of confusion for arrays.

xs.slice(1, 3)   

creates a new, strict array, or at least it did so far. But xs.slice(1 until 3) creates a view. Not sure what to do about that one yet - deprecate old-style slice on Array maybe?

julienrf commented 7 years ago

@odersky Maybe we should suffix operations producing a view with “view” (e.g. xs.sliceView(1 to 5)), to make it clearer that further operations will be fused.

odersky commented 7 years ago

@julienrf I agree sliceView would be clearer but it would also be longer and harder to remember. Also, it should not be ...View since a slice is not a view, but supports different operations. Maybe some other name? But I am not sure which.

odersky commented 7 years ago

I did a web search and it seems except for Scala, the name slice implies resource sharing. So, here's another possible naming strategy:

The result of slice would be collection.Slice and collection.mutable.Slice respectively. These types support some subset of all collection operations (essentially those operations that maintain the element type and do not require re-indexing).

Ichoran commented 7 years ago

I am wary of having too many different derived collection types. So far we have immutable, mutable, iterator, immutable view, mutable view, and now we're going to have slice also? One of the primary reasons that the existing views are not used much--aside from surprising behavior that arose from the lack of a clear distinction between mutable and immutable views and inherent bugs due to LSP issues--was that they were buggy because it was hard to maintain so many different parallel implementations.

I would therefore try quite hard to keep slices and views the same thing. A slice would be one particular way to get a view. If we're not going to do that I would like to have examples of methods that are very important to have on one but must not be on the other. I can't think of any offhand.

szeiger commented 7 years ago

It's not clear to me why we would need separate types for slices and views. If we distinguish between mutable and immutable views, slice on a mutable collection could produce a mutable view and subsequence on an immutable collection would create an immutable view. Array.slice is a problem in this model because the operation we want on (mutable) arrays is slice (with the new meaning), so we'd have to reuse the name with a new meaning (which IMHO makes this a non-starter; a change like this is far too dangerous).

Are there any other opinions on the method names? If they are used in this way in other languages and libraries it probably makes sense to do the same (modulo the Array.slice problem) but I find them counter-intuitive. I expect slice to actually slice off the part that I want without referencing the whole collection, which means copying in the case of Array.slice. But maybe that's just because I've mostly worked with Scala for the last decade.

odd commented 7 years ago

How about def focus(r: Range): View[A]?

julienrf commented 7 years ago

I would like to take advantage of this thread to ask whether we should keep View extend Iterable or not. The argument to not make View extend Iterable is that by doing so we have a clear separation between collections types whose transformations (filter, take, map, flatMap, etc.) are evaluated lazily (View) or eagerly (Iterable). Currently, taking an Iterable as parameter gives us no clue about the way transformation operations will be evaluated.

szeiger commented 7 years ago

The same is true for other lazy collections like Stream or LazyList, and these do not only extend Iterable but even Seq and LinearSeq. Maybe we should make the distinction between strict and non-strict collection types explicit with a marker trait instead of trying to assign meaning to the classes in the main hierarchy.

odersky commented 7 years ago

I agree with Stefan, and I am also not sure whether we should have a marker trait or not. I would defer all that until we have dealt with parallel collections. The order of evaluation (strict/lazy/batched/parallel) can vary a lot and we should think carefully before we invest types to reflect that.

julienrf commented 7 years ago

Yeah, ok, I agree with your arguments :)

LPTK commented 7 years ago

Re: choosing names for in-place mutable operations (from the other thread #7).

@odersky

The one thing I think is important is that we should not try to come up with creative new names for linear versions of immutable operations. My current favorite would be mapInPlace, filterInPlace, but if someone has a better idea, I am happy to change my preference.

What about appending _! to the names of immutable versions? It does seem a bit alien at first, but after having used that in my code for some time, I find it appropriate: it makes it exceedingly clear what are the operations to "watch out for" because of their effects, similar to how ??? catches the reader's eye. For example:

val arr = ArrayBuffer(1,2,3,4,5)
// clearly, I'm doing imperative transformations here:
arr.map_!(x => x + 1).filter_!( _ % 2 == 0 )
// or:
arr map_! (_ + 1) filter_! (_ % 2 == 0)

There is precedence for this in Ruby, where it's somewhat nicer because there is no need for the _. It would be nice if the lexer was modified and we could have:

arr.map!(_ + 1).filter!(_ % 2 == 0)

I would probably also push for marking methods not guaranteed to have the expected efficient implementation (such as size on List not being O(1)) with a _$ as in ls.size_$, marking option-returning methods with _? as in ls.head_?, and marking parallel operations with _|| as in val ys = xs map_|| f, but that would probably be going to far 😄

Atry commented 7 years ago

SeqView is very dangerous at the moment because it may violate rules for java.lang.Object.equals.

https://issues.scala-lang.org/browse/SI-10201

Atry commented 7 years ago

view == view should not force the view.

I expect view0 eq view1 and view0 == view1 are always same.

Atry commented 7 years ago

seq != view should always be true, given we always expect seq != iterator, and view is simply a wrapper of Iterator.

szeiger commented 7 years ago

I agree. In the strawman we define equality at the level of Seq and Set. View is yet another basic collection type which would never be equal to a Seq or Set. Reflexivity is a special case, it should be trivial to fix in either 2.12 or 2.13.

SolomonSun2010 commented 7 years ago

I really expect LazyList or Stream add these powerful and convenient methods : repeat/cycle, repeatedly/generate, iterate, unfold, resource, reject, like in Elixir and Clojure: https://hexdocs.pm/elixir/Stream.html#unfold/2 This symbol #:: is too low,strange to verbose,unreadable. Also, I hope add some xxxIndexed like in Kotlin Sequence: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequences/index.html I think, one of the programming trend is the Lazy Sequences . ps: post here may right.

szeiger commented 7 years ago

My experiences with immutable.Seq/Iterable vs collection.Seq/Iterable vs Iterator from scala-library and scala-reflect so far:

julienrf commented 7 years ago

Many Iterator methods are commonly used but not defined in the strawman.

FWIW I added some of them in that PR: https://github.com/scala/collection-strawman/pull/200/files#diff-c0d7086690fd3b617f2eb319d5b832ce

Consing for collection.Seq is sorely missing.

In case you missed it: #205 (and #206).

szeiger commented 6 years ago

It looks like we've settled on a View design by now so I'm closing this ticket.