lihaoyi opened this issue 7 years ago
It would be great if we could use a more efficient and intuitive desugaring. But there's the problem with conditions. How would you translate the following:
for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x
I don't think there's a way to do it that does not essentially fall back to the scheme we have. We could think of using different schemes for for-comprehensions with val defs before tests and for-comprehensions without. But then the whole thing would get more complex, not less so.
Alternatively, we could outlaw val bindings in front of tests. Then we could use the simple desugaring (or so I hope, I have not checked all cases!). But some Scala programs would no longer compile. It's a value judgement whether this is worth it. I would be interested in people's opinions on this.
Here's one idea for solving the "val assignment before if" problem. Not sure if it's too crazy, but what if your example
for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x
desugared into
{
  val tmp = List(1, 2, 3)
  tmp.flatMap { x =>
    val y = x * x
    tmp.point(()).filter(_ => y * x > 4).map { _ =>
      y + x
    }
  }
}
Where, in addition to needing .withFilter on a type to make if-guards work, we would ask for a monadic .point. List().point(x) would be List(x), and similar things could be defined for Option, Future, etc., like Scalaz/Cats already do.

After .point and .filter are called, the "normal" .map or .flatMap takes over from there.
The choice of methods is a bit arbitrary; there are other pairs of methods that could work, e.g. .pointUnit: M[Unit] rather than a more generic .point: M[T], or .point and .empty rather than .point and .filter. We could also make it implicit based-on-type (e.g. with some sort of def point[T](t: T)(implicit point: Point[T]) = point and point(tmp)) rather than as a method, but all that is orthogonal to the core idea.
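For concreteness, here is a minimal sketch of what that implicit based-on-type variant might look like; Pointed and the instances below are hypothetical names for illustration, not an existing API:

// Hypothetical typeclass backing the proposed point operation.
trait Pointed[F[_]] {
  def point[A](a: A): F[A]
}

object Pointed {
  implicit val listPointed: Pointed[List] = new Pointed[List] {
    def point[A](a: A): List[A] = List(a)
  }
  implicit val optionPointed: Pointed[Option] = new Pointed[Option] {
    def point[A](a: A): Option[A] = Some(a)
  }
}

// What the desugaring could call instead of a .point method on values.
def point[F[_], A](a: A)(implicit F: Pointed[F]): F[A] = F.point(a)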
I think this should work for all arrangements of val assignments, if-guards and <- bindings, since in this scheme if-guards:

if foo

basically desugar into a

_ <- tmp.point(()).filter(_ => foo)

binding on the tmp object that was last <-ed.
There are probably other subtleties with this (performance???) that I haven't thought through, and of course it's a much bigger change than the initial post describes. But the fundamental idea seems like it might work, and even with the unusualness of .point it would still be a simpler desugaring than the status quo.

EDIT: on further thought, .point feels a lot less sketchy than .forUnit, so I restructured the comment around that.
@lihaoyi This is pretty close to the way guard works in Haskell: https://en.wikibooks.org/wiki/Haskell/Alternative_and_MonadPlus#guard

It would indeed be interesting to look into how we can use something like guard for an alternative desugaring.
Thanks for the pointer @smarter! If we follow Haskell's guard, and thus choose to go with point and empty rather than point and filter, the desugaring ends up being quite nice:
for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
} yield y + x
{
  val tmp = List(1, 2, 3)
  tmp.flatMap { x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else tmp.point(()).map(_ => y + x) // or flatMap if there's more stuff
  }
}
The tmp.point(()).map is a bit redundant: it could be just tmp.point(...) in this case:
{
  val tmp = List(1, 2, 3)
  tmp.flatMap { x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else tmp.point(y + x)
  }
}
and if there are subsequent generators we wouldn't need it at all:
for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z
{
  val tmp = List(1, 2, 3)
  tmp.flatMap { x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else (0 until y).map(z => y + x + z)
  }
}
I think this beautifully captures the symmetry between the if-guard within the for-comprehension and the equivalent if-statement in the desugared flatMap/map code.
Alternately, we could go with flatMap and always include a point at the end, and not use map at all. I think that's what Haskell basically does?

I guess there's some value in making sure both point and map always end up being called, even if just for the not-changing-too-many-things-at-once value? If so we could stick with tmp.point(()).map. Not really sure what the tradeoffs here are.

Anyway, I think any of these variants, if we could make it work, would be superior to the status quo desugaring.
It would be great if this could work, but there's a problem.
for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z
{
  val tmp = List(1, 2, 3)
  tmp.flatMap { x =>
    val y = x * x
    if (!(y * x > 4)) tmp.empty
    else (0 until y).map(z => y + x + z)
  }
}
What is the type of the if-else? It's the lub of the type of tmp.empty, which is List[Int], and the type of the mapped range, which is IndexedSeq[Int]. So the result type is Seq here, which is fine. But in general the type of the argument to flatMap can be completely unrelated to the type of the original generator. Our translations work regardless of types, so we cannot assume that the generated if-else would be typable. Logically we'd have to pick the empty of the argument to flatMap instead of the empty of the original generator. But I don't see a general way to get that.
EDIT: Maybe it's not so bad. We could simply require that empty is of a type which is compatible with the argument type of the flatMap. Are there uses of for-expressions where this would not be the case?
So if we accept that, the next problem is how to retrofit all collections and other types on which for-expressions with ifs are performed so that they have an empty. That's still a large problem.
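For the standard collections themselves, an empty of the right concrete type is already reachable from any value; a rough sketch on the Scala 2.13 API (emptyLike is a made-up name), with the genuinely hard part being everything outside the collections hierarchy:

import scala.collection.IterableOps

// Hypothetical extension: derive an empty collection of the same
// concrete type as an existing value.
implicit class EmptyLike[A, CC[x] <: IterableOps[x, CC, CC[x]]](xs: CC[A]) {
  def emptyLike: CC[A] = xs.take(0) // take(0) preserves the collection type
}

// e.g. List(1, 2, 3).emptyLike == Nil and Vector(1).emptyLike == Vector()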
What is the empty value for Either[A, B]? It seems it must be Left[A] but I don’t know where I could get an A value from, unless there was the ability to put an implicit left in scope that would be used by the for comprehension desugaring.
Either[A, B] does not support .filter even today, so we probably don't need to worry about it not supporting .filter due to lack of empty values or whatever:
@ Right(1).filter(_ => false)
cmd0.sc:1: value filter is not a member of scala.util.Right[Nothing,Int]
val res0 = Right(1).filter(_ => false)
                    ^
Compilation Failed
It's true, but it does confuse the heck out of everyone the first time they hit it, so if we were able to come up with a solution at the same time it would be useful. The only idea I have so far is, as mentioned above, to allow an implicit empty to be found in scope, which would allow the use of filter as long as the user is able to supply an empty; that would have been possible every time I have run into the issue.
I think the correct answer to the confusion is to get people used to the fact that you can't always filter on a monad. There is a huge number of monads for which filter doesn't make sense: State, Reader, Writer, Function0, etc. In a way I feel that Scala's "put .filter on anything" convention is itself a bit of a wart.
You are winning me over. I agree: the confusion has always arisen from the expectation that it would just work, and eliminating that expectation is probably a better way to go.
Actually ran into a big wart recently related to this. See @frosforever's SO post here: https://stackoverflow.com/questions/45333889/regex-unapply-in-for-comprehension-with-if-guard-not-compiling
tl;dr -
This does not compile:
for {
  s <- List.empty[String]
  regex <- List.empty[scala.util.matching.Regex]
  regex(ss) = s
  if ss == "foo"
} yield s
But removing the if:
for {
  s <- List.empty[String]
  regex <- List.empty[scala.util.matching.Regex]
  regex(ss) = s
} yield s
or rearranging the order of the two lists in the for comprehension:
for {
  regex <- List.empty[scala.util.matching.Regex]
  s <- List.empty[String]
  regex(ss) = s
  if ss == "foo"
} yield s
makes it compile
It would be nice if we could improve on this. But we'd need someone to put in the effort.
Can we at least do something about imperative for-comprehensions? They have no justification for generating such horrible code.
The following:
for { x <- xs; y = x; if y > 0; z <- zs } println(x + z)
should generate:
xs.foreach { x =>
  val y = x
  if (y > 0) zs.foreach { z =>
    println(x + z)
  }
}
instead of the current convoluted and inefficient:
xs.map { x =>
  val y = x
  (x, y)
}.withFilter {
  case (x, y) => y > 0
}.foreach {
  case (x, y) => zs.foreach { z => println(x + z) }
}
which creates an intermediate list, allocates tuples, and calls four closure-consuming methods instead of just two.
No wonder people are surprised how much more efficient it is to use while loops rather than for loops in Scala 😅
I believe the fix for this one is a low-hanging fruit. I could try to do it at some point if people agree with the idea.
As for functional loops, I think it could be solved in a less artificial way by using flatten and a simple method called flattenOptions, of type M[Option[A]] => M[A].
This code:
for { x <- xs; t = x+1; if t > 0; y <- ys } yield x + y
would produce:
xs.map { x =>
  val t = x + 1
  if (t > 0) Some(ys.map { y =>
    x + y
  })
  else None
}.flattenOptions.flatten
and this code:
for { x <- xs; y <- ys; t = y+1; if x > t } yield x + y
would produce:
xs.flatMap { x =>
  ys.map { y =>
    val t = y + 1
    if (x > t) Some(x + y)
    else None
  }.flattenOptions
}
Compared to point and empty, which have no business whatsoever being defined on monad/collection values, flattenOptions is an independently useful method. In the case of Scala standard collections, it can actually be the same as flatten (which only requires the element type to be traversable, which Option is).
In addition, flattenOptions can somewhat easily be defined as an extension method:
import scala.collection.generic.{CanBuildFrom, FilterMonadic}

implicit class FlattenOptions[A, B](xs: FilterMonadic[Option[A], B]) {
  def flattenOptions[That](implicit cbf: CanBuildFrom[B, A, That]) =
    xs.withFilter(_.isDefined).map(_.get)
}
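On the Scala 2.13 collections API, where CanBuildFrom is gone, an equivalent sketch (mine, just for illustration) might be:

import scala.collection.IterableOps

implicit class FlattenOptions213[A, CC[x] <: IterableOps[x, CC, CC[x]]](
    xs: CC[Option[A]]) {
  // collect keeps the concrete collection type: List in, List out.
  def flattenOptions: CC[A] = xs.collect { case Some(a) => a }
}

// e.g. List(Some(1), None, Some(3)).flattenOptions == List(1, 3)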
Also, not sure if it would fit in the complexity budget because it makes things less regular, but I would still prefer something easier like:
for { x <- xs; y <- ys if x > 0 } yield x + y
to desugar to the more natural:
xs.flatMap { x =>
  ys.filter(y => x > 0).map { y =>
    x + y
  }
}
which does not allocate any options, instead of:
xs.flatMap { x =>
  ys.map { y =>
    if (x > 0) Some(x + y)
    else None
  }.flattenOptions
}
Hi all.
I've been thinking about this issue for a while, and it looks like the main problem stems from how assignment and condition clauses are desugared in for-comprehensions. With that assumption, I started considering a plausible unification, so to speak, of the different clauses.

Take the following for-comprehension:
for {
  x <- List(1, 2, 3)
  z = 1
  if x > z
  y <- List(10, 20, 30)
} yield x + y
Let's assume the existence of a unary constructor single(a: A): F[A] such that we can rewrite z = 1 as z <- single(1). (For instance, for the List type, single(a: A): List[A] = List(a).)

Analogously, let's assume the existence of a unary constructor cond(b: Boolean): F[Unit] such that we can rewrite if x > z as _ <- cond(x > z). (For instance, for the List type, cond(b: Boolean): List[Unit] = List(()).filter(_ => b).)
Then we can write the above for-comprehension as follows:
for {
  x <- List(1, 2, 3)
  z <- single(1)
  _ <- cond(x > z)
  y <- List(10, 20, 30)
} yield x + y
Or, after replacing the constructors with their implementation:
for {
  x <- List(1, 2, 3)
  z <- List(1)
  _ <- List(()).filter(_ => x > z)
  y <- List(10, 20, 30)
} yield x + y
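Since every clause is now a plain generator, this version desugars with flatMap and map alone; a sketch of the result:

List(1, 2, 3).flatMap { x =>
  List(1).flatMap { z =>
    List(()).filter(_ => x > z).flatMap { _ =>
      List(10, 20, 30).map(y => x + y)
    }
  }
}
// == List(12, 22, 32, 13, 23, 33): only x = 2 and x = 3 pass the guard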
It might seem that, since we only added boilerplate, the resulting desugared code would be longer and more complex. However, if my IDE and Lihaoyi's Ammonite shell are to be believed, the opposite happens. (In fact, the desugared version of the original, generated by IntelliJ IDEA with the Scala plugin, doesn't even compile!)
I've tried to write a minimalistic type class system in 2.12 that would encompass implicit resolution of the single and cond constructors, but encountered either type mismatch errors or ambiguous implicit values errors whenever I tried to implement the constructors for more than one type.

(Here's a small semi-working example using one type — GenTraversable.)
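A minimal sketch of what such a typeclass might look like (hypothetical names; roughly what the 2.12 experiment above attempts):

// Hypothetical typeclass providing the two constructors.
trait ForClauses[F[_]] {
  def single[A](a: A): F[A]
  def cond(b: Boolean): F[Unit]
}

object ForClauses {
  implicit val listClauses: ForClauses[List] = new ForClauses[List] {
    def single[A](a: A): List[A] = List(a)
    def cond(b: Boolean): List[Unit] = if (b) List(()) else Nil
  }
}

def single[F[_], A](a: A)(implicit F: ForClauses[F]): F[A] = F.single(a)
def cond[F[_]](b: Boolean)(implicit F: ForClauses[F]): F[Unit] = F.cond(b)

Note that at a call site like single(1), F is not determined by the argument, which is one plausible source of the inference and ambiguity trouble described above.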
Regardless of the need to provide single and cond for every type that might be used in a for-comprehension, there are some other drawbacks to this concept:

- the source code needs to be pre-processed even before applying the desugaring stage (perhaps a bit of this logic could be refined and integrated into the desugaring proper?);
- the source after desugaring might need to be post-processed in order to simplify some specific cases (e.g. single(x).flatMap(f) → f(x), single(x).map(f) → single(f(x)), etc.; as above, maybe this could be turned into a part of a larger piece of logic).
I would be happy to receive feedback on whether there is any merit to this idea.
What is the downside to just desugaring into what this does: https://github.com/oleg-py/better-monadic-for? This is the least surprising behaviour (it is what a user expects in her mental model) and does not break current code (even if it does, you will get a compile error).
This is the least surprising behaviour (it is what a user expects in her mental model)

"Desugar bindings as vals instead of tuples" might be fine, but desugaring without withFilter isn't what I expect as a user.
and does not break current code (even if it does you will get a compile error)
Which compile error? If some members fail the match you get a runtime failure instead of skipping the members — nothing in the README makes me expect errors here:
object TestExtract { def unapply(x: Any): Boolean = false }
1 match { case TestExtract() => 1; case _ => 0 }
for (TestExtract() <- List(1, 2, 3)) yield ()
The only error I anticipate is for extractors that affect the expected type, unlike the above one. Though the README doesn't present any such example.
(_: List[Int]).map( { case (x, y) => x + y })
EDIT: just confirmed that code gives a runtime failure.
// without plugin
scala> for (TestExtract() <- List(1, 2, 3)) yield ()
res1: List[Unit] = List()
// with plugin
scala> for (TestExtract() <- List(1, 2, 3)) yield ()
scala.MatchError: 1 (of class java.lang.Integer)
at .$anonfun$res2$1(<console>:13)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:12)
at scala.collection.immutable.List.map(List.scala:283)
... 36 elided
Of course that is not meant as a realistic example, but using an extractor with an unapply(x: T) on a List[T] will not give compile errors, whether the extractor is always going to succeed or not. What you'd want is to only omit withFilter for irrefutable patterns, that is (maybe) those built out of the elementary irrefutable patterns: extractors whose unapply returns Some[...] or true.type.
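For contrast, a small illustration of the current behavior on refutable vs. irrefutable patterns:

// Refutable pattern: the current desugaring inserts withFilter, so the
// non-matching element is silently skipped instead of raising a MatchError.
val r = for (Some(x) <- List(Some(1), None, Some(3))) yield x
// r == List(1, 3)

// Irrefutable pattern (a tuple): every element matches, so withFilter
// could safely be omitted here.
val s = for ((a, b) <- List((1, 2), (3, 4))) yield a + b
// s == List(3, 7)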
Alternatively, we could outlaw val bindings in front of tests. Then we could use the simple desugaring (or so I hope, I have not checked all cases!). But some Scala programs would no longer compile. It's a value judgement whether this is worth it. I would be interested in people's opinions on this.
Here's my opinion: I like this idea :)
I think scanning through some big Scala codebase (Scala community build?) would reveal how much code is going to be affected. Maybe the affected code wouldn't be hard to adjust?
What is the current status of this issue?
For the record, I want to mention one more wart, caused by the current desugaring of value definitions in for loops when used in a very imperative/mutable style:
val xs = Array(1, 0, 0)
for {
  i <- 1 until xs.length
  previous = xs(i - 1)
} xs(i) = previous
Expected result (before one looks at the language spec): 1 1 1.
Actual result (because of the intermediate tupling): 1 1 0.
What is the current status of this issue?
The current status is: no one has come up with a full proposal here, no one at EPFL is working on this so this will have to come from the community, see also https://contributors.scala-lang.org/t/pre-sip-improve-for-comprehensions-functionality/3509/21?u=smarter.
My two cents on this…
I think there are a number of historical problems here which have compounded to create the situation we're in today. I'm not entirely sure how to unravel it, but in the interests of clarity of discussion, it's worth stating them as I see them.
Rather than the desugaring of for being in terms of some typeclass like Monad and MonadPlus (or similar), the desugaring has always been a syntax sugar. That sugar is so unconstrained that whole libraries have been built around the idea of using for as a DSL for entirely non-monadic things (like SQL statements), and that ecosystem cannot be simply dismissed.

All of this means that, while it's theoretically possible to leverage enrichments and typeclasses to push for desugaring into something a bit more type-driven (similar to what Cats does with its syntax.flatMap._), it's always going to be a bit second-class.

I don't see a realistic way to correct this issue without breaking vast swathes of the ecosystem, which is to say, I think we're stuck with it.
Due to some of the vagaries of how for-comprehensions have always been treated, many users now expect behavior that is, when looked at in isolation, actually rather strange. @Blaisorblade's example from above is a good one to look at:

for (TestExtract() <- List(1, 2, 3)) yield ()

Why would you expect this to produce an empty list?
val TestExtract() = List(1, 2, 3)

That produces a MatchError. So would using match directly. So why are for-comprehensions magical in this way? The only reason I can think of is to unnecessarily lean into the syntactic legacy of for as a construct for iterating over collections. But if you think of for as a mechanism for iterating over collections, then libraries such as the aforementioned DSL tricks (leveraging the untyped nature of the desugaring) make absolutely no sense, and more semantically-rigorous constructs such as monads look even stranger.
What I'm saying is that this kind of user expectation paints us into a very inconsistent semantic corner. At the risk of sounding dismissive, the expectation is wrong and does not align with the rest of the language.
And here's the other problem: many users have some wildly problematic expectations on how side effects are handled within the desugaring:
for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z
There's no particular reason that this cannot be desugared into the following:
List(1, 2, 3).filter(x => (x * x) * x > 4) flatMap { x =>
  val y = x * x
  0.until(y).map(y + x + _)
}
The user certainly cannot see a difference except when using a stopwatch. The problem is that this cannot be thus desugared without breaking expectations:
for {
  x <- List(1, 2, 3)
  y = {
    println("derp")
    x * x
  }
  if y * x > 4
  z <- 0 until y
} yield y + x + z
Users expect that the println is only going to run once per x, which is of course not the case when you have the if in this context. This is kind of a crazy expectation, but one which is pushed rather heavily by non-yielding for-comprehensions, so I don't really blame people for believing it should work in this way. It is, however, still an expectation which creates significant friction when attempting to improve things.
We could also feasibly do something like this:
List(1, 2, 3).map(x => (x, x * x)).filter(_._2 > 4) flatMap {
  case (x, y) => 0.until(y).map(y + x + _)
}
This works (even with side effects)! But it also assumes the container is fully polymorphic and imposes extra boxing (as well as other, more deeply subtle and now mostly obsolete issues, such as confusing CanBuildFrom when filtering a String). And here again we run into the issues with untyped things, because we cannot simply assume that the F[_] in question forms a lawful monad (or even functor), and thus we cannot assume that it's safe to just shovel tuples around.
This is a real problem that's going to be a very serious issue for a meaningful chunk of the ecosystem when migrating to Scala 3. For example: https://github.com/typelevel/cats-effect/issues/749 In case it isn't clear, the problem there is being caused by the unnecessary trailing map generated by the for, which effectively creates an unexpected memory "leak" due to the stack of continuations which must be maintained. Basically the only solution that I can think of at present is a scalafix rule which removes for-comprehensions from most code which touches these kinds of libraries, but that's… very frustrating and not a great experience. It effectively takes a major language feature (for) entirely off the table.
So really, I don't want this to be underestimated: this is going to cause a lot of friction in the Scala 3 migration if not addressed in some way.
Eliminating trailing identity maps is an easy fix:

for (x <- List(1, 2, 3)) yield x

…should desugar into simply List(1, 2, 3). This would resolve the majority of the migration friction.
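Presumably the rule would fire only when the yield is exactly the variable bound by the last generator; a sketch of how it might apply:

val xs = List(1, 2, 3)
val ys = List(4, 5)

// today: xs.map(x => x); proposed: just xs
val a = for (x <- xs) yield x

// unaffected: the yield is not the bare bound variable
val b = for (x <- xs) yield x + 1

// today: xs.flatMap(x => ys.map(y => y)); proposed: xs.flatMap(x => ys)
val c = for { x <- xs; y <- ys } yield y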
Silent filtering instead of a more uniform match semantic on falsifiable patterns should be removed. In other words, @Blaisorblade's example should indeed produce a MatchError, consistent with val, match, and absolutely everything else in the language. I can think of dozens of times over the past decade that I've had bugs that were caused by this feature, including at least three separate times where it required meaningful amounts of time to track down exactly what was going on (most recently, about 18 months ago on Scala 2.12). I have never even once wanted this feature. I realize I'm not a representative sample of the entire Scala ecosystem, but if someone has never seen merit to a language feature over the course of 10+ years of experience and has lost man-weeks of cumulative time to tracking down bugs it has caused, it seems like the feature's continued existence past a breaking migration is probably at least somewhat dubious.
We should bite the bullet and make the assumption that tuples can be stuffed into whatever we're mapping over. Thus:
for {
  x <- List(1, 2, 3)
  y = x * x
  if y * x > 4
  z <- 0 until y
} yield y + x + z

// should desugar into...

List(1, 2, 3).map(x => (x, x * x)).filter(_._2 > 4) flatMap {
  case (x, y) => 0.until(y).map(y + x + _)
}
This is no more problematic than the current assumptions around withFilter, and considerably easier to explain. It imposes no meaningful practical performance penalty (the added boxing of the Tuple2 is entirely dwarfed by the raw collection allocation overhead, and if someone really cares about eliminating this, they can do so by making their map a macro).

Taken together, I believe these last two changes entirely remove the need for withFilter.
We are still left with a for-comprehension desugaring which has some unfortunate backwaters and inconsistencies, but these changes will, at the least, eliminate the majority of the current foot-guns while considerably easing the migration from Scala 2.

Needless to say, we could keep the old desugaring available under the -language:Scala2 flag.
Could a new syntax or keyword be added to yield a monad rather than a value? E.g.
for {
  a <- readHello
  b <- readWorld
} yieldF say(a + b)

// desugars to
readHello.flatMap { a =>
  readWorld.flatMap { b =>
    say(a + b)
  }
}
Silent filtering instead of a more uniform match semantic on falsifiable patterns should be removed.
I think -strict in Dotty (which contains things slated to become the default after 3.0) has the behavior you want: http://dotty.epfl.ch/docs/reference/changed-features/pattern-bindings.html#pattern-bindings-in-for-expressions
We should bite the bullet and make the assumption that tuples can be stuffed into whatever we're mapping over.
I like it, it'd be interesting to tweak scalac to do this and see what breaks in the community build.
@scf37 we'd like to fix the existing stuff before adding new stuff on top of it :).
@smarter I agree that would be a fascinating experiment, and very helpful in determining the right approach here. I think such an experiment should be done with both the tupling and the trailing identity map elimination (which seems the most likely to be innocuous). If -strict already has sane semantics for non-falsifiable patterns in for, then I’m quite happy on that front.
We should bite the bullet and make the assumption that tuples can be stuffed into whatever we're mapping over.
I do not understand what you are proposing. The code example you show is already desugared in the way you suggest. You can check it as follows:
Welcome to Scala 2.13.1 (OpenJDK 64-Bit Server VM, Java 1.8.0_121).
Type in expressions for evaluation. Or try :help.
scala> scala.reflect.runtime.universe.reify {
| for {
| x <- List(1, 2, 3)
| y = x * x
| if y * x > 4
| z <- 0 until y
| } yield y + x + z
| }
res0: reflect.runtime.universe.Expr[List[Int]] =
Expr[List[Int]](List.apply(1, 2, 3).map(((x) => {
val y = x.$times(x);
Tuple2.apply(x, y)
})).withFilter(((x$1) => x$1: @unchecked match {
case Tuple2((x @ _), (y @ _)) => y.$times(x).$greater(4)
})).flatMap(((x$2) => x$2: @unchecked match {
case Tuple2((x @ _), (y @ _)) => Predef.intWrapper(0).until(y).map(((z) => y.$plus(x).$plus(z)))
})))
The only meaningful difference with your desugaring is the use of withFilter instead of filter.
What do you mean by "the current assumptions around withFilter" when you say "this is no more problematic than the current assumptions around withFilter, and considerably easier to explain"?
There's no particular reason that this cannot be desugared into the following:
List(1, 2, 3).filter(x => (x * x) * x > 4) flatMap { x =>
  val y = x * x
  0.until(y).map(y + x + _)
}
Actually, there are very good reasons not to do that: not only do you duplicate code (what if we had big_expression(x) instead of x * x?), but more importantly you also duplicate runtime work: what if you had expensive_computation(x) instead of x * x?
I do not understand what you are proposing. The code example you show is already desugared in the way you suggest. You can check it as follows:
Cool, so let's just remove withFilter then. If tuples are already the mechanism in play here then I don't see a reason to change it.
What do you mean by "the current assumptions around withFilter" when you say "this is no more problematic than the current assumptions around withFilter, and considerably easier to explain"?
Simply referring to the fact that, in order to participate in if within a for-comprehension, types must currently define withFilter, with its own type foibles and such.
Actually, there are very good reasons not to do that: not only do you duplicate code (what if we had big_expression(x) instead of x * x?), but more importantly you also duplicate runtime work: what if you had expensive_computation(x) instead of x * x?
As I said, effects. Expensive work in this case is effectively a sort of effect. The point being that people expect that val bindings immediately prior to a filter are not repeatedly evaluated. That's a weird expectation, but it is what they expect, so it's fair to treat it as a constraint.
What I'm trying to avoid in all of this is a .empty method on values, which frankly doesn't make any sense (neither would an analogous point method) and would pollute the public APIs of any type which wants to enable if within for.
I'm absolutely all for removing the trailing map when the yielded expression is a value of the previous expression: the trailing map makes no sense at all, it's harmful AND there's no way to abuse it meaningfully even if you wanted to. (It can't be used as a "finalizing" symbol for a builder because you can't distinguish between the final map and all the other maps in the middle of the expression.)
Cool, so let's just remove withFilter then. If tuples are already the mechanism in play here then I don't see a reason to change it.
What is your rationale for removing withFilter? For-comprehensions with guards desugar to calls to withFilter; there is nothing surprising or problematic about that. Changing it to calling filter will not improve anything; it will just break existing code.
types must currently define withFilter, with its own type foibles and such
What foibles and such?
people expect that val bindings immediately prior to a filter are not repeatedly evaluated. That's a weird expectation
Why is that a weird expectation? Do you know of any other language that has for constructs which implicitly re-execute code because it comes before a guard? Even Haskell does not do that.
Do you know of any other language that has for constructs which implicitly re-execute code because it comes before a guard? Even Haskell does not do that.
Well, in Haskell, purity ensures you can't tell — so that's up to the optimizer.
Yes, the GHC optimizer could duplicate runtime work if it wanted, but it generally tries very hard not to, because this could have very bad consequences on program performance — for example, see the comments in this paper about not floating let bindings into lambdas (which is similar to what was proposed here). I am not aware of GHC optimizations which duplicate runtime work, and users can safely assume that their expensive computations will not be done several times. This was even the original motivation for the monomorphism restriction.
And the Haskell language reference defines the semantics of list comprehensions (on which monad comprehensions are based, which are the closest thing to Scala's for with guards) as not duplicating the work of guards.
What is your rationale for removing withFilter? For comprehensions with guards desugar to calls to withFilter, there is nothing surprising or problematic about that. Changing it to calling filter will not improve anything; it will just break existing code.
It considerably complicates the definition of what if does within for. You can't just say "it uses filter", because it doesn't, and the only reason it doesn't is to magically enrich strict datatypes with non-strict semantics for this one specific use-case.
This kind of "eager, except when magically lazy, but only right here" type of thing has been tried and I think is broadly considered to be a failed experiment by the ecosystem. Maybe I'm being a bit bold in asserting that, but flip-flopping between runtime semantics is something that most of the "pure FP Scala" ecosystem avoids very aggressively because it's just hard to reason about. Eager datatypes are eager, lazy datatypes are lazy, and the only exceptions to this are very, very, very carefully reasoned and justified (e.g. the ifM operator in Cats), and tend to not affect the data itself.

In other words, I think the expectation of laziness even when the underlying datatype is eager is in and of itself a problem.
I mean, there are more important windmills to tilt at, so if this is going to become a Thing™, then I'll just drop it. Removal of the trailing map along with the -strict mode for patterns resolves 90% of my objections to the current semantics. Compared to those, withFilter is quite ugly and (IMO) unnecessary, but not so much in the way.
What foibles and such?
You can just define withFilter to be an alias for filter and it, broadly speaking, works. If you follow the example from the collections, though, you end up with this weird builder type off to the side.
Why is that a weird expectation? Do you know of any other language that have for constructs which implicitly re-executes code because it's before a guard? Even Haskell does not do that.
Haskell has uniformly by-need semantics. This means that, at the very least, all of its data structures are lazy, meaning that filter is lazy by definition, meaning that work isn't going to be repeated in any case. Scala has a more complex problem here because it mixes eager and lazy evaluation within a syntactically uniform framework, which can in turn mean that the semantics of higher-level constructs (such as for) can be radically different depending on which semantic is underneath.
It considerably complicates the definition of what if does within for. You can't just say "it uses filter", because it doesn't
Well, you can just say "it uses withFilter"... because it does.
I don't think withFilter is particularly confusing or hard. It's conceptually similar to .view.filter in the new collections infrastructure, and can be explained as such.
flip-flopping between runtime semantics is something that most of the "pure FP Scala" ecosystem avoids very aggressively because it's just hard to reason about. Eager datatypes are eager, lazy datatypes are lazy
The fact that withFilter is lazy is not a problem in terms of semantics. I believe that this laziness cannot be observed, and cannot cause surprises.

The ironic thing is that the above holds precisely because of our pesky terminating map call, which always forces evaluation on strict types. Look at the type of List(1,2,3).withFilter(_ => true).map(identity), which is List[Int]. This is why in practice we don't normally "end up with this weird builder type off to the side".
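Concretely, on the Scala 2.13 API (types shown for illustration):

// Without the terminating map, the intermediate builder type leaks out:
val wf = List(1, 2, 3).withFilter(_ => true)
// wf: scala.collection.WithFilter[Int, List]

// The terminating map forces it back into a strict List:
val back = wf.map(identity)
// back: List[Int] = List(1, 2, 3)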
So it seems we're getting to the bottom of the problem. I would agree with you if you made the following point: using withFilter is problematic if we decide to get rid of the terminating map.
I don't know if it would be feasible to change for-comprehensions to the extent this suggests without breaking, or degrading the performance of, too much existing code.

Maybe an alternative comprehension syntax could be used, like do { a <- ... ; b <- ... } yield e(a, b), which could be type-class-based and incorporate other improvements proposed in the contributors thread.
Haskell has uniformly by-need semantics. This means that at the very least, all of its data structures are lazy, meaning that filter is lazy by definition
(A nitpick, but this is just plain false. Haskell is non-strict, and has plenty of ways to control the amount of strictness you want in your programs, both in the language itself and in the compiler options — see: bang patterns, seq, strict data types...)
meaning that work isn't going to be repeated in any case
What we are talking about here is whether or not to duplicate code, which is what you originally suggested. Of course, duplicating code is going to duplicate runtime work in the situation at hand, even in Haskell.
Maybe an alternative comprehension syntax could be used
Inventing a new comprehension syntax from scratch would give us an opportunity to learn from successes in this area in other languages, for example F# Computation Expressions. There the syntax for monadic bindings is very similar to the normal one; you just append the ! character. For example, let x = y would be Scala's val x = y, and let! x = y would be x <- y. And there are also features which currently don't have a Scala counterpart, like do! x, which would be something like (): Unit <- x, or use! x = y, which would be quite useful with cats.effect.Resource, or match! and many others. The details would probably be for a longer discussion elsewhere, but you can at least check out the linked website.
There is a new SIP in this area: https://github.com/scala/improvement-proposals/pull/79
Opening this issue, as suggested by Martin, to provide a place to discuss the individual warts brought up in the blog post Warts of the Scala Programming Language and the possibility of mitigating/fixing them in Dotty (and perhaps later in Scala 2.x). These are based on Scala 2.x behavior, which I understand Dotty follows closely; apologies in advance if any of this has already been fixed.
Scala lets you write for-comprehensions, which are converted into a chain of flatMaps and maps, as shown below. (I have nicely formatted the desugared code for you, but you can try this yourself in the Ammonite Scala REPL to verify that this is what the for-comprehension gets transformed into.)
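The original snippets did not survive formatting; a representative sketch in their spirit, using Options as the post does:

val result = for {
  a <- Option(1)
  b <- Option(2)
} yield a + b

// is converted into roughly:
val desugared = Option(1).flatMap { a =>
  Option(2).map { b =>
    a + b
  }
}
// result == desugared == Some(3)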
This is a convenient way of implementing nested loops over lists, and happily works with things that aren't lists: Options (as shown above), Futures, and many other things. You can also assign local values within the for-comprehension, e.g. the y = x * x binding in the sketch below.
The syntax is a bit wonky (you don't need a val, you can't define defs or classes or run imperative commands without _ = println("debug")), but for simple local assignments it works. You may expect such code to be transformed into a single map call, but it turns out it instead gets converted into a tupled two-step version; both are sketched below:
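The snippets here were lost in formatting; a reconstruction consistent with the examples in the comments above:

val result = for {
  x <- List(1, 2, 3)
  y = x * x
} yield y + x

// What you might expect:
List(1, 2, 3).map { x =>
  val y = x * x
  y + x
}

// What actually gets generated, roughly:
List(1, 2, 3).map { x =>
  val y = x * x
  (x, y)
}.map { case (x, y) => y + x }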
Although it is roughly equivalent, and ends up with the same result in most cases, this output format is tremendously convoluted and wastefully inefficient (e.g. creating and taking apart unnecessary Tuple2s). As far as I can tell, there is no reason at all not to generate the simpler version of the code shown above.

PostScript:
Notably, these two desugarings do not always produce the same results! The current desugaring behaves weirdly in certain cases; here is one that just bit me an hour ago:
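The original snippet was also lost; a sketch of the shape of code that triggered this (Scala 2.11-era Either projections; the exact original may have differed):

// On 2.11, Either had no map/flatMap of its own, so you went through .right:
val result = for {
  x <- Right(1).right
  y = 2
  z <- Right(3).right
} yield x + y + z
// Fails to compile on 2.11: the y = 2 binding triggers the tupling
// desugaring, which then needs flatMap on the plain Either produced by
// the first step, and 2.11's Either has no flatMap.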
This specific problem has gone away in 2.12 because Either doesn't need .right anymore, but the language-level problem is still there: y = 2 ends up causing strange, difficult-to-debug errors due to the weirdness of the desugaring. This would not be an issue at all given the desugaring I proposed.