scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

LazyList.isEmpty forces tail #11984

Closed basdebakker closed 4 years ago

basdebakker commented 4 years ago

I am trying to replace usages of the deprecated Stream with the recommended LazyList. But my code is now failing because it turns out that calling isEmpty on a LazyList actually forces the tail.

reproduction steps

Compile and run this code with Scala 2.13.2:

object StreamTest {

  def main(args: Array[String]): Unit = {
    println(stream2.isEmpty)
    println(lazyList2.isEmpty)
  }

  def stream1: Stream[Int] = {
    println("stream1")
    Stream(1)
  }
  def stream2: Stream[Int] = 2 #:: stream1

  def lazyList1: LazyList[Int] = {
    println("lazyList1")
    LazyList(1)
  }
  def lazyList2: LazyList[Int] = 2 #:: lazyList1
}

problem

The output is the following, which shows that the tail argument of the LazyList is called, which is not the case for Stream.

false
lazylist1
false

expectation

I expect the behavior of LazyList to be at least as lazy as Stream.

SethTisue commented 4 years ago

great catch!!

@NthPortal have time to take a look?

sjrd commented 4 years ago

I think that is expected, because the test is overly strict.

When we evaluate whether a lazy list is empty or not, we evaluate its head, and we create a new instance of LazyList for its tail. However, that LazyList is completely unevaluated: we don't know whether it's empty or not, and we most definitely haven't computed its head.

The test, as written, insists on not wanting the instance of LazyList to be created. But that's too strict. In practice what we want is that the state of the tail hasn't been computed yet.

SethTisue commented 4 years ago

ah, I believe you are right, thanks for thinking about this more carefully than I did

even after understanding what's going on, though, "I expect the behavior of LazyList to be at least as lazy as Stream" still nags at my intuition somewhat. but I guess it's my intuition that needs to fall in line

SethTisue commented 4 years ago

fwiw, this (the delay in the println happening) also goes against what my intuition was:

scala 2.13.2> { println("hey!"); 0 } #:: LazyList()
val res9: scala.collection.immutable.LazyList[Int] = LazyList(<not computed>)

scala 2.13.2> res9.isEmpty
hey!
val res10: Boolean = false

but perhaps that's because I used Stream for a decade before LazyList came along, and so my intuitions developed in a certain direction

basdebakker commented 4 years ago

The problem is that I can't construct the LazyList instance for the tail yet. My version of the stream1 method looks more like

if (condition(args)) head #:: recursiveCall(newArgs) else Stream.empty

and it is the evaluation of condition that I also need to postpone, but is now evaluated when I replace Stream by LazyList.

sjrd commented 4 years ago

Hum there should be a way to first create a LazyList with a function that delays the moment where you have to do the if (condition(args)). That's the one true benefit of the new design. That said just looking at the public API, I don't see a way to do that in user space. @SethTisue @NthPortal What would be the user-space to do that?

Otherwise, you might be able to achieve your end goal with LazyList.unfold.

NthPortal commented 4 years ago

Hum there should be a way to first create a LazyList with a function that delays the moment where you have to do the if (condition(args))

Otherwise, you might be able to achieve your end goal with LazyList.unfold.

unfold is the intended way to defer the evaluation of the condition; or more precisely, to defer the evaluation of whether or not it is empty. Alternatively, you can create an Iterator that produces the elements and call from with it as the arg, though this may be a bit more work. Obviously, #:: (and #:::) cannot be used to defer the evaluation of whether or not it is empty, as it only produces non-empty LazyLists.

Like @sjrd, I am unsure whether or not this is a bug. It is fixable though (I've already come up with a fix) if we decide it needs to be fixed.


The fix, for reference, is to change LazyList.Deferrer.#:: to this:

def #:: [B >: A](elem: => B): LazyList[B] = newLL(sCons(elem, newLL(l().state)))
NthPortal commented 4 years ago

sadly, no one thought of a when method for the LazyList companion, and we can't add one now :(

NthPortal commented 4 years ago

@basdebakker as Seb said, unfold does pretty much what you want, without too much restructuring.

LazyList.unfold(args) { args =>
  if (condition(args)) Some((head, newArgs)) else None
}

alternatively, if you want to keep your current structure, you can use the following workaround hack of inserting a LazyList.empty #:::.

if (condition(args)) head #:: LazyList.empty #::: recursiveCall(newArgs) else LazyList.empty
// or
LazyList.empty #::: { if (condition(args)) head #:: recursiveCall(newArgs) else LazyList.empty }
basdebakker commented 4 years ago

Thanks. I rewrote my code (which is a bit more complex) using LazyList.unfold and it works for me. I'll even say it is now better than it was before.

Still, my old code was a common pattern and from what I read I had expected to use LazyList as a drop-in replacement for Stream.

NthPortal commented 4 years ago

After some thought, I don't think it's a bug, for the following reason: on the first iteration, the condition is not evaluated lazily (which is something LazyList supports), so the pattern isn't properly lazy in the first place; consequently, you shouldn't expect it to be properly lazy as a tail either.

Thoughts?

SethTisue commented 4 years ago

question for @basdebakker:

"from what I read I had expected to use LazyList as a drop-in replacement for Stream"

is there some documentation we ought to improve? https://github.com/scala/scala/releases/tag/v2.13.0 currently says "immutable.LazyList replaces immutable.Stream. Stream had different laziness behavior and is now deprecated", with links to PRs

two questions for @NthPortal:

sadly, no one thought of a when method for the LazyList companion, and we can't add one now"

what would the signature of that be...?

The fix, for reference, is to change LazyList.Deferrer.#:: to this:

def #:: [B >: A](elem: => B): LazyList[B] = newLL(sCons(elem, newLL(l().state)))

I see that the current definition is:

def #:: [B >: A](elem: => B): LazyList[B] = newLL(sCons(elem, l()))

what would the downside be of applying this change?

NthPortal commented 4 years ago

sadly, no one thought of a when method for the LazyList companion, and we can't add one now"

what would the signature of that be...?

I'll do you one better, and give you an implementation with it:

def when[A](condition: => Boolean)(deferred: => LazyList[A]): LazyList[A] =
  newLL {
    if (condition) deferred.state else State.Empty
  }

Could mirror Option and add an unless too.

Also potentially a useful method for this sort of thing:

/** Defers the computation of a block that returns a `LazyList` so that
 *  logic outside the `LazyList`'s creation is also lazy.
*/
def deferred[A](thunk: => LazyList[A]): LazyList[A] = newLL(thunk.state)

what would the downside be of applying this change?

The major downside is that it increases the computation unnecessarily for cases where the tail => LazyList is fully lazy/doesn't contain its own logic. It essentially wraps each tail in an extra layer of laziness that it didn't need, since it was already lazy.

Take for example the classic Fibonacci implementation:

val fibs: LazyList[Int] = 1 #:: 1 #:: fibs.zip(fibs.tail).map { n => n._1 + n._2 }

fibs.zip (i.e. LazyList#zip) is fully and properly lazy, so wrapping its state in a separate LazyList adds nothing. This is done for every single tail as well, which is a significant amount of unnecessary computation for LazyLists whose elements are already simple to compute (such as Fibonacci numbers).


side note: I really hope the compiler inlines newLL and sCons, because they make the code cleaner but don't actually do anything (update: it does)

Jasper-M commented 4 years ago

The test, as written, insists on not wanting the instance of LazyList to be created. But that's too strict. In practice what we want is that the state of the tail hasn't been computed yet.

Where would a regular user learn about what the difference is between a LazyList being created and "its state" being computed? And why it matters and how it's different from what Stream did?

basdebakker commented 4 years ago

question for @basdebakker:

"from what I read I had expected to use LazyList as a drop-in replacement for Stream"

is there some documentation we ought to improve?

My impression came about from reading the migration guide which states

This allows creating a LazyList without evaluating the head element. immutable.Stream, which has a strict head and a lazy tail, is deprecated.

Reading it again I admit it doesn't explicitly say so, but to me it seemed to mean that LazyList was "more lazy" than Stream. So when it turned out that my code became less lazy (as discussed above) I was surprised.

It also states

Stream is replaced with LazyList

without any qualification. If this difference in behavior is considered correct, I think the migration guide could use (a reference to) some more detailed explanation.

sjrd commented 4 years ago

The short answer for the differences between Stream and LazyList is the following:

So the LazyList is definitely more lazy than Stream for one aspect: knowing whether it's empty and what's it's head is lazy; it's not on Stream. And although the tail is not lazy per se, the instance behind the tail might itself be completely unknown yet, so in terms of the possible states, this is not less lazy than Stream either.

Of course the code that builds these things might need to be adapted to reap all those possible states.

martijnhoekstra commented 4 years ago

@sjrd would you mind if I used that as a base for a PR against LazyList's scaladoc (and, for the future lawyers reading along, could you let this comment fall under the contributions you have a CLA for?)

sjrd commented 4 years ago

Sure. You can consider all my comments in this thread as Contributions.

martijnhoekstra commented 4 years ago

Without a lazy constructor, it's actually really hard to give good advice on how to create a lazy tail. head #:: continually(thunk).take(1) seems really silly.

SethTisue commented 4 years ago

I don't claim to have fully understood all this, but (like Som) my urge is still to change just #:: while leaving everything else alone.

w/r/t changing #::, @NthPortal wrote:

This is done for every single tail as well, which is a significant amount of unnecessary computation

I'm not sure this is so bad, given that only code that explicitly uses #:: would pay the price — I assume we aren't using it internally, or if that we are, those internal call sites could be adjusted as needed?

NthPortal commented 4 years ago

I assume we aren't using it internally

correct - we only use newLL and sCons internally

my urge is still to change just #:: while leaving everything else alone.

I'm starting to feel that too. the core use case for #:: I believe is actually recursive methods, not recursively defined values. and as noted by @basdebakker, most of the time that includes some logic other than the creation of the LazyList.

I'll try to write a quick benchmark of before and after the extra wrapping

NthPortal commented 4 years ago

benchmark results for BigInt fibonacci LazyList to List:

Benchmark                          (takeCount)  Mode  Cnt    Score   Error  Units
LazyListConsBenchmark.lazier_#::            10  avgt   20    1.238 ± 0.008  us/op
LazyListConsBenchmark.lazier_#::           100  avgt   20   12.379 ± 0.107  us/op
LazyListConsBenchmark.lazier_#::          1000  avgt   20  140.522 ± 0.920  us/op
LazyListConsBenchmark.regular_#::           10  avgt   20    1.137 ± 0.018  us/op
LazyListConsBenchmark.regular_#::          100  avgt   20   12.683 ± 0.183  us/op
LazyListConsBenchmark.regular_#::         1000  avgt   20  132.274 ± 5.249  us/op

it's not clear that the loss from extra laziness is more than noise in this benchmark, and if it is, it doesn't seem super significant to me. I'll throw up a PR


Code ```scala @BenchmarkMode(Array(Mode.AverageTime)) @Fork(2) @Threads(1) @Warmup(iterations = 10) @Measurement(iterations = 10) @OutputTimeUnit(TimeUnit.MICROSECONDS) @State(Scope.Benchmark) class LazyListConsBenchmark { @Param(Array("10", "100", "1000")) var takeCount: Int = _ var initialA: BigInt = _ var initialB: BigInt = _ private def regularFibsFrom(a: BigInt, b: BigInt): LazyList[BigInt] = a #:: regularFibsFrom(b, a + b) // ##:: is the lazier version of #:: in this code private def lazierFibsFrom(a: BigInt, b: BigInt): LazyList[BigInt] = a ##:: lazierFibsFrom(b, a + b) @Setup(Level.Trial) def initKeys(): Unit = { initialA = BigInt(1) initialB = BigInt(1) } @Benchmark def regular_#:: : Any = { regularFibsFrom(initialA, initialB).take(takeCount).toList } @Benchmark def lazier_#:: : Any = { lazierFibsFrom(initialA, initialB).take(takeCount).toList } } ```
SethTisue commented 4 years ago

@sjrd would you mind if I used that as a base for a PR against LazyList's scaladoc

@martijnhoekstra that'd be 💯

SethTisue commented 3 years ago

there should be a way to first create a LazyList with a function that delays the moment where you have to do the if (condition(args))

addressing this might now be possible over at https://github.com/scala/scala-library-next