typelevel / fs2

Compositional, streaming I/O library for Scala
https://fs2.io
Other
2.36k stars 599 forks source link

Memory leaks possibly related to Topic behaviour #2408

Closed prova closed 3 years ago

prova commented 3 years ago

While porting a large codebase to CE3 and friends, we can see some memory leaks occurring. By the way, thanks for fixing an earlier one reported by @svalaskevicius in fs2 3.0.3.

The present issue has possibly a common root in the Topic behaviour. @karimak123 has come up with this reproducing example that leaks quite heavily:

import cats.effect.IOApp
import cats.effect.{ExitCode, IO}
import fs2.{Stream, Pipe}
import scala.concurrent.duration._
object Hello extends IOApp {
  override def run(args: List[String]): IO[ExitCode] =
    Stream
      .constant(())
      .broadcastThrough(pipe)
      .compile
      .drain
      .as(ExitCode.Success)
  def pipe: Pipe[IO, Unit, Unit] = _.map(identity)
}

Further to that, the pattern that involves consuming data from an http4s endpoint with ServerSentEvent data, decoding it, and then publishing to a Topic and subscribing to that also results in a leak. The curious thing about this latter one is that when we substributed the actual stream consumption with a mock repeat stream, the leak does not seem to be manifest. However, the Topic seems to be the common theme, as substituting it with a CE3 Queue removes the leak as well.

xjurko commented 3 years ago

Had a go at running bisect:

16e2d80f2ae2ff032de114c4ae468599863ab1a6 is the first bad commit
commit 16e2d80f2ae2ff032de114c4ae468599863ab1a6
Date:   Tue Dec 29 21:40:26 2020 +0000

    Pull - Scope: rewrites

    We add to Scope the following methods:
    - A method to ask if this is the Root Scope.
    - A method to get the depth (distance to the root scope)
    - A method to search either a descendent or ancestor scope.

    Pull: rewrite `goCloseScope` method to reduce code nesting:
    - Split from `closeAndGo` the logic to get the Result
    - Use cases with guards for the choices.
    - Inline top-level parts of `closeAndGo` method.

:040000 040000 27e363730182d319f8e4dc8c4c609db7bf56ddef 4f127bd029d5be102e72e71c1a0b2625fa4e95db M      core
diesalbla commented 3 years ago

Hmmm It is a bit strange if that is the first bad commit. From taking a new look, i cannot see anything in the changes of that commit that would introduce a memory leak. For easy inspection, look at changes without whitespace changes.

@xjurko Thanks anyway for the help with bisecting. I will take a look when I have some time.

xjurko commented 3 years ago

HI @diesalbla had another go with bisect just to confirm and turns out the previous one wasn't the first bad commit. Looks like first bad commit was actually few commits earlier:

e25a6e1c59ec77b4c60e97731d1b152aec1c068b is the first bad commit
commit e25a6e1c59ec77b4c60e97731d1b152aec1c068b
Date:   Tue Dec 29 00:41:57 2020 +0000

    Pull - compilation changes...

    - WE remove the use of the `BuildR` subclass of Run, as well as the
      continuation-like `CallRun` type alias.

    - We remove the use of `F.attempt` after recursive calls to `go`,
      since errors are now also processed within the

    - For the ViewRunner, to avoid a stack overflow in one of the texts,
      we modify the method `out` to include a manual unrolling loop.

:040000 040000 86ff41554bab16a64dd84aa9ca8616b240cb5878 27e363730182d319f8e4dc8c4c609db7bf56ddef M      core

I checked out and ran manually again a few times to verify and rule out flakiness

e3fb8b64b - passes e25a6e1c5 - fails

Sorry for misidentifying this the first time

mpilquist commented 3 years ago

~@diesalbla Is the rewrite of this if/else really equivalent? https://github.com/typelevel/fs2/commit/16e2d80f2ae2ff032de114c4ae468599863ab1a6?w=1#diff-725e3e96f7e12145daebb18e95a1912acdcd09b96702d3d84c7507a9683de1d1L879~

~Looks to me like the case where it's root & extendedTopLevelScope is defined now returns None instead of Some. I haven't tried running the tests with that change.~

mpilquist commented 3 years ago

That's the same commit which was (partially) rolled back in e25a6e1c59ec77b4c60e97731d1b152aec1c068b

diesalbla commented 3 years ago

That's the same commit which was (partially) rolled back in e25a6e1

Yeah... after that commit there was a first partial rollback a few months ago, and another one earlier this week. There may be a last part.

I would be curious as to why removing that "closure" runner results in memory leaks... clearly something is being inadvertently retained that should not be.

xjurko commented 3 years ago

Hiya noticed there has been some progress on this thanks. Just to confirm, unfortunately, the leak seems to still be present and the test case mentioned above fails on the main branch. Ran this multiple times on separate machines to rule out any flakiness. Ran with java11.

mpilquist commented 3 years ago

Both of the failing tests have long chains of BindBind objects on the heap:

image
mpilquist commented 3 years ago

I bisected the "hung merge" failure -- started failing here: ef57231b393a2f49e812594183fab47a81b29e3e

Note this doesn't necessarily point to the problem being Channel -- could just be that Channel has a different pattern of map/flatMap calls that is exposing an interpreter leak.

➜  git bisect log
git bisect start
# good: [e25a6e1c59ec77b4c60e97731d1b152aec1c068b] Pull - compilation changes...
git bisect good e25a6e1c59ec77b4c60e97731d1b152aec1c068b
# good: [e25a6e1c59ec77b4c60e97731d1b152aec1c068b] Pull - compilation changes...
git bisect good e25a6e1c59ec77b4c60e97731d1b152aec1c068b
# bad: [fb1b6f93e8e88b087c69d750e1efc35bebe5a399] Merge pull request #2410 from diesalbla/fix_topic_leak
git bisect bad fb1b6f93e8e88b087c69d750e1efc35bebe5a399
# good: [3108a9ea5a58c1d10d2d85d0b4cd1119e7153a15] broadcastThrough is now safe when a pipe interrupts
git bisect good 3108a9ea5a58c1d10d2d85d0b4cd1119e7153a15
# good: [3108a9ea5a58c1d10d2d85d0b4cd1119e7153a15] broadcastThrough is now safe when a pipe interrupts
git bisect good 3108a9ea5a58c1d10d2d85d0b4cd1119e7153a15
# bad: [84b03c3fe5ac8cd0166a6da515d00d0a065adc93] Merge pull request #2327 from DGolubets/faster-group-within
git bisect bad 84b03c3fe5ac8cd0166a6da515d00d0a065adc93
# bad: [84b03c3fe5ac8cd0166a6da515d00d0a065adc93] Merge pull request #2327 from DGolubets/faster-group-within
git bisect bad 84b03c3fe5ac8cd0166a6da515d00d0a065adc93
# bad: [41abf2438b362647e1496a5c2ed040b82b6938c3] Broadcast problem: draining the channel doesn't unblock producer
git bisect bad 41abf2438b362647e1496a5c2ed040b82b6938c3
# skip: [1f877362decfae6c1d60bd4bebe5373ac95d635f] Channel - debounce
git bisect skip 1f877362decfae6c1d60bd4bebe5373ac95d635f
# good: [2cafb45ad3edb457bf521c139ad48840a45465e4] Merge pull request #2316 from mpilquist/topic/remove-queue
git bisect good 2cafb45ad3edb457bf521c139ad48840a45465e4
# bad: [749879c5e0fc755275a89afba79ebf23b1c86219] Use a Channel.bounded(1) instead of Channel.synchronous in parJoin
git bisect bad 749879c5e0fc755275a89afba79ebf23b1c86219
# good: [a97a1db6d69f67f4a3198355744bfa98682dfb92] Formatting
git bisect good a97a1db6d69f67f4a3198355744bfa98682dfb92
# bad: [b470d74d49fc4ff8d9071162c088449e215541c0] add TODO for observeAsync on channel
git bisect bad b470d74d49fc4ff8d9071162c088449e215541c0
# bad: [ee3889c713b2fbc98b779630b2664dbb01a17f3b] Channel - parEvalMap
git bisect bad ee3889c713b2fbc98b779630b2664dbb01a17f3b
# good: [b8a1a82337995e3219ee60a4fb2fd4b84a5f093b] Add synchronous and unbounded channels
git bisect good b8a1a82337995e3219ee60a4fb2fd4b84a5f093b
# bad: [ef57231b393a2f49e812594183fab47a81b29e3e] Channel - merge
git bisect bad ef57231b393a2f49e812594183fab47a81b29e3e
# only skipped commits left to test
# possible first bad commit: [ef57231b393a2f49e812594183fab47a81b29e3e] Channel - merge
# possible first bad commit: [1f877362decfae6c1d60bd4bebe5373ac95d635f] Channel - debounce
mpilquist commented 3 years ago

Fixed by #2412

mpilquist commented 3 years ago

Released in https://github.com/typelevel/fs2/releases/tag/v3.0.4