spotify / scio

A Scala API for Apache Beam and Google Cloud Dataflow.
https://spotify.github.io/scio
Apache License 2.0

RunnerContext.filesToStage filtering is too aggressive #3782

Closed: gafiatulin closed this issue 2 years ago

gafiatulin commented 3 years ago

Coursier's default cache location on Linux is ~/.cache/coursier/v1, which is matched by matchesEnvDir. As a result, all dependency jars are excluded from staging.

Note: the cache location can be overridden (for example, with the COURSIER_CACHE environment variable).
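To illustrate why the cache is caught, here is a minimal sketch of a home-directory filter. It is not scio's actual matchesEnvDir implementation: the name looksLikeEnvDir, its rule (any path under a dot-directory directly inside the home directory) and the sample paths are assumptions made for illustration.

```scala
import java.nio.file.Paths

object EnvDirFilterSketch {
  // Hypothetical stand-in for matchesEnvDir: treats any path that lives under a
  // dot-directory directly inside the home directory (~/.cache, ~/.sbt, ...) as
  // an environment directory whose contents should not be staged.
  def looksLikeEnvDir(path: String, home: String): Boolean = {
    val p = Paths.get(path).normalize()
    val h = Paths.get(home).normalize()
    p.startsWith(h) && {
      val rel = h.relativize(p)
      rel.getNameCount > 1 && rel.getName(0).toString.startsWith(".")
    }
  }

  def main(args: Array[String]): Unit = {
    val home = "/home/user" // assumed home directory
    val classpath = Seq(
      // Coursier cache entry (illustrative artifact name)
      "/home/user/.cache/coursier/v1/https/repo1.maven.org/maven2/org/example/lib/1.0/lib-1.0.jar",
      // scala-library.jar from the sbt boot directory
      "/home/user/.sbt/boot/scala-2.12.13/lib/scala-library.jar",
      // compiled project classes
      "/home/user/project/target/scala-2.12/classes"
    )
    classpath.foreach { entry =>
      val verdict = if (looksLikeEnvDir(entry, home)) "excluded" else "staged"
      println(s"$verdict: $entry")
    }
  }
}
```

Under this assumed rule, both the Coursier cache entry and the sbt boot entry are excluded, and only the project's own classes would be staged.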

There's another related problem, which surfaces in the test command: when sbt's own Scala version matches the project's, the scala-library.jar from the sbt boot directory is used on the class path. Since the default location of the boot directory is $HOME/.sbt/boot/, matchesEnvDir matches it as well, and scala-library.jar is not staged.

This happens only when sbt's Scala version matches the project's, which is currently the case for sbt 1.5.0 and Scala 2.12.13. When the versions don't align, the scala-library jar from the Coursier cache is used instead.

This doesn't happen in run/runMain because of bgCopyClasspath, which copies the class path for background runs.

Moving the sbt boot directory out of ~/.sbt with sbt --no-share, sbt --no-global or sbt --sbt-boot dir works, but this also moves all the sbt jars (zinc-classpath, compiler-bridge, etc.), defeating the whole purpose of filtering.

Minimal reproducible example: https://github.com/gafiatulin/scio-sbt-classpath-issue

Maybe this (the scala-library reuse in test) should be raised in sbt/sbt instead?

For now, as a workaround, I have to override both the Coursier cache location and the boot directory, and stage the sbt jars.

kellen commented 2 years ago

Fixed in #4016, #4262

steveniemitz commented 2 years ago

Bazel puts its files in ~/.cache/bazel as well, which is still filtered out.
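One conceivable way to cover such cases is to exempt known dependency caches from the home-directory filter. The sketch below is only an illustration and is not necessarily what #4016 or #4262 implemented: shouldStage, artifactCaches and the dot-directory rule are all hypothetical names and assumptions.

```scala
import java.nio.file.Paths

object StagingFilterSketch {
  // Hypothetical allowlist of dependency caches, relative to the home directory.
  val artifactCaches: Seq[String] =
    Seq(".cache/coursier", ".cache/bazel", ".ivy2", ".m2")

  // Assumed rule: skip anything under a dot-directory in the home directory,
  // unless it sits inside one of the allowlisted artifact caches.
  def shouldStage(path: String, home: String): Boolean = {
    val p = Paths.get(path).normalize()
    val h = Paths.get(home).normalize()
    if (!p.startsWith(h)) true
    else {
      val rel = h.relativize(p)
      val hidden = rel.getNameCount > 1 && rel.getName(0).toString.startsWith(".")
      val cached = artifactCaches.exists(c => rel.startsWith(Paths.get(c)))
      !hidden || cached
    }
  }

  def main(args: Array[String]): Unit = {
    val home = "/home/user" // assumed home directory
    Seq(
      "/home/user/.cache/bazel/some/output/lib.jar", // illustrative Bazel path
      "/home/user/.cache/coursier/v1/some/lib.jar",  // illustrative Coursier path
      "/home/user/.sbt/boot/scala-2.12.13/lib/scala-library.jar"
    ).foreach(e => println(s"${shouldStage(e, home)}: $e"))
  }
}
```

Note that an allowlist like this would still drop scala-library.jar when it is loaded from the sbt boot directory, so that case would need separate handling.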