sbt / zinc

Scala incremental compiler library, used by sbt and other build tools
Apache License 2.0

New analysis format #1326

Closed: szeiger closed this 6 months ago

szeiger commented 8 months ago

A new implementation of Zinc's incremental state serialization.

Benchmark data based on scala-library + reflect + compiler:

| Format | Write time | Read time | File size |
|---|---|---|---|
| sbt Text | 1002 ms | 791 ms | ~7102 kB |
| sbt Binary | 654 ms | 277 ms | ~6182 kB |
| ConsistentBinary | 157 ms | 100 ms | 3097 kB |
| ConsistentBinary (unsorted) | 79 ms | | ~3796 kB |

This PR makes the new format available via the new ConsistentFileAnalysisStore. It does not replace the existing formats (but it should; it's a better choice for almost every use case).
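To make the opt-in concrete, here is a sketch of how a build tool might construct the new store. The factory-method shape mirrors the existing `FileAnalysisStore` API; the exact `ConsistentFileAnalysisStore` signatures are an assumption and may differ from what this PR merges.

```scala
// Hypothetical usage sketch: opting in to the new consistent binary format.
// The factory-method shape follows the existing FileAnalysisStore API;
// the exact ConsistentFileAnalysisStore signature is an assumption.
import java.io.File
import sbt.internal.inc.consistent.ConsistentFileAnalysisStore
import xsbti.compile.AnalysisStore
import xsbti.compile.analysis.ReadWriteMappers

val file = new File("target/inc_compile.zip")
val store: AnalysisStore =
  ConsistentFileAnalysisStore.binary(file, ReadWriteMappers.getEmptyMappers)

// Reading and writing then work like any other AnalysisStore:
// store.get() returns the stored analysis, if present
// store.set(contents) writes it in the new consistent binary format
```

Because the store implements the standard `AnalysisStore` interface, callers can switch formats without touching the rest of their incremental-compilation setup.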

We have been using iterations of this format internally over the last few months for the Bazel builds (with our own Zinc-based tooling) of our two main monorepos, totaling about 27,000 Scala (and Java/mixed) targets that range in size from a few LOC to almost 1 million LOC.

szeiger commented 8 months ago

The 3 large binary files clearly don't belong in this repo, but I don't know what the right place for them is. They are used by the integration tests and the benchmarks. In our internal repo we keep such files in S3 instead of checking them into git.

Friendseeker commented 8 months ago

> The 3 large binary files clearly don't belong into this repo but I don't know what's the right place for them. They are used by the integration tests and the benchmarks. In our internal repo we keep such files in S3 instead of checking them into git.

Maybe we can store them via Git LFS?

I have never used Git LFS myself, so ideally someone with prior experience can judge whether this is actually a good idea.

szeiger commented 8 months ago

I don't understand why the benchmarks are failing in CI. I added the new class to the existing benchmark project so it should be found. I was not able to reproduce the problem locally. The benchmark commands from CI run just fine on my machine.

szeiger commented 8 months ago

Oh, it's the "benchmark against develop branch" step that fails. This looks like the usual undercompilation bug in sbt-jmh: it needs an explicit clean when you remove a benchmark, and switching to the develop branch after building this PR triggers it. The problem should go away after merging, but explicitly cleaning the jmh project would be a better fix.

He-Pin commented 8 months ago

How about https://github.com/HebiRobotics/QuickBuffers or https://github.com/google/flatbuffers for the format? They seem fast too.

lihaoyi-databricks commented 6 months ago

bump @eed3si9n, are there any blockers to merging this? We've been using it internally with good success, and upstreaming it would provide a lot of value to the broader community, making it available to sbt, Mill, and other build tools.

eed3si9n commented 6 months ago

> bump @eed3si9n, are there any blockers from merging this? We've been using this internally with good success, and expect it would provide a lot of value to the broader community to have this upstreamed for SBT and Mill and other build tools to use

Overall I am on board with a more stable / hermetic analysis serializer. My main concern was where the speedup was coming from (`ExecutionContext.global`) and whether it would end up competing for CPU time in non-Bazel usage. Now that the `ExecutionContext` is a parameter, hopefully we can tune how many threads we assign to this versus other tasks.
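With the `ExecutionContext` exposed as a parameter, a caller can bound the serializer's parallelism explicitly instead of sharing `ExecutionContext.global`. This is a generic, self-contained sketch of that pattern, not zinc-specific code; the pool size and names are illustrative.

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// A bounded pool keeps serialization from competing with the rest of the
// build for every core. The size (2) is an arbitrary example value.
val serializerPool = Executors.newFixedThreadPool(2)
val serializerEc: ExecutionContext =
  ExecutionContext.fromExecutor(serializerPool)

// Pass serializerEc wherever the analysis store accepts an ExecutionContext,
// in place of the default ExecutionContext.global.

// Shut the pool down when the build tool exits:
// serializerPool.shutdown()
```

Isolating the pool this way also makes the serializer's CPU usage observable and tunable independently of the compiler's own task scheduling.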

lrytz commented 5 months ago

@eed3si9n would you have time to do a 1.10.0-M4 release that includes this PR?

eed3si9n commented 5 months ago

Yea. I'll try to get something out this weekend.

eed3si9n commented 5 months ago

https://github.com/sbt/zinc/releases/tag/v1.10.0-RC1 is on its way to Maven Central.