oyvindberg / bleep

A bleeping fast scala build tool!
MIT License

Deleting `bloop-rifle` from the coursier cache breaks compilation with bleep #355

Open SlowBrainDude opened 1 year ago

SlowBrainDude commented 1 year ago

Steps to reproduce:

This "fixes itself" if you delete the .bleep folder in the project directory. Bleep will then re-download an appropriate bloop-rifle and use it on the next compile.

Still, this is a bug, as deleting anything from $XDG_CACHE_HOME should not break any app.

If the data is meant to be persistent, it should likely go to $XDG_DATA_HOME. (Actually, I'm not sure the coursier cache is really a cache according to the spec, as it needs internet access to regenerate; but that would be an issue for the coursier project anyway.)

SlowBrainDude commented 1 year ago

I'm in the process of investigating a fix myself. But I'm very new to this codebase, so no quick result is to be expected.

oyvindberg commented 1 year ago

I'm not sure what to think of this, honestly.

Feel free to dig into this if you want, but the solution would probably be to check all jars from coursier at every bleep invocation. And that, that is too slow :D

SlowBrainDude commented 1 year ago

What do other projects do in such a situation?

I've deleted the coursier folder more than once while debugging things, and until now I never had an issue like this one (where you get weird compilation errors that I didn't even understand at first).

Also: The missing jars get re-downloaded when you delete the .bleep folder. So there are some checks somewhere I guess.

If this is a general problem, it would of course be something for upstream (coursier). A cache is a cache. Only non-essential files belong there as per the spec. If things break hard when the cache is deleted, they don't belong in the cache dir in the first place. There is the data dir for things that need to be kept persistent.

Think, for example, of backup restores. You wouldn't back up the cache dir… But you would expect your stuff to still work after a restore (maybe with some re-downloading needed, but it should not break hard and start throwing new funny errors).

oyvindberg commented 1 year ago

I'm sympathetic to your argument, but we're touching on a crucial performance optimization here.

Context

Say you have a build with some projects. What bleep essentially does to it is to use coursier and some other data transformations to rewrite that build to bloop json files with all the jars already resolved. Then the running bloop server will reload changed bloop json files and update its internal build model.

This is too slow to do on each boot, since my performance target is less than 10 ms when loading an unchanged build.

So what happens instead is that the bleep.yaml file is read at boot, and digested. If the digest matches the last digest, then bleep doesn't do any further work. This logic can be found here: https://github.com/oyvindberg/bleep/blob/master/bleep-core/src/scala/bleep/GenBloopFiles.scala#L108

So that digest file lives within .bleep (pre.buildPaths.digestFile in the code).
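A minimal sketch of the digest short-circuit described above. It is not bleep's actual code (that lives in the linked GenBloopFiles.scala); the object name `DigestCheck`, its method names, and the choice of SHA-256 are assumptions made for illustration. Note that the digest covers only the build file, which is why a jar disappearing from the coursier cache goes unnoticed.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import java.security.MessageDigest

// Hypothetical helper, not part of bleep.
object DigestCheck {
  /** Hex-encoded SHA-256 of the given bytes (the hash algorithm is an assumption). */
  def sha256Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256").digest(bytes).map(b => "%02x".format(b)).mkString

  /** True when the build file changed since the last run, or when no digest was stored yet. */
  def needsRegeneration(buildFile: Path, digestFile: Path): Boolean = {
    val current = sha256Hex(Files.readAllBytes(buildFile))
    val previous =
      if (Files.exists(digestFile)) Some(new String(Files.readAllBytes(digestFile), UTF_8).trim)
      else None
    !previous.contains(current)
  }

  /** Stores the current digest so that the next boot can skip the expensive work. */
  def storeDigest(buildFile: Path, digestFile: Path): Unit = {
    Option(digestFile.getParent).foreach(dir => Files.createDirectories(dir))
    Files.write(digestFile, sha256Hex(Files.readAllBytes(buildFile)).getBytes(UTF_8))
  }
}
```

With this shape, the fast path costs one small file read and one hash, and everything else (resolution, writing bloop json files) only runs when `needsRegeneration` returns true.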

What to do?

The coursier directory is widely assumed to be append-only, so it's fine that things break if you break that assumption.

What I consider unfortunate here is that it's impossible to guess that deleting the .bleep folder will fix it. So would a bunch of other actions by the way, like changing the bleep.yaml file by changing branches or editing it, recloning the repo, or compiling another bleep or scala-cli build which happens to trigger the download of the missing bloop-rifle.

I think maybe that's where we should focus some effort. Maybe run some detection code after observing a given error, maybe a `bleep doctor` or something like that.
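One way such detection code could look, as a rough sketch only: after a failure, check whether the jars the build refers to still exist, and if not, drop the stored digest so that the next run redoes resolution. `CacheDoctor` and its methods are hypothetical names, and the sketch assumes the caller already knows the list of resolved jar paths and the digest file location.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper, not part of bleep.
object CacheDoctor {
  /** Jars the build refers to that are no longer present on disk. */
  def missingJars(resolvedJars: Seq[Path]): Seq[Path] =
    resolvedJars.filterNot(p => Files.isRegularFile(p))

  /** If any jar is missing, delete the stored digest so the next run regenerates everything.
    * Returns the missing jars so the caller can report them to the user. */
  def repair(resolvedJars: Seq[Path], digestFile: Path): Seq[Path] = {
    val missing = missingJars(resolvedJars)
    if (missing.nonEmpty) Files.deleteIfExists(digestFile)
    missing
  }
}
```

Because this only runs after an error has been observed, it adds no IO to the fast path of an unchanged, working build.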

SlowBrainDude commented 1 year ago

> The coursier directory is widely assumed to be append-only, so it's fine that things break if you break that assumption.

That's not "fine". That's a bug. The spec says:

"There is a single base directory relative to which user-specific non-essential (cached) data should be written. This directory is defined by the environment variable $XDG_CACHE_HOME."

https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html

The spec is like that for a reason, so you can delete the cache at any time. Nothing should break.
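For reference, the lookup rule from the spec can be sketched as follows: use the variable if it is set to a non-empty absolute path, otherwise fall back to the default under the home directory ($HOME/.cache for the cache dir, $HOME/.local/share for the data dir). `XdgDirs` is a hypothetical name, and the sketch says nothing about how coursier itself picks its directory.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper illustrating the XDG Base Directory lookup rule.
object XdgDirs {
  /** Uses the variable when it holds a non-empty absolute path, else the spec default under home. */
  def resolve(env: Map[String, String], variable: String, home: Path, default: String): Path =
    env.get(variable)
      .filter(_.nonEmpty)
      .map(s => Paths.get(s))
      .filter(_.isAbsolute) // the spec says relative paths are invalid and should be ignored
      .getOrElse(home.resolve(default))

  def cacheHome(env: Map[String, String], home: Path): Path =
    resolve(env, "XDG_CACHE_HOME", home, ".cache")

  def dataHome(env: Map[String, String], home: Path): Path =
    resolve(env, "XDG_DATA_HOME", home, ".local/share")
}
```

A call such as `XdgDirs.cacheHome(sys.env, Paths.get(sys.props("user.home")))` would give the cache base directory for the current user.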

I'm not saying this is necessarily a bleep bug. Maybe coursier is wrong here…

> Maybe run some detection code after observing a given error, maybe a `bleep doctor` or something like that.

This was also my idea. If the build fails for missing stuff in the cache the tool should try to fix that on its own.

If this costs a few milliseconds, that's irrelevant. Bleep is anyway so bleeping fast that coursier starts to download nonexistent trash already while you're typing in the build config yaml… One will likely need to add some delays to MetaLS so it doesn't instantly pick up freshly imported builds, as bleep imports them just too fast…

Of course, if your only priority is some measure of performance, at all costs, even in case this affects correctness, my interest in this tool would end instantly. I don't need tools that aren't robust; especially in the corner cases.

I've spent some time debugging the issue. At first I thought it was still fallout from the bloop-rifle update. It took some time to realize that bleep builds just break when you delete stuff from the cache, and that this has nothing to do with bloop-rifle as such.

This doesn't mean that I don't care about speed. This was the thing that drove me here. Bleep is amazing because it's so fast!

But this would not change even if it was 100% slower… We're talking about a few milliseconds! Single digit milliseconds… You could not even feel any difference. The lag of your keyboard is much bigger!

For reference:

https://danluu.com/input-lag/

oyvindberg commented 1 year ago

Ok, so I'll start off by repeating that I agree to work towards some kind of solution here.


That said, I believe you're holding bleep to a very high standard in this case. I don't particularly believe that common usage of the coursier folder corresponds to that spec.

bloop-rifle is a key library to communicate with bloop, which bleep uses for pretty much everything. Bleep does manage the installation of the JAR itself, since the library is compiled into the native build as long as you can stay in that world (what I mean by this is that bleep core can be, and is, run both from native and from the JVM).

If you delete crucial libraries from any other application - can you name a single one which responds by redownloading the library and then keeps working?

Given the context you've given above, it's tempting to ask coursier to download core libraries like this into more stable storage. Maybe that's even what we'll end up with.

> But this would not change even if it was 100% slower… We're talking about a few milliseconds! Single digit milliseconds… You could not even feel any difference. The lag of your keyboard is much bigger!

It's also a long discussion, but it's far from this simple. For instance: Loading the build (and thus potentially coursier resolving and reloading of bloop files) can also be triggered from a cold JVM. This is the case when you add sourcegen to your build or when running scripts. Doing potentially thousands of IO operations multiple times is disastrous, especially if you're already pushing your machine a bit. bleep has to be fast in those contexts too.

SlowBrainDude commented 1 year ago

> That said, I believe you're holding bleep to a very high standard in this case.

I'm sorry if it looks like that. Bleep is cool, but it can't do everything on its own (currently?), so whatever its dependencies do affects us here. That's why I'm saying I'm not even sure this is a bleep issue…

I've just encountered this issue in bleep, so this is my first stop.

> I don't particularly believe that common usage of the coursier folder corresponds to that spec.

This may be the core of the issue.

I've tried to make clear that I'm not sure who to blame here. It may very well be that coursier is the offender, not respecting the spec!

> If you delete crucial libraries from any other application - can you name a single one which responds by redownloading the library and then keeps working?

Normal applications don't put crucial libraries into some cache folders that may get deleted at any time… :wink:

But let's make some fair comparisons. I'm going to investigate how sbt, mill, scala-cli, and rustup behave here. That's the competition (or reference, or however you want to call it).

If things break the same for the other Scala tools, I guess I'll close this ticket and will go complain to coursier.

The other thing I wanted to investigate was how the newest iteration of JavaScript package managers does things (especially yarn "blueberry"). They also have some kind of global cache, and do a lot of smart things on top to keep things fast, even if your cache has tens of thousands of entries and projects use subsets of thousands of them in their local build.

But this could end up in reworking how coursier works. Maybe even a new tool would be needed. So I don't think anything will happen quickly.

> It's also a long discussion, but it's far from this simple. For instance: Loading the build (and thus potentially coursier resolving and reloading of bloop files) can also be triggered from a cold JVM. This is the case when you add sourcegen to your build or when running scripts. Doing potentially thousands of IO operations multiple times is disastrous, especially if you're already pushing your machine a bit. bleep has to be fast in those contexts too.

First of all: Bleep needs to stay fast, as this is one of the important sales pitches. I fully agree.

I hate how slow modern software is. Computers are ultra fast, but our software is laughably slow, and this gets worse every day, because people just don't care.

But there are for sure solutions that keep things fast, but also work correctly. I'm sure there are…

Since this issue isn't a priority for you, feel free to just ignore it for the time being. :smile:

As I said, I will experiment a little with the competing Scala tools, and with tools from other languages from which one could take inspiration. Let's see what I find out. (In case there is "nothing wrong" with bleep, and it's coursier not respecting reasonable specs, I'll just close this issue.)

Once more, thanks for your patience!