Maybe don't build if there's too much to build?

michaelpj commented 4 years ago

Feature description

I'm not entirely sure how to phrase what I want here, so let me describe a situation I end up in sometimes.

I have some generated Nix in my repository (Haskell packages). Updating this causes a lot of rebuilds. Typically what I do is push my branch and let our CI system build it, and then later get it from the cache.

However, if I have lorri running, then lorri will immediately try to rebuild everything, which is sort-of fine, except that it uses a noticeable amount of system resources.

It would be nice if this somehow didn't happen. I don't have a great suggestion. Maybe lorri could have a rebuild threshold, and not build above that?

Obviously the user would need to know that this had happened, so they could do the build manually.

Target users

People who make local "mass-rebuild" changes.

curiousleo commented 4 years ago

Chatted with @grahamc. He has ideas on this and says he'll respond here as soon as he gets around to it.

grahamc commented 4 years ago

I think the goal of "don't do huge amounts of builds locally" makes sense and is a good idea!

My thinking, though, is that there really isn't a number at which lorri should cut off. Consider an extreme case: one of your dependencies takes 18 hours to compile, and another 100 take 0.1s each to build. No number would make sense as a limit.
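To make the point concrete, here is a minimal Rust sketch that splits the extreme case into two pending build sets and applies a count-based cutoff to each. The types, function names and the threshold of 50 are hypothetical, and real build times are not known in advance; the durations are simply the ones from the example above.

```rust
use std::time::Duration;

/// One derivation that still has to be built, with an assumed build time.
/// (Hypothetical type: real build times are not known ahead of time.)
pub struct PendingBuild {
    pub build_time: Duration,
}

/// The proposed rule: refuse to build when more than `threshold`
/// derivations are pending.
pub fn exceeds_count_threshold(pending: &[PendingBuild], threshold: usize) -> bool {
    pending.len() > threshold
}

/// Total cost if everything is built one after another.
pub fn total_build_time(pending: &[PendingBuild]) -> Duration {
    pending.iter().map(|b| b.build_time).sum()
}

/// A single derivation that takes 18 hours.
pub fn one_slow_dependency() -> Vec<PendingBuild> {
    vec![PendingBuild {
        build_time: Duration::from_secs(18 * 60 * 60),
    }]
}

/// A hundred derivations that take 0.1 s each.
pub fn many_fast_dependencies() -> Vec<PendingBuild> {
    (0..100)
        .map(|_| PendingBuild {
            build_time: Duration::from_millis(100),
        })
        .collect()
}
```

With a threshold of 50, the 18-hour build is allowed because it is a single derivation, while the hundred fast derivations, about ten seconds of work in total, are blocked. Any other threshold misjudges one of the two sets in the same way.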

A more complicated solution might be "if you have to build a derivation whose name matches 'xxxxx', just use the cached copy". This would be somewhat better, but it is quite complicated, and the name doesn't really tell you how expensive a build is.

I think a nicer solution which I would like to see adopted in to lorri would be a way to tell lorri to pause/resume builds for a project.

I imagine what happens now is:

$ ./update-my-deps.sh

$

  ~~~ a million hours pass while your laptop builds ~~~

direnv: Reloading environments
$

this would instead enable:

$ ./update-my-deps.sh; lorri pause

$

I think an important part of this feature is some UX like:

$ cd my-project
WARNING: this project's evaluation has been paused since 2020-02-05 12:15 EST.

direnv: loading environment ...

$

Once CI has built the new stuff, run lorri resume or lorri unpause or something.

Note: the pause information should probably not be persisted to disk, and instead be an in-memory hashtable of project -> pausestatus.
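A minimal sketch of such an in-memory table, in Rust. `PauseRegistry`, `PauseStatus` and their methods are hypothetical names, not part of lorri, and the pause time is stored as preformatted text so the sketch needs no date library.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Pause state of a single project (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum PauseStatus {
    Running,
    Paused { since: String },
}

/// In-memory map of project -> pause status. Nothing is written to disk,
/// so every project counts as running again after a daemon restart.
#[derive(Default)]
pub struct PauseRegistry {
    projects: HashMap<PathBuf, PauseStatus>,
}

impl PauseRegistry {
    /// Mark a project as paused, remembering when the pause started.
    pub fn pause(&mut self, project: &Path, since: &str) {
        self.projects.insert(
            project.to_path_buf(),
            PauseStatus::Paused {
                since: since.to_string(),
            },
        );
    }

    /// Resume a project; unknown projects are already running.
    pub fn resume(&mut self, project: &Path) {
        self.projects.remove(project);
    }

    /// Current status; projects that were never paused are running.
    pub fn status(&self, project: &Path) -> PauseStatus {
        self.projects
            .get(project)
            .cloned()
            .unwrap_or(PauseStatus::Running)
    }

    /// Warning to show when the user enters a paused project, if any.
    pub fn warning(&self, project: &Path) -> Option<String> {
        match self.status(project) {
            PauseStatus::Paused { since } => Some(format!(
                "WARNING: this project's evaluation has been paused since {}.",
                since
            )),
            PauseStatus::Running => None,
        }
    }
}
```

Keeping the table only in memory means a forgotten pause cannot outlive the daemon, at the cost of losing the pause whenever the daemon restarts.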

michaelpj commented 4 years ago

Pause/resume would help. At the moment I just stop and start the service, but that stops it for all projects and it can get restarted by the socket. So definitely an improvement.

Perhaps lorri could auto-pause if a rebuild went on for too long, for some suitable definition of "too long"? And export a LORRI_PAUSED=true variable so people could notice?
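As a rough illustration of this idea, the Rust sketch below separates the two pieces: a time-limit check and the variable that would be exported. Both function names are hypothetical, and choosing the limit is exactly the open question of what "too long" means.

```rust
use std::time::Duration;

/// Decide whether a build that has been running for `elapsed` should be
/// auto-paused. The limit is an assumption the user would have to choose.
pub fn should_auto_pause(elapsed: Duration, limit: Duration) -> bool {
    elapsed > limit
}

/// Extra environment variables to export for a project, so that shells
/// and prompts can notice a paused build.
pub fn pause_env(paused: bool) -> Vec<(String, String)> {
    if paused {
        vec![("LORRI_PAUSED".to_string(), "true".to_string())]
    } else {
        Vec::new()
    }
}
```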

Profpatsch commented 4 years ago

> Pause/resume would help. At the moment I just stop and start the service, but that stops it for all projects and it can get restarted by the socket. So definitely an improvement.

If we have to add another boolean flag (and thus more states to the internal state machine) to support this use-case, we are just working around an architectural misdesign. The question we should be asking here is: why do we have to stop all other builds as well, and can we change the lorri daemon to not be all-or-nothing?

schmittlauch commented 4 years ago

I noticed another case where mass rebuilds happen: as soon as I update my nix channels, the lorri daemon immediately triggers a simultaneous rebuild of all my environments that use the system's/user's nixpkgs instead of a pinned one. This makes my system unresponsive for a while.

nyarly commented 4 years ago

@Profpatsch Is your thought something like pushing build loops to a per-project level? In #96 I was just reading the question about "why can't lorri self-start a daemon?" I wonder if there could be a per-user self-start daemon whose only responsibility was maintaining a directory of per-project build sockets. Toggling build requests would become a matter of direnv edit to comment out use lorri. Perhaps a lorri stop command, to kill running builds?

peterhoeg commented 4 years ago

My workaround (which is admittedly heavy-handed) is to stick the following into the unit file:

[Unit]
ConditionPathExists=!%h/.cache/power_save

I use that file as a toggle for a few other services too: I just need to touch it to keep lorri and the others from running.

michaelpj commented 4 years ago

> The question we should be asking here is: why do we have to stop all other builds as well, and can we change the lorri daemon to not be all-or-nothing?

My problem was actually quite insidious and not to do with lorri hogging resources: I had an issue with the determinism of my build (it accidentally depended on the git directory), and it was doing too much work, so lorri rebuilt furiously, and I actually ran out of disk space.

I'm not sure how you improve the architecture to avoid that use case.

> This makes my system unresponsive for a while.

Sounds like you might want to set nice/ionice on the nix daemon (NixOS has options for this, I recommend it).

> I noticed another case when mass rebuilds happen: As soon as I update my nix channels, the lorri daemon immediately triggers the simultaneous rebuild of all my environments that use the system's/user's nixpkgs instead of a pinned one.

Perhaps another useful feature here would be to not start jobs for projects that haven't been queried in some amount of time? That way only things you're actively working on will get rebuilt. The cost of course is latency the first time you start working on a project, but I'd be okay with that.
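A minimal Rust sketch of that rule, assuming the daemon records a timestamp each time a project is queried (for example when direnv loads it). `ActivityLog` and its methods are hypothetical names, not existing lorri code.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// Remembers when each project was last queried (hypothetical type).
#[derive(Default)]
pub struct ActivityLog {
    last_queried: HashMap<PathBuf, Instant>,
}

impl ActivityLog {
    /// Record that `project` was queried at time `at`.
    pub fn record_query(&mut self, project: &Path, at: Instant) {
        self.last_queried.insert(project.to_path_buf(), at);
    }

    /// A project is rebuilt only if it was queried within `max_idle`
    /// before `now`. Projects that were never queried count as idle.
    pub fn should_build(&self, project: &Path, now: Instant, max_idle: Duration) -> bool {
        match self.last_queried.get(project) {
            Some(&at) => now.saturating_duration_since(at) <= max_idle,
            None => false,
        }
    }
}
```

An idle project would then be built on its next query, which is the first-use latency mentioned above.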

Profpatsch commented 4 years ago

> Perhaps another useful feature here would be to not start jobs for projects that haven't been queried in some amount of time?

There’s an open issue for adding a timeout to which projects are watched: https://github.com/target/lorri/issues/163

That would be a very useful feature to implement if somebody feels up to the task.

Profpatsch commented 4 years ago

> Is your thought something like pushing build loops to a per-project level?

I would indeed like to clean up the architecture a little bit and make fewer things depend on a singleton daemon running. It’s usually a bad design if a tool enforces only one instance.

I’d pull out the watcher to be an independent thread with some defined internal state:

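// Sketch: BidirectionalMap, NixFile and WatchedPath are placeholder types; the field name is illustrative.
struct Watcher {
  watched: BidirectionalMap<NixFile, Set<WatchedPath>>,
}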

with some channels to watch a new file, since the watcher is the only part that is really centralized because of the resource limit (to keep the set of watched paths small).

Then we can loosen what has to happen in lockstep and make builders more lightweight (and also have more than one build loop).
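One way to read this proposal is sketched below in Rust, using only the standard library. It is an illustration under assumptions, not lorri's implementation: the message type and function names are hypothetical, the bidirectional map is approximated with two hash maps, and no actual filesystem watching is done.

```rust
use std::collections::{HashMap, HashSet};
use std::path::{Path, PathBuf};
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages other parts of the program can send to the watcher thread.
pub enum WatchMsg {
    /// Start watching `path` on behalf of the project defined by `nix_file`.
    Watch { nix_file: PathBuf, path: PathBuf },
    /// Ask which Nix files depend on `path`; the answer is sent on `reply`.
    WhoWatches {
        path: PathBuf,
        reply: Sender<HashSet<PathBuf>>,
    },
}

/// Watcher state: a bidirectional map kept as two plain hash maps.
#[derive(Default)]
struct Watcher {
    paths_by_nix_file: HashMap<PathBuf, HashSet<PathBuf>>,
    nix_files_by_path: HashMap<PathBuf, HashSet<PathBuf>>,
}

impl Watcher {
    /// Record the relation in both directions.
    fn watch(&mut self, nix_file: PathBuf, path: PathBuf) {
        self.paths_by_nix_file
            .entry(nix_file.clone())
            .or_default()
            .insert(path.clone());
        self.nix_files_by_path
            .entry(path)
            .or_default()
            .insert(nix_file);
    }

    /// All Nix files that asked to watch `path` (empty if none did).
    fn who_watches(&self, path: &Path) -> HashSet<PathBuf> {
        self.nix_files_by_path
            .get(path)
            .cloned()
            .unwrap_or_default()
    }
}

/// Spawn the watcher thread; it runs until every `Sender` has been dropped.
pub fn spawn_watcher() -> (Sender<WatchMsg>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut watcher = Watcher::default();
        for msg in rx {
            match msg {
                WatchMsg::Watch { nix_file, path } => watcher.watch(nix_file, path),
                WatchMsg::WhoWatches { path, reply } => {
                    // Ignore the error if the asker has already gone away.
                    let _ = reply.send(watcher.who_watches(&path));
                }
            }
        }
    });
    (tx, handle)
}
```

Because the sender can be cloned, several independent build loops could share the one watcher thread, which keeps only the watched-path set centralized.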

target / lorri

Maybe don't build if there's too much to build? #319