ndmitchell / rattle

Forward build system with speculation and caching

Speculate after a hazard #4

Open spall opened 5 years ago

spall commented 5 years ago

I think there are some nuances I will be leaving out, but in general I think it should be possible to speculate after a hazard has been encountered. We should be able to save the tracing data that allowed Rattle to detect the hazard, thus allowing Rattle to schedule cmds in such a way that the same hazard doesn't occur during the next attempt.

Implementation-wise, for this to work I think hazards will need to be categorized as unrecoverable (hazards that violate the consistency property) and recoverable (hazards caused by running commands in the wrong order or in parallel). Unrecoverable hazards should be identified before Rattle attempts to re-run a build; currently they are only identified because the re-run is non-speculative. This should be easy to detect if there is a strict ordering on commands (for detecting unrecoverable read/write hazards).
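To make the split concrete, here is a minimal sketch of how such a classification might look. It is not Rattle's code: the `Hazard` record, the `Kind` enum and `is_recoverable` are hypothetical names, and the rule itself is a simplifying assumption (every command involved is required by the build script, and a write/write hazard is always treated as unrecoverable).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Kind(Enum):
    WRITE_WRITE = "write/write"  # two commands wrote the same file
    READ_WRITE = "read/write"    # a file was read before another command wrote it


@dataclass(frozen=True)
class Hazard:
    """Hypothetical record of one detected hazard."""
    kind: Kind
    path: str
    first: str   # command that touched the file first (the reader for READ_WRITE)
    second: str  # command that touched it second (the writer for READ_WRITE)


def is_recoverable(hazard: Hazard, script_order: List[str]) -> bool:
    """Assumed rule: a hazard is recoverable only if the strict script order
    would have avoided it, i.e. it was caused purely by reordering."""
    if hazard.kind is Kind.WRITE_WRITE:
        # Assumption: two required commands writing one file conflict in any order.
        return False
    reader_pos = script_order.index(hazard.first)
    writer_pos = script_order.index(hazard.second)
    # If the script lists the writer before the reader, a sequential run
    # writes first and reads second, so only the reordering was at fault.
    return writer_pos < reader_pos
```

Under this rule, a read/write hazard whose reader also comes first in the script would be reported to the user straight away, with no re-run.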

I think there are some additional nuances with regard to pathological commands that never have consistent inputs/outputs (not sure of an example), thus potentially causing Rattle to never finish a build. It probably makes sense to have a set number of speculative attempts before falling back to sequential. There are probably other possibilities too, such as putting barriers around whichever command caused a hazard, creating a sequential bottleneck.
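The "set number of attempts, then sequential" idea could be sketched roughly as follows. Everything here is an assumption for illustration: `run_speculative` and `run_sequential` are hypothetical callbacks, and a hazard is reduced to a single ordering constraint between two commands.

```python
from typing import Callable, FrozenSet, Optional, Set, Tuple

# Hypothetical: an ordering constraint (run_first, run_second) derived from a hazard.
Constraint = Tuple[str, str]


def build_with_retries(
    run_speculative: Callable[[FrozenSet[Constraint]], Optional[Constraint]],
    run_sequential: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    """Try speculating up to max_attempts times, then fall back to sequential.

    run_speculative receives every constraint learned so far and returns the
    constraint implied by a new recoverable hazard, or None on success.
    """
    learned: Set[Constraint] = set()
    for _ in range(max_attempts):
        hazard = run_speculative(frozenset(learned))
        if hazard is None:
            return "speculative"
        learned.add(hazard)  # remember it so the next attempt avoids it
    run_sequential()  # the finite bound guarantees we get here eventually
    return "sequential"
```

Any finite `max_attempts` bounds the wasted work; the value 3 is arbitrary.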

These are some of the initial thoughts @samth and I had.

ndmitchell commented 5 years ago

Cool, yep, I think you're right. I am fairly certain you can construct a pathological example where no amount of speculation ever lets the build complete. My approach of giving up after one attempt is a way of avoiding that, but anything finite should work. I'd really rather have something more principled though - what does the pathological case look like? How can you avoid it in a principled way?

With regards to a particular Hazard, I think merely the fact we encountered the hazard this time around will have provided enough information to avoid that hazard next time around. I originally thought about using a continuation monad in Run, so you could restart "skipping" the prefix of the build - but later convinced myself that restarting from the beginning isn't a problem, since that's meant to be fast anyway.

spall commented 5 years ago

I think in the worst case a pathological command has a hazard with a new cmd each time the build is re-run. But because speculation decisions are made using all previous tracing data, the scheduler shouldn't repeat mistakes and should eventually be forced to execute the build correctly. This is my instinct at least. I think the goal will just be to make "eventually" come as soon as possible.
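A toy model of that instinct, under strong assumptions: the set of commands is fixed, each failed run produces exactly one hazard between the pathological command and one other command, and an ordering learned from a hazard is never violated again. The function name and the model are illustrative only and say nothing about commands whose inputs and outputs change from run to run.

```python
from typing import List, Set


def attempts_until_success(commands: List[str], pathological: str) -> int:
    """Count the runs needed in the assumed worst case, where each run hits a
    hazard against a command not yet ordered relative to `pathological`."""
    ordered_against: Set[str] = set()
    attempts = 0
    while True:
        attempts += 1
        remaining = [
            c for c in commands
            if c != pathological and c not in ordered_against
        ]
        if not remaining:
            # Nothing left to conflict with, so this run completes.
            return attempts
        # The hazard against remaining[0] is recorded and never repeated.
        ordered_against.add(remaining[0])
```

In this model the number of runs is one more than the number of other commands, so "eventually" is bounded; whether real pathological commands stay inside these assumptions is the open question.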

I'm not sure what a principled approach would be to detecting pathological commands. I will think about that.

> With regards to a particular Hazard, I think merely the fact we encountered the hazard this time around will have provided enough information to avoid that hazard next time around.

I agree, but I think for non-recoverable hazards there is no point in re-running the build, so we should identify those ASAP and tell the user.

ndmitchell commented 5 years ago

> but I think for non-recoverable hazards there is no point in re-running the build again

Agreed. I originally had the idea that speculative hazards and real hazards were totally separate, and that for instance a read hazard on a speculative cmd could just be ignored. However, I quickly realised I was running faster than my code, so simplified. I'm sure there's a robust theory here which would influence exactly what needs doing.