ndmitchell / rattle

Forward build system with speculation and caching
Other
102 stars 5 forks source link

Hazards and speculation #13

Open spall opened 5 years ago

spall commented 5 years ago
** r-cmd read r-cmd write s-cmd read s-cmd write 2nd
r-cmd read OK See (3) OK Restart
r-cmd write See (3) NR OK Restart
s-cmd read OK Recoverable OK See (1)
s-cmd write Restart Restart See (2) Restart
1st

Notes: r-cmd means the command is required. s-cmd means the command was speculated; so we haven't encountered it in required list yet (if it is a member). Cmds in first column occured earlier in time, and cmds in first row occured 2nd.

Restart: delete all speculation info and start over.

(1): If the speculated write was listed BEFORE the speculated read then it is Recoverable; but if the speculated write was listed AFTER the speculated read then it is Restartable. (2): If the speculated write was listed BEFORE the speculated read then it is OK; but if the speculated write was listed AFTER the speculated read then it is Restartable. (3): If the required read was listed BEFORE the required write then it is NonRecoverable; otherwise it is recoverable.

I have highlighted the cased I don't think rattle currently deals with. I think we can fix this by specifying whether WriteWrite hazards and ReadWrite hazards are non-recoverable or restartable;

EDIT: Look at comment below for more correct version.

ndmitchell commented 4 years ago

This is really nice work! Sorry it's taken so long to get to it. Turning it into a functional program taking two events in the order they occur, and assuming the events are all in a sequence with no overlapping ranges (unlike reality), we get the logic:

f AR AR = OK
f AR AW = Fatal
f AR SR = OK
f AR SW = Restart
f AW AR = OK
f AW AW = Fatal
f AW SR = OK
f AW SW = Restart
f SR AR = OK
f SR AW = Restart
f SR SR = OK
f SR SW = Restart
f SW AR = OK
f SW AW = Restart
f SW SR = OK
f SW SW = Restart

I used SW = Speculative Write, AR = Actual Read (I used Actual rather than Real so that the 2 letter abbreviation is still unique). I'm still pondering over whether Recoverable is a real thing - the program has written files to disk, that have potentially been used in other dependency chains. Recoverable has to mean doing something with those two? What would you describe the intuition as?

We can compress the above program to:

f __ _R = OK
f A_ AW = Fatal
f __ SW = Restart
f S_ AW = Restart

These clauses are all disjoint, so we can reason about a description of what is allowed.

Thoughts?

ndmitchell commented 4 years ago

And maybe with your more refined classification there is no read/write write/write hazards, they are one of the pairs above, but the more interesting thing is the restart/ok/fatal side.

spall commented 4 years ago

I agree that with Rattle's current scheduling policy there is no distinction between a "recover" and a "restart", but I believe that there is a distinction between the two situations and other policies can take advantage of this.

To explain my intuition I will first state what I believe are the facts of Rattle, and then I'll explain why I think recovering is valid.

Let us assume that the Rattle build I am describing has no property violations, aka no nonrecoverable hazards, and it has been run before so Rattle has speculation information.

facts:

  1. If we execute the required cmds in order the build will "complete" and there will be no hazards.
  2. For a cmd to be added to the required list; (1) it must be listed by the user and (2) all cmds listed before it must have completed already.

So, it follows from the above, and what I defined as a recoverable hazard, that for a Recoverable hazard to have occured, Rattle must have speculated a cmd which performed a read too early. If the cmd had been required, then we we would have a non-recoverable hazard.

I claim that if we were to execute that cmd again immediately, we would not encounter the same hazard again, because the write it relied on is guaranteed to have completed. Of course depending on how Rattle might decide to re-execute this cmd, a different recoverable hazard could be encountered.

If the cmd which we are re-executing is writing files that other cmds previously read, then Rattle would detect these new hazards, and the need to re-execute would propagate to these cmds as well.

Of course whether or non recovering or restarting is better will depend on the program.

spall commented 4 years ago

I wrote a new scheduling policy for Rattle, which takes advantage of the concept of recoverable hazards to avoid halting all threads and restarting when a recoverable hazard has been detected.

https://github.com/spall/rattle/tree/conservative

ndmitchell commented 4 years ago

Cool. I think I'm now convinced you are right. Good explanation!

spall commented 4 years ago
** r-cmd read r-cmd write s-cmd read s-cmd write 2nd
r-cmd read OK See (3) OK Restart
r-cmd write OK NR OK Restart
s-cmd read OK Recoverable OK See (1)
s-cmd write Restart Restart See (2) Restart
1st

Notes: r-cmd means the command is required. s-cmd means the command was speculated; so we haven't encountered it in required list yet (if it is a member). Cmds in first column occured earlier in time, and cmds in first row occured 2nd.

Restart: delete all speculation info and start over.

(1): Just Recovering is fine.
(2): OK with the note that there could be a property violation in script still. (3): If the required read was listed BEFORE the required write then it is NonRecoverable; otherwise it is recoverable.

I have highlighted the cased I don't think rattle currently deals with. I think we can fix this by specifying whether WriteWrite hazards and ReadWrite hazards are non-recoverable or restartable;

Corrected this based on @ndmitchell's #18