wonks / Talk_Rehearsal_Feedback


2018 PLDI - Mike Rainey on Heartbeat scheduling #7

Closed rrnewton closed 5 years ago

rrnewton commented 6 years ago

Leave feedback below. Please include slide title or number.

ccshan commented 6 years ago

List institutions on title slide

rrnewton commented 6 years ago

My notes from the talk

slide 3 - bullets with too much text, and a lot of narrative for one slide... probably could be a larger number of slides that show these ideas more sequentially... (even if there's a persistent graphic that's built up, mapping the landscape with the 2 existing approaches and the new proposed approach)

slide 4 (4:00) - "fork join" missing hyphen, which I think was present in other occurrences

sld 5 - dunno if I like omitted semicolons here, at least in combination with the indentation of the second map call

sld 6 6:45, -

sld 7 - really like the visualization on this slide, esp. the color coding

hard to visually see what's going on in the third category though... takes a second to see/guess what's happening at the second blue circle where the delayed parallelism occurs... maybe highlight that node... i.e. grow it in an animation.

Heck, why not animate these things as a whole? Make them appear as time proceeds.

10:30 - sld 8 -

14:50 - sld 10 -

This is good background... but it's getting pretty late in the presentation to INTRODUCE the key idea (heartbeat scheduling)... risks burying the lede.

16:20 - sld 11 - here's where we get the key idea.

19:20 - sld 13 -

sld 14 -

24:18

sld 15, - I like the nice clear "Lower is better" tag.

27:10

27:33 - sld 16 - I really like the combination of related work and conclusion. Huh "fork join" without hyphen again.

Finish time: 29:50..

My conference questions

Other questions:

Ken: what is tricky about the heartbeat implementation?

the alarms, polling... wish it were easier to change calling conventions...

Sam: could this be applied to systems where thread creation has some semantic guarantees? Like Go, where it provides a concurrency guarantee.... can we optimize thread creation in such a setting?

(dodge) middle ground, futures... that before full threads. Still a challenge to show bounds.

Chai: how does one select the heartbeat interval for different systems?

we have a microbenchmark program... binary tree of threads... start with small H bump it up till we get to 5% overhead
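The tuning loop described in that answer can be sketched roughly as follows. This is only an illustration: `measure_overhead` stands in for the real binary-tree microbenchmark, and `toy_overhead` with its `tau=50` is a made-up cost model, not a measurement.

```python
def tune_heartbeat(measure_overhead, start=1, target=0.05, limit=1 << 30):
    """Double H until the measured work overhead drops to the target.

    measure_overhead(H) is assumed to return overhead as a fraction
    (0.05 means 5%) and to shrink as H grows.
    """
    H = start
    while measure_overhead(H) > target:
        H *= 2
        if H > limit:
            raise RuntimeError("no H below the limit reaches the target overhead")
    return H


def toy_overhead(H, tau=50):
    """Hypothetical model: one thread creation (tau cycles) per H cycles of work."""
    return tau / H


if __name__ == "__main__":
    print(tune_heartbeat(toy_overhead))
```

Doubling is a choice made here for brevity; a finer search near the threshold would give a smaller H.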

Jeremy: seems like there's a tension... the bigger you make the H the bigger the span gets. But the smaller the H, the worse total work gets. Could there be no good setting?

suggests PDF as alternative to work stealing.... want sweet spot where your work overhead is <=5%... but if you're pruning away too much parallelism your scheduler may not fit with this approach fairly well. So far we have only studied it with work stealing. PDF is promising because it HAS a granularity problem...

[DID NOT DEFINE PDF - parallel depth first scheduling]

(I ask about units of H -- cycles... )

Sam points out that "cycles^2" is not a sensible unit... concerned about Hs term...

Turns out there is a missing divide-by-tau... no, more than that... the entire "H" in the last term of the end-to-end bound needs to be replaced by (1+T/tau)s

Seems like you could express this pretty clearly in terms of wanting to minimize BOTH H/tau and tau/H.
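One way to write that down, assuming the bounds take the following shape (this form is an assumption for illustration, not copied from the slides): let $w$ and $s$ be the work and span of the program with free spawns, $\tau$ the cost of creating one thread, and $H$ the heartbeat interval, with $\tau$ and $H$ both in cycles.

$$
W_H \le \left(1 + \frac{\tau}{H}\right) w,
\qquad
S_H \le \left(1 + \frac{H}{\tau}\right) s,
\qquad
T_P \le \frac{W_H}{P} + S_H \quad \text{(greedy scheduler, $P$ processors)}
$$

Both ratios are dimensionless, so no "cycles^2" term appears. Under this model $H = \tau$ doubles both work and span, while $H = 20\tau$ gives 5% work overhead at the price of a span factor of 21.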

Round of feedback

Ken: thinks slide 13 needs animation...

Chai: 1st time we heard about heartbeat was late.

he was telling a story...

Bo: two slides have numbers and he doesn't know where they come from. (55X slowdown...)

Bo and Jeremy were both confused about what "these map things were" inside the stack. "Where did bar come from."

On slide 12 Jeremy thinks the code for map would be more valuable than the cartoon picture of the chip...

Ian: would rather see more slides like 12 with pics rather than too much text... argues for revealed bullets...

Artur argued for more text!? That doesn't seem to be the popular contemporary stance.

Sam: thinks intro needs to change pretty substantially... eliminate all bullets and text... motivation is aimed somewhere in the middle of the fork-join experts and the plebeians.

Sam: wants to emphasize the RUNTIME decision of whether or not to create a thread at the point of spawn. "No semantics to spawn". "Goal: nested parallelism that is really lightweight" "Start with nested parallelism based on some system like pthreads... This is a bad idea... Could switch to Go/Haskell threads. Get better. Not good enough. Could do Cilk/Java-forkjoin. Still not good enough."

(Ryan: I think you should never use the term thread... don't trigger that confusion more than it needs to be...)

(Ryan: I like those progressively-less-dumb examples... I would ESPECIALLY like a little COUNTER on the side that indicates the approx Cycles/Spawn... and you see how it goes down and down.)
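Sam's point about the runtime decision at spawn can be illustrated with a toy model. This is a deliberately simplified sketch, not the scheduler from the talk: it counts simulated cycles instead of using alarms or polling, it decides at the spawn point itself rather than promoting an older stack frame, and all names (`HeartbeatSim`, `pmap`, `demo`) are hypothetical.

```python
class HeartbeatSim:
    """Toy model: counts simulated cycles instead of reading a real timer."""

    def __init__(self, H):
        self.H = H            # heartbeat interval, in simulated cycles
        self.since_beat = 0   # cycles of work done since the last promotion
        self.promoted = 0     # spawns that would have become real tasks
        self.inlined = 0      # spawns that ran as plain function calls

    def work(self, cycles=1):
        self.since_beat += cycles

    def spawn(self, f, *args):
        # The spawn has no fixed semantics: the runtime decides here.
        if self.since_beat >= self.H:
            self.since_beat = 0
            self.promoted += 1   # a real runtime would hand f to the scheduler
        else:
            self.inlined += 1
        f(*args)                 # the toy always runs f inline either way


def pmap(sim, f, xs, out, lo, hi):
    """Divide-and-conquer map over the half-open range [lo, hi)."""
    if hi - lo == 1:
        sim.work()               # one simulated cycle per element
        out[lo] = f(xs[lo])
        return
    mid = lo + (hi - lo) // 2
    sim.spawn(pmap, sim, f, xs, out, lo, mid)
    pmap(sim, f, xs, out, mid, hi)


def demo(n=64, H=8):
    xs = list(range(n))
    out = [None] * n
    sim = HeartbeatSim(H)
    pmap(sim, lambda x: x * x, xs, out, 0, n)
    return sim, out
```

Because each promotion needs at least `H` cycles of work since the previous one, the number of promoted spawns is bounded by total work divided by `H`, which is the kind of quantity a Cycles/Spawn counter on the slide could track.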

Ken: this talk made a lot of sense to me... even though I thought I didn't know anything about parallelism. Heard enough about Cilk peripherally.

Ken: suggestion for how to get to heartbeat scheduling ... start with map program, sec 5. Mike: can you then think of a better example than map?

Ken is happy with starting with the "declarative version" and not starting with the pthread strawman.

Ken: take the "black" on slide 6 (graph) and copy to prev slide.

ccshan commented 6 years ago

Make sure your slides contain text for your main points (for example, your first slide did not contain terms like "declarative" and "heartbeat"). But the total number of bullets in your entire slide deck should be at most 5.

ccshan commented 6 years ago

End your talk with a conclusion slide and speak the magic phrase "thank you".

ccshan commented 6 years ago

Related work can be addressed by showing citations in slides throughout the talk and never speaking about them and never having a related work slide.

"This class of approaches" -- what class?

ccshan commented 6 years ago

Slide 14 has unclear reading order.

ccshan commented 6 years ago

Analytical bounds have too much notation (and what's "scheduling overhead cost"? what's even its unit?) so visualize and simplify, or omit.

ccshan commented 6 years ago

In map example, use map(0,4) or map(0,8) to avoid my worrying about how the midpoint is computed. Also, clarify whether your stacks grow up or down.
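To see why a power-of-two range removes the midpoint question, here is a small sketch. It assumes half-open ranges `[lo, hi)` and a floor midpoint; the slides may use a different convention, and `split` and `leaves` are hypothetical helpers.

```python
def split(lo, hi):
    """Split the half-open range [lo, hi) at its floor midpoint."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid, hi)


def leaves(lo, hi):
    """Recursively split [lo, hi) down to single-element ranges."""
    if hi - lo <= 1:
        return [(lo, hi)]
    left, right = split(lo, hi)
    return leaves(*left) + leaves(*right)


if __name__ == "__main__":
    print(split(0, 4))   # even halves, no rounding involved
    print(split(0, 5))   # uneven halves, depends on the rounding rule
```

With `map(0,4)` or `map(0,8)` every split is even at every level, so the audience never has to wonder which half gets the extra element.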

ccshan commented 6 years ago

Speech advice: Write out a script, but don't read from it, but memorize your first minute. Videotape yourself.

ccshan commented 6 years ago

Your discussions of global vs distributed work queue and of work stealing seem orthogonal to the point of your talk and your contribution and so should be abbreviated as much as possible.

ccshan commented 6 years ago

Concisely state where the trickiness in your implementation lies.

ccshan commented 6 years ago

Your native support for parallel loops should be a mere side remark in this talk, and should not distract from the value of the running example "map".