memsql / singlestore-spark-connector

A connector for SingleStore and Spark
Apache License 2.0

PipelineMonitor graceful stop #20

Closed 0x0ece closed 4 years ago

0x0ece commented 8 years ago

Hi,

With the new support for checkpoints it would be nice to have:

  1. a graceful stop of the pipelines
  2. a graceful stop/restart of the interface (In short to avoid duplicated messages when restarting.)

Step 2 seems more complex: it should take timeouts into account (in case a pipeline doesn't stop), and personally I'd also need some sort of priority to stop/restart the pipelines in a particular order. So I'm planning to manage that with an external controller.

Step 1, instead, seems relatively doable from what I can see in the code. Just to be clear, by "graceful stop" I mean that if a step has started, then it has to complete.

If I'm not wrong, this is the cause of the immediate stop: https://github.com/memsql/memsql-spark-connector/blob/master/interface/src/main/scala/com/memsql/spark/interface/PipelineMonitor.scala#L639. In principle, PipelineMonitor.stop() could just set isStopping=true, while cancelling the job and interrupting the thread could be moved to after the step terminates, here: https://github.com/memsql/memsql-spark-connector/blob/master/interface/src/main/scala/com/memsql/spark/interface/PipelineMonitor.scala#L179
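To make the idea concrete, here is a minimal sketch of a flag-based graceful stop. It is not the connector's actual PipelineMonitor: the class `SketchMonitor`, its `runStep` parameter, and the `join` helper are hypothetical names made up for illustration. The only point is that `stop()` sets a flag and the loop checks it between steps, so a step that has started always completes.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a pipeline monitor; NOT the connector's real class.
class SketchMonitor(runStep: () => Unit) {
  private val isStopping = new AtomicBoolean(false)

  private val thread = new Thread(new Runnable {
    override def run(): Unit = {
      // The flag is only checked between steps, so a started step always finishes.
      while (!isStopping.get()) {
        runStep()
      }
    }
  })

  def start(): Unit = thread.start()

  // Graceful stop: request termination without interrupting the running step.
  def stop(): Unit = isStopping.set(true)

  // Block until the loop has exited, i.e. the in-flight step (if any) is done.
  def join(): Unit = thread.join()
}
```

Under this sketch, cancelling the job and interrupting the thread would only be needed as a fallback for a step that never returns.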

Ritual questions:

Thanks, E.

choochootrain commented 8 years ago

this makes sense to me - we would want to keep the current behavior in some form so you can stop hanging or long running batches. would you have time to tackle this? my hands are tied at the moment with other projects :)

0x0ece commented 8 years ago

Sounds good, I'll have a look at it. Please just confirm the overall design.

  1. I'll keep the current code as forceStop
  2. I'll write a new gracefulStop
  3. stop will be if (usesCheckpointing) { gracefulStop } else { forceStop } (or do you prefer always gracefulStop?) -- the idea here is that if you want at-least-once you want a graceful stop; if you want at-most-once you might as well terminate asap.
     3b. This way, from the api or wherever, you can also let people choose, but I won't touch that part.
  4. I'll try to set a timeout; I was thinking a multiple of the batch interval should work. Ideally your actual batch duration should be less than the configured batch duration, so maybe a 2x timeout is a good approximation. Thoughts?
     4a. I have no actual idea how to set a timeout, maybe the equivalent of a setTimeout(forceStop, timeout) in js (that I'll figure out how to write in scala)? Or do you think something more sophisticated is needed?
  5. Optional, based on how long it's going to take me: I could optimize and forceStop if the transformation phase has not started yet. (This assumes that the extractor has no side effects, but for long extractions this might be good to have.)
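For points 3 and 4, one possible Scala equivalent of `setTimeout(forceStop, timeout)` is a `java.util.concurrent.ScheduledExecutorService`. The sketch below is only an illustration under assumptions: `StoppablePipeline`, `batchIntervalMs`, `wasForced`, and `onStepFinished` are hypothetical names, and the real forceStop would also cancel the Spark job and interrupt the thread.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch; names and structure are assumptions, not the connector's code.
class StoppablePipeline(usesCheckpointing: Boolean, batchIntervalMs: Long) {
  val isStopping = new AtomicBoolean(false)
  val wasForced = new AtomicBoolean(false)
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Immediate stop (the real code would cancel the job and interrupt the thread here).
  def forceStop(): Unit = {
    isStopping.set(true)
    wasForced.set(true)
    scheduler.shutdown()
  }

  // Let the current step finish, but fall back to forceStop after 2x the batch interval.
  def gracefulStop(): Unit = {
    if (isStopping.compareAndSet(false, true)) {
      val fallback = new Runnable { override def run(): Unit = forceStop() }
      scheduler.schedule(fallback, 2 * batchIntervalMs, TimeUnit.MILLISECONDS)
    }
  }

  def stop(): Unit = if (usesCheckpointing) gracefulStop() else forceStop()

  // To be called by the step loop once the in-flight step has completed:
  // shutdownNow() drops the pending fallback so forceStop never fires.
  def onStepFinished(): Unit = scheduler.shutdownNow()
}
```

The 2x multiplier is just the value floated above; a separately configurable timeout would be a natural alternative.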

Thoughts? Other things?

choochootrain commented 8 years ago

how will you be gracefully stopping the pipelines? if you will be communicating with the streamliner api directly, it might be worth adding an optional forceStop param that defaults to true - that way MemSQL Ops continues to work normally but you have a knob that you can control in your own scripts.

1, 2. perfect.

  3. see above
  4. that seems reasonable, although 2x the batch interval could be very long if the interval is 1 day and the batches are supposed to be small.
  5. this will be tricky, i wouldn't bother unless you really have the resources to do it :)

0x0ece commented 8 years ago

For 3. I have to interact with Ops, because if I talk directly to the interface to stop pipelines then Ops will restart them. So, this is something you'll have to do :) I then have to query the interface to see if there are still threads running, because Ops only reports the status of the pipelines, which becomes offline almost immediately.

For 4, it seems harder than expected. I wish we had js... I've found several approaches, well summarized here: https://github.com/semberal/semberal.github.io/blob/master/scala-future-timeout-patterns.md but all of them essentially require turning stepPipeline into a future.
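As a rough illustration of what "turning stepPipeline into a future" could look like, here is a minimal sketch using the standard library's `Future` and `Await.result`. The `StepTimeout` object and `finishedInTime` helper are hypothetical, and `step` stands in for the real stepPipeline call. Note that a timeout here only stops the waiting: the future itself keeps running, so a forced stop would still have to cancel the job separately.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper; `step` is a placeholder for the real pipeline step.
object StepTimeout {
  // Returns true if the step finished within the timeout, false if it did not.
  def finishedInTime(step: () => Unit, timeout: FiniteDuration): Boolean = {
    val running = Future(step())
    try {
      Await.result(running, timeout) // rethrows the step's own exception, if any
      true
    } catch {
      // Await.result signals an expired wait with a TimeoutException;
      // the caller would fall back to a forced stop at this point.
      case _: TimeoutException => false
    }
  }
}
```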

(Tonight I'll prob stop by around dinner time, if you have 2 min we can chat about it and I can show you something.)

carlsverre commented 4 years ago

No longer an issue - PipelineMonitor has been removed from the product for at least 3 years.