mozilla / jupyter-spark

Jupyter Notebook extension for Apache Spark integration
Mozilla Public License 2.0

Progress bar #11

Closed shaybeau731 closed 8 years ago

shaybeau731 commented 8 years ago

There's still some iffy behaviour happening, but I thought it'd be good for people to look this over and give any feedback they may have.

How it works so far:

What is iffy:

mreid-moz commented 8 years ago

I've tried this out - it's really cool.

The first Spark cell that is executed can take a while to load while the SparkContext gets up and running. Since no jobs are being run at this point, I'm not sure how to monitor its progress (or whether it needs monitoring at all).

We could keep a variable that gets initialized the first time we successfully hit the Spark API.

I upped the UPDATE_FREQUENCY of the cache so that progress bars can update more regularly... but this makes things very inefficient :/

I think we can leave the update frequency at 5 or 10 seconds if we add code to also update the cache whenever a cell starts executing. That way we'd get the latest data right when we need it, but wouldn't need to hammer the API all the time (although once per second may not be too bad anyway).
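To make the idea concrete, here is a minimal sketch of a cache that refreshes on a timer but can also be forced to refresh when a cell starts executing. The names `JobCache`, `fetch`, and `update_frequency` are hypothetical and are not taken from the extension's code; the fetch function stands in for whatever call hits the Spark API.

```python
import time
from typing import Any, Callable, Optional


class JobCache:
    """Caches the result of a fetch function for a fixed number of seconds.

    Hypothetical helper for illustration; not the extension's actual class.
    """

    def __init__(
        self,
        fetch: Callable[[], Any],
        update_frequency: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._update_frequency = update_frequency
        self._clock = clock
        self._data: Any = None
        self._last_update: Optional[float] = None

    def get(self, force: bool = False) -> Any:
        """Return cached data, refetching if it is stale or force is set."""
        now = self._clock()
        stale = (
            self._last_update is None
            or now - self._last_update >= self._update_frequency
        )
        if force or stale:
            self._data = self._fetch()
            self._last_update = now
        return self._data
```

A handler for "cell started executing" would call `cache.get(force=True)`, while regular polling would call `cache.get()` and only reach the API once the cached data is older than `update_frequency` seconds.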

Non-Spark cells ideally shouldn't have a progress bar... (idea just now - initially hide progress bar?)

+1 for initially hiding the progress bar, then show & update if and only if we detect a running Spark Job

Each SparkContext runs on a different port, so I'm not sure if you can have multiple Spark notebooks open at the same time (as we are only monitoring the first port).

I'm not too worried about this (at least not yet).

A couple of other things. I noticed that the progress bar doesn't always reflect an active job. I think it's just grabbing the first job from the cache, which is often the previous completed job. Forcing a cache refresh when a cell starts executing as described above may fix this.
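One way to avoid picking up a stale job is to filter on status instead of taking the first entry. The sketch below assumes each job is a dict with `jobId` and `status` fields, as in the job list returned by Spark's monitoring REST API; the function name `pick_active_job` is made up for illustration.

```python
from typing import Any, Dict, List, Optional


def pick_active_job(jobs: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the most recently submitted running job, or None.

    Assumes each job dict has "jobId" (int) and "status" (str) keys.
    """
    running = [job for job in jobs if job.get("status") == "RUNNING"]
    if not running:
        # No active job: the caller can keep the progress bar hidden.
        return None
    # Job IDs increase over time, so the highest ID is the newest job.
    return max(running, key=lambda job: job["jobId"])
```

Returning `None` when nothing is running also fits the suggestion above to show the progress bar only once a running Spark job is detected.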

If you queue up a few cells to run, the progress bar sometimes shows up in the wrong place. I think we need to track which cells are pending vs. actively executing, and maybe have another state for the progress bar for "waiting to run" rather than "loading spark". Any currently running Spark jobs should affect the currently executing code cell's progress bar, and there should only be one executing cell at a time (I think that's how it works, at least).
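The pending vs. executing bookkeeping could look roughly like the sketch below. It is written in Python only to illustrate the state transitions (the notebook front end itself would do this in JavaScript), and `CellTracker`, `CellState`, and the cell IDs are hypothetical names. It assumes cells run one at a time in the order they were queued.

```python
from collections import deque
from enum import Enum
from typing import Deque, Dict, Optional


class CellState(Enum):
    WAITING = "waiting to run"
    EXECUTING = "executing"
    DONE = "done"


class CellTracker:
    """Tracks queued cells so progress attaches only to the executing one."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._executing: Optional[str] = None
        self._states: Dict[str, CellState] = {}

    def queue(self, cell_id: str) -> None:
        """Register a cell the user asked to run."""
        self._states[cell_id] = CellState.WAITING
        self._pending.append(cell_id)
        self._advance()

    def finish(self, cell_id: str) -> None:
        """Mark the executing cell as done and start the next pending one."""
        if cell_id != self._executing:
            raise ValueError(f"cell {cell_id!r} is not executing")
        self._states[cell_id] = CellState.DONE
        self._executing = None
        self._advance()

    def _advance(self) -> None:
        # Promote the oldest pending cell if nothing is executing.
        if self._executing is None and self._pending:
            self._executing = self._pending.popleft()
            self._states[self._executing] = CellState.EXECUTING

    def progress_target(self) -> Optional[str]:
        """The only cell whose progress bar should reflect running jobs."""
        return self._executing

    def state(self, cell_id: str) -> CellState:
        return self._states[cell_id]
```

With this structure, a queued cell shows "waiting to run" until it becomes the executing cell, and Spark job updates are routed to `progress_target()` only.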

A single cell can run multiple jobs too (think a for loop or just multiple operations in a row), and it seems like it would be overly complicated to try to add them all up and maintain "overall" progress across an arbitrary number of jobs. I think the progress bar should reflect the currently running job (and just reset when a new job starts). If we could include a counter of "completed jobs for this cell", that would be a bonus.
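A rough sketch of that behaviour: the bar follows only the current job, resets when a new job ID appears, and a separate counter tracks jobs completed for the cell. The class name `CellJobProgress` is hypothetical, and the job dicts are assumed to carry `jobId`, `status`, `numTasks`, and `numCompletedTasks`, as in Spark's monitoring REST API.

```python
from typing import Any, Dict, Optional, Set


class CellJobProgress:
    """Progress for one cell: current job fraction plus a completed-job count."""

    def __init__(self) -> None:
        self.current_job_id: Optional[int] = None
        self.fraction: float = 0.0
        self._completed: Set[int] = set()

    @property
    def completed_jobs(self) -> int:
        return len(self._completed)

    def update(self, job: Dict[str, Any]) -> None:
        """Apply the latest snapshot of a job observed for this cell."""
        job_id = job["jobId"]
        if job_id != self.current_job_id:
            # A new job started: reset the bar instead of summing progress.
            self.current_job_id = job_id
            self.fraction = 0.0
        total = job.get("numTasks", 0)
        done = job.get("numCompletedTasks", 0)
        self.fraction = done / total if total else 0.0
        if job.get("status") == "SUCCEEDED":
            # A set keeps repeated snapshots from counting a job twice.
            self._completed.add(job_id)
```

Using a set of finished job IDs means polling the same completed job several times does not inflate the counter.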

shaybeau731 commented 8 years ago

I made a few of the changes you suggested, and now things are running much smoother!

Please let me know what you think!

frol commented 8 years ago

I have also tried out your patchset. It works well enough for development, but some glitches occur here and there:

Here is a video recording of the described glitches: https://youtu.be/H6-Ekrl66wM

Thank you!

mreid-moz commented 8 years ago

@musicaljelly @yeah568 @agjwong Please also have a look. Thanks!

mreid-moz commented 8 years ago

@frol thanks for the feedback!

musicaljelly commented 8 years ago

Runs fine on my machine, and the code looks good to me!

musicaljelly commented 8 years ago

Should we bump the version number?

mreid-moz commented 8 years ago

I'm still seeing the progress bar attaching to unexpected cells - I'll dig into it more soon.

yeah568 commented 8 years ago

Apart from the issues already mentioned, seems to be working fine here.

shaybeau731 commented 8 years ago

Thanks for the feedback! And thank you @frol for taking the time to make a video; it was a big help! I made some changes to how progress bars get removed, and that bug should be fixed now!

mreid-moz commented 8 years ago

I just tried it again, and it looks like things work great! Thanks!

mreid-moz commented 8 years ago

Ok, I added a few style / whitespace nitpicks, but otherwise this looks great!

mreid-moz commented 8 years ago

Resolves #3

frol commented 8 years ago

Thank you guys for your work! I have been dreaming of this integration since I ran my first code on Spark...