mozilla / jupyter-spark

Jupyter Notebook extension for Apache Spark integration
Mozilla Public License 2.0
193 stars 34 forks

Add JupyterLab support #41

Open mdboom opened 6 years ago

mdboom commented 6 years ago

Fix #39.

This adds a Spark status window to the side pane in Jupyter Lab. I played with making a modal dialog like the old extension, but it feels like the side pane is more in keeping with the "JupyterLab way".

It looks like:

[Screenshot: Spark status pane in the JupyterLab sidebar]

@teonbrooks, @jezdez

codecov-io commented 6 years ago

Codecov Report

Merging #41 into master will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #41   +/-   ##
=======================================
  Coverage   96.61%   96.61%           
=======================================
  Files           3        3           
  Lines          59       59           
  Branches        5        5           
=======================================
  Hits           57       57           
  Misses          2        2


mdboom commented 6 years ago

We'll want to deploy this to npm somehow -- ideally using a Travis deploy clause like we already do for PyPI.

AbdealiLoKo commented 6 years ago

I came here looking for this exact feature - glad to see it already has a PR! A quick thought: it would be good to move the progress bar to a second row, since the sidebar isn't well suited to long horizontal output.

Current look:

Job ID1       Job Name      Progress
Job ID2       Job Name      Progress
Job ID3       Job Name      Progress

What I'm suggesting:

Job ID1       Job Name
Progress ============>
Job ID2       Job Name
Progress ============>
Job ID3       Job Name
Progress ============>
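To make the suggestion concrete, here is a minimal sketch in Python of the two-row layout described above (the actual extension renders its table in the browser, so this is purely illustrative; the function and field names are made up for this example):

```python
def render_job_list(jobs, bar_width=20):
    """Render Spark jobs in the suggested two-row layout: one line
    for the job ID and name, then a second line with its progress bar.

    `jobs` is a list of (job_id, name, fraction_complete) tuples.
    """
    lines = []
    for job_id, name, fraction in jobs:
        lines.append(f"{job_id}    {name}")
        filled = int(fraction * bar_width)
        # Open-ended bar while the job is still running.
        bar = "=" * filled + (">" if filled < bar_width else "")
        lines.append(f"Progress {bar}")
    return "\n".join(lines)
```

Each job then occupies two short rows instead of one wide one, which fits a narrow sidebar better.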

Is there also a progress bar at the cell that is running?

mdboom commented 6 years ago

That's a good suggestion about the layout of the table. It shouldn't be too difficult to make that change.

There isn't a progress bar at the cell that is running (unlike in the "classic notebook" version of this plugin), because, if I understand correctly, the API for that isn't released yet (http://jupyterlab.readthedocs.io/en/stable/developer/notebook.html#the-ipywidgets-third-party-extension). But I honestly didn't do much digging, so if there's an obvious way forward that isn't likely to change too much down the road, I'm game.

elehcimd commented 5 years ago

It would be great to get this merged; are there any blockers?

jezdez commented 5 years ago

@elehcimd The problem is mostly making the table changes @mdboom described above, and figuring out how to deploy this to npm (and how that integrates with the production deployment of Jupyter).

@mdboom Would you mind elaborating on how we'd apply this to our Spark instance? Would we have to install Node and run the full npm build to get it to run in our lab environment?

mdboom commented 5 years ago

@jezdez: Yes, JupyterLab extensions have to go through a build process that involves a local copy of Node. Details here

AlJohri commented 5 years ago

Gentle bump. Would love to see some version of this integrated.

IMAM9AIS commented 5 years ago

@jezdez @mdboom Thanks for this extension. We tried testing it on an enterprise-level setup. From what I can understand, the plugin goes through all the applications at the Spark link and maintains a cache for them. It triggers a progress bar update whenever it sees a job in progress for any application ID.

However, if we consider this solution at an enterprise level, where we have millions of applications running in multiple queues, even the first pass over all applications takes a long time and, in some cases, adds a lot of latency.

Rather than building a list of every application ID on the cluster, is there a workaround in terms of the cache the plugin maintains?

Could we take the application IDs of only the lab .ipynb files that are currently open, and maintain a cache for those, rather than caching the entire array of IDs?

Do let me know if we can do this. I'm happy to contribute, given some guidance.
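A rough sketch of that idea in Python, assuming the plugin keeps a dict-shaped cache and can learn which application IDs belong to open notebooks (the function name, the `open_notebook_app_ids` parameter, and the shape of `all_applications` are all assumptions made for illustration; `all_applications` stands in for the list returned by Spark's REST `/api/v1/applications` endpoint):

```python
def update_cache(cache, open_notebook_app_ids, all_applications):
    """Keep cache entries only for applications attached to currently
    open notebooks, instead of every application on the cluster.
    """
    # Restrict polling to applications we actually care about.
    tracked = {app["id"]: app for app in all_applications
               if app["id"] in open_notebook_app_ids}
    # Evict entries for notebooks that have since been closed.
    for stale_id in set(cache) - set(tracked):
        del cache[stale_id]
    cache.update(tracked)
    return cache
```

On a cluster with millions of applications this reduces the per-poll work from O(all applications) to O(open notebooks), at the cost of needing a reliable mapping from notebook to application ID.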

birdsarah commented 4 years ago

I made a PR here that updates this PR to work with modern Jupyter (especially the latest Tornado).