jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

SparkMagic does not correctly set cell error type #518

Closed: mbrio closed this issue 5 years ago

mbrio commented 5 years ago

I've been working with Papermill and Sparkmagic, and I've noticed that whenever a Spark error occurs while using Sparkmagic in a Jupyter notebook, Papermill does not raise an error as it should. I originally reported this on the Papermill GitHub repository, and the maintainers suggested the kernel may not be following the messaging protocol correctly. Digging deeper today, I found that when a Spark error occurs, the cell output's `output_type` is `stream` with a `name` of `stderr`, rather than `output_type` being `error`, which I believe is what the cell should be marked as. Are there guidelines for error handling in the IPython codebase? I would like to try to solve this problem, either by helping fix the inconsistency in sparkmagic or in papermill.
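For reference, here is roughly how the two output shapes differ under the nbformat spec; the traceback text below is illustrative, not an actual sparkmagic payload:

```python
# What sparkmagic currently records when a Spark statement fails:
stream_output = {
    "output_type": "stream",
    "name": "stderr",
    "text": "org.apache.spark.SparkException: Job aborted ...",  # illustrative
}

# What the nbformat spec expects for a failed cell, and what tools
# like Papermill inspect when deciding whether execution failed:
error_output = {
    "output_type": "error",
    "ename": "SparkException",    # illustrative exception name
    "evalue": "Job aborted ...",  # illustrative message
    "traceback": ["org.apache.spark.SparkException: Job aborted ..."],
}
```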

mbrio commented 5 years ago

I realize this is probably because we wouldn't want every Spark error to be fatal when running Jupyter as an interactive development notebook. However, there are situations where running a notebook should produce a fatal error if a Spark cell did not execute correctly. Utilizing the configuration class that already exists in SparkMagic to ensure that all errors are treated as fatal could be a good solution; I'm looking into implementing this now.
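For context on why this matters downstream: Papermill only fails a run when a cell carries an `error` output, so a `stderr` stream slips through silently. A minimal sketch of the production use case (the notebook paths are placeholders):

```python
import papermill as pm
from papermill.exceptions import PapermillExecutionError

try:
    # Papermill fails the run only when a cell emits an `error` output;
    # stderr stream output from sparkmagic never triggers this path.
    pm.execute_notebook("spark_job.ipynb", "spark_job_out.ipynb")
except PapermillExecutionError:
    # A scheduler would catch a failed Spark statement here; today this
    # branch is never reached, because sparkmagic only writes to stderr.
    raise
```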

MSeal commented 5 years ago

For a little more context: https://github.com/nteract/papermill/issues/330

I can help with kernel messaging questions or guidance if needed.

mbrio commented 5 years ago

At this point I've engineered a fix, though I don't know whether it's implemented correctly. I noticed that the SparkMagic kernel raised exceptions for a particular set of errors, but never raised them for failed Spark statement responses.

In my implementation (which I'm about to open a pull request for) I set up two configuration options: one that states whether an exception should be raised (defaulted to not raising, since that matches the current behavior), and another that states whether the Livy session should be shut down if a Spark statement fails.
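Since sparkmagic reads its options from `~/.sparkmagic/config.json`, settings along these lines would be added there; the option names below illustrate the approach and may not match the final names in the PR:

```json
{
  "spark_statement_errors_are_fatal": true,
  "shutdown_session_on_spark_statement_errors": true
}
```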

@MSeal, I think the way SparkMagic handles this is deliberate, but it only addresses running notebooks in a development environment, not running a production script through Papermill.

I'm not sure I agree with that behavior, since even in a development environment I would want a hard stop when an exception occurs.

In any case, when my configuration option is set to true, the exception is raised correctly, an error output is recorded in the cell, and all downstream cells halt execution.
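This works because of a general property of IPython kernels: any exception that escapes a magic is reported through the standard error machinery, which produces an output with `output_type: "error"`. A minimal sketch of the pattern; `SparkStatementException`, `errors_are_fatal`, and `run_statement` are all hypothetical stand-ins, not sparkmagic's actual API:

```python
from IPython.core.magic import Magics, cell_magic, magics_class


class SparkStatementException(Exception):
    """Hypothetical exception for a failed Livy/Spark statement."""


@magics_class
class FatalSparkMagics(Magics):
    # Stand-in for sparkmagic's configuration lookup; illustrative only.
    errors_are_fatal = True

    @cell_magic
    def spark(self, line, cell):
        status, output = self.run_statement(cell)
        if status == "error" and self.errors_are_fatal:
            # Letting this propagate makes the kernel emit an `error`
            # output, so Papermill (and Jupyter) see the cell as failed.
            raise SparkStatementException(output)
        print(output)

    def run_statement(self, cell):
        # Placeholder for the round trip that submits the statement to Livy.
        return "error", "org.apache.spark.SparkException: ..."
```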

mbrio commented 5 years ago

@MSeal, here is my pull request with unit tests if you want to see what I did. I'm awaiting a response from the sparkmagic team to see whether there is an alternative way of handling this that fits better with their source code: https://github.com/jupyter-incubator/sparkmagic/pull/519

itamarst commented 5 years ago

Fixed; it should be in the next release.