jupyter / nbconvert

Jupyter Notebook Conversion
https://nbconvert.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Default customized Preprocessor for a kernel #865

Open BoPeng opened 6 years ago

BoPeng commented 6 years ago

I have developed a Jupyter extension called SoS that allows the use of multiple kernels in a notebook. I have written a customized Preprocessor to execute the notebook so that cell meta information (the kernel of each cell) can be passed to the sos kernel during batch execution, but I cannot find a way to "register" it so that

jupyter nbconvert --execute notebook_with_sos_kernel.ipynb

can call this preprocessor by default. The kernel does define an exporter in language_info, but I am not sure if nbconvert uses it. My questions are

  1. Is it possible to register my preprocessor as a default preprocessor for option --execute for notebooks with sos kernel?
  2. If not, does it make sense for nbconvert to find such information from the nbconvert_exporter of language_info? Would a PR be welcome?

Many thanks in advance.

t-makaro commented 6 years ago

I believe that --execute is just a shortcut for --ExecutePreprocessor.enabled=True. See this line.

If I understand correctly, you want a way to override --execute to be a shortcut to --KernelSpecificExecutePreprocessor.enabled=True. No, there is no way of doing this currently, since the flags are defined before the instance of the app is launched. We'd have to extract the metadata from the notebook and update the flags accordingly before we process the flags (which may or may not be easy).

I'd be open to this feature, but I think it requires a much larger discussion about kernel specific settings in general. (I know that there has been talk of kernel specific templates). Automatically enabling preprocessors or changing flags would complicate the CLI since it opens the possibility for the flags to have different behaviour depending on the notebook provided. I think having kernels automatically enable/disable specific preprocessors might be useful, but that alone certainly wouldn't help you in this case.
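As a rough illustration of the "extract the metadata from the notebook and update the flags" step described above, here is a minimal sketch that uses only the standard library. It assumes the notebook follows the usual nbformat layout, where the kernel name is stored under metadata.kernelspec.name. The mapping table and the SoSExecutePreprocessor name are hypothetical; this is not something nbconvert does today.

```python
import json

# Hypothetical mapping from a kernel name to the preprocessor that kernel
# would like --execute to enable. Not part of nbconvert.
KERNEL_PREPROCESSORS = {"sos": "SoSExecutePreprocessor"}


def kernel_name(notebook_text):
    """Return the kernel name recorded in the notebook metadata, or None."""
    nb = json.loads(notebook_text)
    return nb.get("metadata", {}).get("kernelspec", {}).get("name")


def execute_flag_for(notebook_text):
    """Pick the long-form flag that --execute could expand to for this notebook."""
    name = KERNEL_PREPROCESSORS.get(kernel_name(notebook_text), "ExecutePreprocessor")
    return f"--{name}.enabled=True"
```

The sketch also shows the concern raised above: the same --execute flag would expand differently depending on which notebook is passed in.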

TLDR:

  1. Not currently.
  2. I think it requires a larger discussion of kernel specific settings/preprocessors. (Seeing a possible implementation would help with the discussion)

t-makaro commented 6 years ago

I suppose #821 is also somewhat tangential to this.

t-makaro commented 6 years ago

If you're using a custom exporter, you may be able to check if the ExecutePreprocessor is enabled and then disable it and enable your own. It'd be a hack, but it might get the --execute flag to work with the exporter. This would only fix part of the issue, since if someone decided to convert to a different format, it wouldn't use your exporter.

I just noticed the thing about using the nbconvert_exporter metadata. I'm not sure what that does, but in the one IPython notebook that I looked at it was set to "python", which would export to a .py file, whereas nbconvert defaults to html. Again, it might be weird to change this based on metadata, because the default could seem to change arbitrarily from notebook to notebook.

t-makaro commented 6 years ago

I guess this all boils down to: why doesn't your kernel execute properly with the ExecutePreprocessor?

Sorry for the overload of comments....

BoPeng commented 6 years ago

Because a sos notebook allows multiple kernels in a single notebook. The kernel information is saved as metadata for each code cell, but currently that metadata is not passed to kc.execute(), so every cell is treated as a Python cell by nbconvert.
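To make the per-cell information concrete, here is a minimal sketch that reads a notebook as plain JSON and lists the kernel recorded for each code cell. The metadata key name "kernel" is an assumption made for illustration; the thread only says that the kernel is stored in each code cell's metadata.

```python
import json


def cell_kernels(notebook_text, key="kernel"):
    """Return the kernel recorded in each code cell's metadata.

    `key` is an assumed metadata key. Cells without it yield None, and
    non-code cells (markdown, raw) are skipped.
    """
    nb = json.loads(notebook_text)
    kernels = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        kernels.append(cell.get("metadata", {}).get(key))
    return kernels
```

A preprocessor that wants to honour this information would need to forward the value for each cell when it sends that cell's source to the kernel.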

Thank you for answering my questions. I suppose the only way to solve my problem is

jupyter nbconvert --to sos_notebook.converter.SoSExecuteExporter

or register such a class as an entry point and use it as

jupyter nbconvert --to run_sos

This however means that I will need to define SoSExecuteExporter, and SoSHTMLExporter, in addition to the SoSExporter that I have already defined (which exports the notebook to a text-based workflow format). Is there any way to define a single exporter and control its behavior with parameters?

t-makaro commented 6 years ago

I'd recommend against a specific ExecuteExporter. Exporters should define the file format that the notebook is converted to. If you define your own SoSExecutePreprocessor, you can use it with any exporter as:

jupyter nbconvert --to <exporter or format> notebook.ipynb --SoSExecutePreprocessor.enabled=True

Use the --inplace flag instead of --to <exporter> if you want to overwrite the existing notebook with the output. notebook is also a valid format to export to.

The following are equivalent:

jupyter nbconvert notebook.ipynb --execute
jupyter nbconvert notebook.ipynb --ExecutePreprocessor.enabled=True

Since nbconvert uses traitlets, any setting that can be set in the config file can also be set from the command line. No exporter is really needed. You can see all of the config defaults here. All you need to do is hook into traitlets. Since the base preprocessor has this line, it will work so long as you subclass the base preprocessor (but enabled defaults to False).

If you still want a custom exporter: exporters can use these settings as well (so long as the exporter defines its properties using traitlets). A custom exporter can enable specific preprocessors by default. As an example, here's the latex exporter. The latex exporter grabs the preprocessor information from the config (config or CLI are effectively the same) with some defaults. You could add an additional check there to see if ExecutePreprocessor.enabled=True and swap that for your custom SoSExecutePreprocessor.enabled=True. This would effectively redirect any use of the --execute flag to your preprocessor, but only for that exporter. (This way the --execute flag could work for your SoSExporter and you don't have to make new pdf/html exporters. The --execute flag still won't use your preprocessor with the pdf/html exporters, but you can use the long --SoSExecutePreprocessor.enabled=True form with all exporters.)

So, I suppose this is a way for an exporter to override the --execute flag to point to a specific preprocessor just not for the kernel to override it.
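The swap described above can be sketched as follows. This is only an illustration of the logic: a plain nested dict stands in for the traitlets config object that an exporter would really receive, and SoSExecutePreprocessor is the hypothetical class name used in this thread.

```python
import copy


def redirect_execute(config):
    """Return a copy of `config` with --execute redirected to the custom preprocessor.

    `config` is a plain dict standing in for the exporter's config, e.g.
    {"ExecutePreprocessor": {"enabled": True}} after --execute was passed.
    """
    new = copy.deepcopy(config)
    if new.get("ExecutePreprocessor", {}).get("enabled"):
        # Turn the stock preprocessor off and the custom one on.
        new["ExecutePreprocessor"]["enabled"] = False
        new.setdefault("SoSExecutePreprocessor", {})["enabled"] = True
    return new
```

Because the check lives in one exporter, other exporters never see the swap, which is the limitation noted above.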

TLDR:

  1. Use a preprocessor to handle execution (maybe subclass our existing ExecutePreprocessor). If you can pass the cell metadata to a kernel without breaking other kernels, then maybe a PR to add this to our existing ExecutePreprocessor is in order.
  2. Traitlets handles all the config.