jupyter / nbconvert

Jupyter Notebook Conversion
https://nbconvert.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Process Messages of new type on 'iopub' channel #1096

Open zoebraiyan opened 5 years ago

zoebraiyan commented 5 years ago

We want to process new types of messages received on the 'iopub' channel. What is the best way to implement this? We are looking at versions 5.2.1 and 5.6.0. Right now, if a message type is not known to nbconvert, a ValueError is raised and caught.

Looking at the run_cell method in ExecutePreprocessor: 5.2.1 uses output_from_msg and 5.6.0 uses process_message.
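For context, the ValueError originates in nbformat's `output_from_msg`, which only recognises the output message types `execute_result`, `display_data`, `stream` and `error`. The sketch below is a simplified stand-in, not nbformat's actual code: it reproduces that behaviour and shows the general idea of routing a custom type to its own handler before the conversion step. The `resource_usage` type and the `custom_handlers` mapping are hypothetical.

```python
# Simplified stand-in for how an output converter rejects unknown message types.

# The iopub message types that become notebook outputs.
OUTPUT_MSG_TYPES = {"execute_result", "display_data", "stream", "error"}


def output_from_msg_sketch(msg):
    """Convert a message into a (much simplified) output dict, or raise ValueError."""
    msg_type = msg["header"]["msg_type"]
    if msg_type not in OUTPUT_MSG_TYPES:
        raise ValueError("Unrecognized output msg type: %r" % msg_type)
    return {"output_type": msg_type, **msg["content"]}


def process_message_sketch(msg, custom_handlers):
    """Send custom message types to their own handlers before conversion."""
    msg_type = msg["header"]["msg_type"]
    handler = custom_handlers.get(msg_type)
    if handler is not None:
        handler(msg)
        return None  # custom messages produce no notebook output here
    return output_from_msg_sketch(msg)
```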

MSeal commented 5 years ago

What new message types are you trying to support? I'm curious about the use case and the problem being solved, since that might inform how the tool could be expanded.

Today there's no easy way to change this behavior other than monkey patching the relevant methods (not highly recommended, but sometimes it's what's needed). nbconvert leans on jupyter_client, which tries to adhere to the messaging spec as closely as it can, though in some ways it has become the implementation that defines the spec. And the spec is fairly rigid about which message types are approved.

What some tools do instead is piggyback on comm messages, putting whatever custom info they want in content['data']. Jupyter widgets do this in lieu of adding a new message type to the spec, and to stay more backwards compatible. It's debatable whether this is the right pattern for new protocol actions, but it might be a way to approach your problem.
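To make the comm approach concrete, here is a sketch of the shape of `comm_open`, `comm_msg` and `comm_close` messages from the Jupyter messaging spec, with the custom payload carried in `content['data']`. In a real kernel you would normally let ipykernel's `Comm` class build and send these; the helper names, the `username` value and the protocol version string below are assumptions for illustration.

```python
import uuid
from datetime import datetime, timezone


def make_message(msg_type, content, session, parent_header=None):
    """Build a dict shaped like a Jupyter protocol message."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "kernel",  # assumption: any username string
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": msg_type,
            "version": "5.3",  # assumption: use the version your jupyter_client reports
        },
        "parent_header": parent_header or {},
        "metadata": {},
        "content": content,
    }


def comm_open(comm_id, target_name, data, session):
    # target_name tells the receiving side which handler owns this comm.
    content = {"comm_id": comm_id, "target_name": target_name, "data": data}
    return make_message("comm_open", content, session)


def comm_msg(comm_id, data, session):
    # Arbitrary JSON-serialisable payload travels in content["data"].
    return make_message("comm_msg", {"comm_id": comm_id, "data": data}, session)


def comm_close(comm_id, session, data=None):
    return make_message("comm_close", {"comm_id": comm_id, "data": data or {}}, session)
```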

zoebraiyan commented 5 years ago

We have a back-end executor program that wraps the IPython kernel. We are trying to send resource usage information to the client application.

MSeal commented 5 years ago

So the comms solution probably fits best here: https://jupyter-client.readthedocs.io/en/stable/messaging.html#custom-messages, which should pass through nbconvert without issue (let me know if you do see any trouble therein).

zoebraiyan commented 5 years ago

Thank you for your suggestion. To go into further detail, here is what we are trying: send a custom message (resource usage details) after each code cell is executed. All messages received over 'iopub' are processed in ExecutePreprocessor in nbconvert. Where and how should the custom message be processed on the 'iopub' channel?
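On the kernel side, the per-cell payload could come from something like the hypothetical tracker below, which reports CPU and wall-clock time consumed since the previous snapshot using only `os.times()` and `time.perf_counter()`. One place to call it after every cell is an IPython `post_run_cell` event callback that sends the result through a comm; the class and field names are made up for illustration.

```python
import os
import time


class CellResourceTracker:
    """Measures CPU and wall-clock time consumed between successive snapshots."""

    def __init__(self):
        self._last = self._now()

    @staticmethod
    def _now():
        t = os.times()
        return {"user": t.user, "system": t.system, "wall": time.perf_counter()}

    def snapshot(self):
        """Return usage since the previous snapshot as a JSON-serialisable dict."""
        now = self._now()
        delta = {key: now[key] - self._last[key] for key in now}
        self._last = now
        return {
            "cpu_user_s": delta["user"],
            "cpu_system_s": delta["system"],
            "wall_s": delta["wall"],
        }
```

Wall time between snapshots also includes any idle time between cells, so in practice you might take one snapshot in a `pre_run_cell` callback and report the next one from `post_run_cell`.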

MSeal commented 5 years ago

Given you're producing custom comm messages, you can create a subclass that handles them. (I wrote all of the code below without testing it, so it might have minor issues.) Make a file customcommsrocessor.py:

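from nbconvert.preprocessors import ExecutePreprocessor

class CustomCommsExecutePreprocessor(ExecutePreprocessor):
    def handle_comm_msg(self, outs, msg, cell_index):
        # Keep the default handling (e.g. widget state), then add your own.
        super().handle_comm_msg(outs, msg, cell_index)
        # Your custom message handling code.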
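The placeholder in `handle_comm_msg` could delegate to a small helper like the hypothetical collector below, which remembers comms opened with a given target name and stores their payloads per cell index. It assumes that `comm_open`, `comm_msg` and `comm_close` messages all reach `handle_comm_msg` in the version you run, which is worth checking; the target name `resource_usage` is an assumption and must match whatever the kernel side uses.

```python
class ResourceUsageCollector:
    """Collects payloads sent on comms opened with one target name, keyed by cell index."""

    def __init__(self, target_name="resource_usage"):
        self.target_name = target_name
        self._comm_ids = set()
        self.usage_by_cell = {}

    def handle(self, msg, cell_index):
        msg_type = msg["header"]["msg_type"]
        content = msg["content"]
        if msg_type == "comm_open" and content.get("target_name") == self.target_name:
            # Remember comms that belong to us; other comms (e.g. widgets) are ignored.
            self._comm_ids.add(content["comm_id"])
        elif msg_type == "comm_msg" and content.get("comm_id") in self._comm_ids:
            self.usage_by_cell.setdefault(cell_index, []).append(content["data"])
        elif msg_type == "comm_close":
            self._comm_ids.discard(content.get("comm_id"))
```

Inside the subclass you might create one collector when the preprocessor is constructed and call `self.collector.handle(msg, cell_index)` right after the `super()` call.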

Then you can either refer to it on the command line:

jupyter nbconvert --Exporter.preprocessors='["customcommsrocessor.CustomCommsExecutePreprocessor"]' --CustomCommsExecutePreprocessor.enabled=True ...

or add a custom config file that includes the new preprocessor by its class path (the module must be importable, e.g. on your PYTHONPATH):

c = get_config()
c.Exporter.preprocessors = ['customcommsrocessor.CustomCommsExecutePreprocessor']
c.CustomCommsExecutePreprocessor.enabled = True

and use this config via:

jupyter nbconvert --config customconfig.py ...

zoebraiyan commented 5 years ago

Thank you for your help @MSeal

zoebraiyan commented 5 years ago

@MSeal We were using nbconvert 5.2.1 until now and are in the process of updating to 5.6.0. With the new flow of the run_cell method in ExecutePreprocessor, the timeout for reading from the shell channel has been reduced from 30 seconds to 1 second and is not configurable. This is causing timeout issues. Is there a plan to make the timeout configurable in a future release? Are there any alternative solutions to this problem? Below are the lines where the timeout happens:

timeout = self._timeout_with_deadline(1, deadline)
exec_reply = self._poll_for_reply(parent_msg_id, cell, timeout)
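The quoted call caps each wait on the shell channel at the smaller of 1 second and the time remaining before the overall deadline. Capped per-poll waits like this are normally used inside a loop that keeps polling until a reply arrives or the deadline passes. Below is a standalone sketch of that pattern with a `queue.Queue` standing in for the shell channel; it is not nbconvert's code, and the function names are hypothetical. In nbconvert the overall per-cell limit is the configurable `ExecutePreprocessor.timeout` setting.

```python
import queue
import time


def timeout_with_deadline(timeout, deadline):
    """Cap a per-poll timeout so it never runs past the overall deadline (None = no deadline)."""
    if deadline is not None:
        timeout = min(timeout, deadline - time.monotonic())
    return max(timeout, 0)


def poll_for_reply(channel, msg_id, overall_timeout, poll_interval=1.0):
    """Wait for the reply whose parent msg_id matches, polling in short slices."""
    deadline = time.monotonic() + overall_timeout
    while True:
        remaining = timeout_with_deadline(poll_interval, deadline)
        if remaining <= 0:
            raise TimeoutError("no reply to %s within %ss" % (msg_id, overall_timeout))
        try:
            msg = channel.get(timeout=remaining)
        except queue.Empty:
            # A real client would check here that the kernel is still alive.
            continue
        if msg["parent_header"].get("msg_id") == msg_id:
            return msg
        # Replies to other requests are skipped; keep polling.
```

A short per-poll timeout only bounds how long each individual wait blocks; the request is abandoned only when the overall deadline is reached.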

def _poll_for_reply(self, msg_id, cell=None, timeout=None):
    try:
        # check with timeout if kernel is still alive
        msg = self.kc.shell_channel.get_msg(timeout=timeout)
        if msg['parent_header'].get('msg_id') == msg_id:
            return msg
    except Empty:
        # received no message, check if kernel is still alive
        self._check_alive()
        # kernel still alive, wait for a message