mwouts / jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
https://jupytext.readthedocs.io
MIT License
6.65k stars 386 forks source link

WIP Support asynchronous contents managers #1021

Open mwouts opened 1 year ago

mwouts commented 1 year ago

May fix #1020

mwouts commented 1 year ago

@blink1073 thanks for your advices at https://github.com/jupyter-server/jupyter_server/issues/1127

At this point (https://github.com/mwouts/jupytext/pull/1021/commits/c5bdeed80646b96743bcedf0a8450be951920736) I have been able to make the test pass on the contents manager (and, as you suggested, the tests are setup to test both the CM derived from the LargeFileManager and the CM derived from the AsyncLargeFileManager).

Now the problem I am facing is that I don't see how to contain the expansion of the async functions... Take for instance this function write_pair, which takes as a argument a function that writes one file. In the context manager the function write_one_file is async, so write_pair becomes async as well. Is there any way to avoid this? (*) Does it mean that every function in jupytext that ever read or writes a paired notebook will have to become async? And any function depending on them? :smile:

(*): I tried this: loop = asyncio.get_event_loop(); oop.run_until_complete(...) but it complained about the loop being already running

image

mwouts commented 1 year ago

In the end I've decided to write two functions: write_pair and write_pair_async in this commit - this seems to work well to limit the expansion of async functions.

mwouts commented 1 year ago

An experimental version with support for async contents manager is available with:

BUILD_JUPYTERLAB_EXTENSION=1 pip install git+https://github.com/mwouts/jupytext.git@support_async_contentsmanagers

We will need to test it in various environments before releasing that (and also I'd be curious to identify the limits, i.e. what happens if we test it with an older version of Jupyter, etc)

mwouts commented 1 year ago

I see at least two issues on the CI:

  1. Not all versions of jupyter_core have ensure_async
  2. I've had to add two new methods like rename or trust_notebook to the contents manager (as otherwise their base implementation was either not async, or not awaiting for the async calls), and at least one of them fail:
    
    >       await cm.rename(tmp_nbpy, "new.nb.py")

tests/test_contentsmanager.py:278:


self = <jupytext.contentsmanager.build_jupytext_contents_manager_class..JupytextContentsManager object at 0x7fd97f57ba60> old_path = 'notebook.nb.py', new_path = 'new.nb.py'

async def rename(self, old_path, new_path):
    """NB: This method was added for the async port"""
    await ensure_async(self.rename_file(old_path, new_path))
    await ensure_async(
        self.checkpoints.rename_all_checkpoints(old_path, new_path)
    )
  self.emit(

data={"action": "rename", "path": new_path, "source_path": old_path} ) E AttributeError: 'JupytextContentsManager' object has no attribute 'emit'

Advice is welcome!

blink1073 commented 1 year ago
mwouts commented 1 year ago

Thanks @blink1073 ! Oh yes I see then requiring jupyter_core as you stated, and jupyter_server>=2.0 should fix the two issues above.

Re the necessity to explicitly add the async def rename method in Jupytext's cm, I was referring to the sync implementation

def rename(self, old_path, new_path):
        """Rename a file and any checkpoints associated with that file."""
        self.rename_file(old_path, new_path)
        self.checkpoints.rename_all_checkpoints(old_path, new_path)
        self.emit(data={"action": "rename", "path": new_path, "source_path": old_path})

which is not compatible with an async rename_file in my derived class, hence the requirement to add that method - that's not really an issue, my only concern is that by doing so I duplicate some code and take the risk that it does not evolve like the base one.

blink1073 commented 1 year ago

You can use run_sync from jupyter_core to convert an async function to a sync one, effectively.

blink1073 commented 1 year ago

Ideally though, the Jupytext class would be async as well.

LoicGrobol commented 12 months ago

@mwouts Is there anything that would help you with this?

mwouts commented 12 months ago

Thank you @LoicGrobol for offering your help! Well this was a while ago, so I will need to look into it to answer more precisely.

I think I wanted to identify what version of Jupyter would be required if we were to make Jupytext's content manager fully async.

Also I was seeing a huge impact on the tests, probably because I was/am not aware of the best way to test async method, so I could use a few pointers there.

Also we will have to redo that PR from scratch, but maybe only when the test reorganization is over (one of the currently opened PR).

LoicGrobol commented 12 months ago

Ok I'll have a look into these points then :)

mwouts commented 11 months ago

Hi @LoicGrobol , I have done a quick rebase of this branch, so that in the future we can experiment with it.