Suggestion: adding custom content directly to Emscripten's virtual filesystem in the case of Emscripten-backed kernels

DerThorsten commented 6 months ago

For Emscripten-based kernels (or, to keep it even simpler, xeus-based kernels), we could add content to Voici dashboards the same way we add packages to xeus-python-lite kernels.

We generate a tar.gz file of a given folder at JupyterLite build time (as we already do for each Python package in the case of xeus-python-lite). This tar.gz file and all the other tar.gz files (from packages) are put into a static directory, fetched at kernel initialization time, and extracted into the Emscripten virtual filesystem. That way, all files are accessible from the kernel without any service worker shenanigans.
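
A minimal sketch of the two steps, in plain Python. The function names are hypothetical, and a regular directory stands in for the Emscripten virtual filesystem; in the real kernel the extraction would happen in the kernel's own start-up code, not in Python.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def pack_contents(src_dir: str, archive_path: str) -> None:
    """Build time: pack every file under src_dir into a .tar.gz archive.

    Member names are stored relative to src_dir, so the archive can be
    extracted under any target directory later on.
    """
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                tar.add(str(path), arcname=path.relative_to(src).as_posix())


def unpack_contents(archive_path: str, dest_dir: str) -> list[str]:
    """Kernel start-up: extract the archive into the target directory.

    Returns the extracted files as sorted POSIX paths relative to dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter rejects absolute paths and "..".
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )
```

Because member names are relative, the same archive could be extracted at whatever location the kernel ends up using, which is exactly the open question raised in the next comment.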

martinRenou commented 6 months ago

Unfortunately, I am not sure it is that simple to do while still supporting the JupyterLite "classic" approach with the service worker.

The way it works today: the kernel mounts a custom Emscripten filesystem that uses the service worker to access the files provided by JupyterLite (either browser local storage or a custom "drive" in the JupyterLab sense). This filesystem is mounted at the root.

So if we pack files into the virtual filesystem at build time, where should we put them?

We cannot put them where the current custom Emscripten filesystem is mounted unless we update JupyterLite's DriveFS implementation to look into both the service worker and the packed environment. But that would make xeus a special case in JupyterLite.
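
To make that trade-off concrete, here is a minimal sketch of what "looking into both" could mean for a single read. Everything in it is hypothetical (it is not JupyterLite's DriveFS code), and giving the drive precedence over the packed files is an assumption.

```python
from __future__ import annotations

from typing import Callable, Optional


def overlay_read(
    path: str,
    drive_read: Callable[[str], Optional[bytes]],
    packed: dict[str, bytes],
) -> bytes:
    """Hypothetical DriveFS read that also consults the packed environment.

    drive_read stands in for the service-worker-backed lookup and returns None
    when the drive has no such file; packed holds the files extracted from the
    build-time archive.
    """
    data = drive_read(path)
    if data is not None:
        return data  # the drive (browser storage or custom drive) wins
    if path in packed:
        return packed[path]  # fall back to the build-time contents
    raise FileNotFoundError(path)
```

The fallback branch would sit inside DriveFS and therefore apply to every kernel, even though only xeus kernels ship a packed environment.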

If we put them at a separate location, say "files/", users would need to update their code to load their files from there, which is also not ideal:

with open("/files/myfile.txt") as f:
    content = f.read()

DerThorsten commented 6 months ago


What I don't get is that the $PREFIX we use for all the packages is also just the root of the filesystem. Let's say a package has a file at $PREFIX/foo.txt: it will end up at /foo.txt.

Now suppose we also happen to have a file foo.txt in the JupyterLite drive, i.e. "in the left sidebar". What is the following supposed to do?

with open("/foo.txt", "r") as f:
    ...

martinRenou commented 6 months ago

This filesystem is mounted at the root.

Sorry, this is my mistake. It's not mounted at the root but at the mountpoint here; we then directly do an FS.chdir into that mount point so that the kernel runs from there.
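
A minimal sketch of what mounting plus FS.chdir implies for path resolution, assuming everything outside the mount point is a plain in-memory filesystem. The mount point name `/drive` is a hypothetical placeholder; the real directory name may differ.

```python
import posixpath

# Hypothetical mount point for the DriveFS; the real directory name may differ.
MOUNTPOINT = "/drive"


def resolve_backend(path: str, cwd: str = MOUNTPOINT) -> str:
    """Return which filesystem would serve `path` inside the kernel.

    Relative paths are resolved against cwd, mirroring the FS.chdir into the
    mount point at start-up. "drive" means the service-worker-backed DriveFS;
    "memfs" means the in-memory filesystem outside the mount point.
    """
    resolved = posixpath.normpath(posixpath.join(cwd, path))
    if resolved == MOUNTPOINT or resolved.startswith(MOUNTPOINT + "/"):
        return "drive"
    return "memfs"
```

Under these assumptions, `open("foo.txt")` reaches the drive file while `open("/foo.txt")` reaches the file a package placed at `$PREFIX/foo.txt`, so the two do not collide. Changing the working directory (for example with `os.chdir("/")`) silently switches the same relative path to the in-memory filesystem.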

martinRenou commented 6 months ago

So my point is that the directory the kernel runs from is the mount location, where it only uses the DriveFS APIs (backed by the service worker) to access and save files.

martinRenou commented 6 months ago

I would not have mounted this filesystem at the root but rather at some special location like /files [...] we could have set /files as the current working directory.

Yep, this is exactly what we do (except probably for the directory name); see my comments above. Sorry for the confusion.

DerThorsten commented 6 months ago

Not sure if this is possible, but could we do it "the other way round", so that the JupyterLite filesystem takes its content from the Emscripten filesystem instead?

martinRenou commented 6 months ago

not sure if this is possible, but could we do it "the other way round" such that the jupyterlite filesystem is taking the content from the emscripten filesystem and not the other way around.

It would be possible to implement this for one kernel, but what happens when you have multiple kernels? Unfortunately I don't think this is possible...

jtpio commented 6 months ago

Thinking of a few cases where this could be confusing to users, if the files are only available in the virtual file system and not persisted anymore, for example when opening them with open('file.txt', 'w'):

If users have multiple Voici dashboards where one dashboard appends data to a file and a second dashboard expects to find that file with data in it: if everything is stored only in the virtual file system, won't the generated content be lost on page refresh, or when the dashboard is opened in a different tab? The use case here is a multi-page application built with Voici.
Similarly, someone might generate a data file from JupyterLite, which gets saved in IndexedDB via the service worker, and then try to load it with open('file.txt', 'r') from a Voici dashboard, just to see what it would look like in a Voici dashboard.

These might be niche cases compared to the vast majority of Voici dashboards. In the end these potential limitations could also simply be documented so users are aware of them.
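
The first case can be modelled with a small hypothetical class (not an actual Voici or xeus API): each kernel, meaning each tab or page load, starts from a fresh copy of the build-time contents, so a write in one session is invisible to every other session.

```python
from __future__ import annotations


class KernelSession:
    """Hypothetical model of one kernel instance (one tab or one page load).

    Each session starts from a fresh copy of the build-time contents, the way
    the archive would be re-extracted at every kernel start-up. Writes only
    touch that copy, so they are gone once the session ends.
    """

    def __init__(self, packed: dict[str, str]) -> None:
        self.fs = dict(packed)  # fresh, session-local copy

    def append(self, path: str, text: str) -> None:
        self.fs[path] = self.fs.get(path, "") + text

    def read(self, path: str) -> str:
        return self.fs[path]
```

Dashboard A appending to `log.txt` and dashboard B reading it afterwards would then see only the original build-time content.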

trungleduc commented 6 months ago

The use case here is a multi page application built with Voici.

I don't think this use case is very common, even in Voila.

DerThorsten commented 6 months ago

Thinking of a few cases where this could be confusing to users, if the files are only available in the virtual file system and not persisted anymore, for example when opening them with open('file.txt', 'w'):

if users have multiple Voici dashboards where one dashboard appends data to a file, and a second dashboard expects to find that file with data in it. If everything is just stored on the virtual file system, the generated content will be lost on page refresh, or when the dashboard is open in a different tab? The use case here is a multi page application built with Voici.

similarly if someone generates a data file from JupyterLite, which gets saved in IndexedDB via the Service worker, and then tries to load it with open('file.txt', 'r') from a Voici dashboard just to see what it would look like in a Voici dashboard

These might be niche cases compared to the vast majority of Voici dashboards. In the end these potential limitations could also simply be documented so users are aware of them.

I think this argument is kind of invalid, because as soon as you switch from open('file.txt', 'r') to open('/file.txt', 'r'), the content is not saved either. So we already have some surprising and confusing behaviour.

So let's make this explicit: open('/persistence/file.txt', 'w') would persist, while open('/some_dir/file.txt', 'w') would not (or something similar).
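
One way to make the rule explicit is a mount table with longest-prefix matching, sketched below. The `/persistence` name follows the proposal and is hypothetical, as is the table itself; the point is that whether a write persists depends only on the absolute path, not on the working directory.

```python
import posixpath

# Hypothetical mount table (mount point -> backend). "/persistence" follows
# the naming proposed above; it is not an existing JupyterLite or xeus path.
MOUNTS = {
    "/persistence": "drive",  # persisted via the service worker
    "/": "memfs",             # in-memory only, lost on page refresh
}


def backend_for(path: str) -> str:
    """Pick the mount whose mount point is the longest prefix of `path`."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    path = posixpath.normpath(path)
    best = "/"
    for mount in MOUNTS:
        inside = path == mount or path.startswith(mount.rstrip("/") + "/")
        if inside and len(mount) > len(best):
            best = mount
    return MOUNTS[best]
```

Requiring absolute paths is a deliberate choice in this sketch: relative paths are where the current-working-directory surprises come from.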

trungleduc commented 6 months ago

The question of data persistence in Voici should be addressed by another layer, like a remote drive. I don't think users expect state or data to persist across page refreshes in a static web application.

DerThorsten commented 6 months ago

Also, if you change the current working directory within your Python script, you will get even stranger behaviour.

trungleduc commented 6 months ago

Once we have jupyter-drives, jupyter-db or similar running in JupyterLite, users will be able to build more complex things backed by cloud storage and databases. But for now, I think we can clearly tell people that files embedded in Voici are not saved across page refreshes.

martinRenou commented 5 months ago

cc @jtpio. Continuing the discussion from https://github.com/voila-dashboards/voici-demo/issues/19 (sorry, I thought this was open in Voici), where we were discussing making the Voici user API simpler when it comes to providing content.

I'm suggesting for Voici to automatically make use of the mounts option of jupyterlite-xeus.

Not sure. Shouldn't Voici be kernel agnostic?

It should. But we have an opportunity here to make Voici way more stable when using xeus-python and I think we should take it. A small check that jupyterlite-xeus is available does not hurt.

Also, at some point, jupyterlite-xeus may be actually generalized to a library for creating emscripten-based environments, hence the renaming of the library I suggested in other threads. I think this package could very well install a pyodide-based kernel that makes use of the emscripten env it provides. If that happens, making a special case for jupyterlite-xeus (or whatever it's called at that point) is not so surprising anymore.

Ideally there shouldn't be too many differences between JupyterLite and Voici (and contents in JupyterLite should be fixed so they are less flaky) to keep the behavior consistent.

I actually think we can make the behavior consistent (no difference seen from the user point of view) while making it more robust in the case of Voici.

I would like to suggest refactoring the contents CLI option in Voici so that it:

forwards the contents option to jupyterlite-core in order to fill the file browser (same behavior as before)
optionally passes these contents to the new XeusAddon.mounts option to mount them in the kernel, so that files are directly available to the kernel (bypassing the service worker approach). This would be enabled by default when the jupyterlite-xeus addon is available.

We should make sure that users can still mount extra directories into the kernel themselves using the XeusAddon option.
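
A hypothetical sketch of how the refactored option could translate into build flags. The `/files` mount point echoes the location floated earlier in the thread and is illustrative only; the `host:mount` value syntax for `XeusAddon.mounts` and the `jupyterlite_xeus` module name are assumptions to check against the jupyterlite-xeus documentation.

```python
from __future__ import annotations

import importlib.util


def xeus_addon_available() -> bool:
    """Assumes jupyterlite-xeus is importable as the `jupyterlite_xeus` module."""
    return importlib.util.find_spec("jupyterlite_xeus") is not None


def build_args(
    contents: list[str],
    mount_point: str = "/files",
    extra_mounts: list[str] | None = None,
    mount_contents: bool | None = None,
) -> list[str]:
    """Hypothetical translation of Voici's contents option into build flags."""
    args: list[str] = []
    for path in contents:
        # 1. Same as before: fill the file browser via jupyterlite-core.
        args += ["--contents", path]
    if mount_contents is None:
        # 2. On by default, but only when the xeus addon is installed.
        mount_contents = xeus_addon_available()
    if mount_contents:
        for path in contents:
            args.append(f"--XeusAddon.mounts={path}:{mount_point}")
    for mount in extra_mounts or []:
        # 3. Users can still mount further directories themselves.
        args.append(f"--XeusAddon.mounts={mount}")
    return args
```

The same logic could equally live inside the addon itself, as suggested later in the thread; the sketch only shows the intended defaults.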

I agree that it would make Voici less "pure" in the sense that we are making a special case for jupyterlite-xeus, but it makes it way more usable without making the CLI complicated.

martinRenou commented 5 months ago

Also I would like to say that jupyter-xeus (the org) is an official jupyter project. So it's probably sensible that jupyterlite-xeus (the lib) goes into core jupyterlite at some point in the future?

That is another argument for why making a special case for jupyterlite-xeus in Voici is not so crazy.

trungleduc commented 5 months ago

I'm suggesting for Voici to automatically make use of the mounts option of jupyterlite-xeus.

I agree. I would even make voici depend on jupyterlite-xeus, so that users always have a kernel at hand.

jtpio commented 5 months ago

Where we were discussing making the Voici user API simpler when it comes to providing content.

It can stay like this for now. The suggestion was mostly about the default command used in the voici template repo. It's probably fine if the two options (--contents and --XeusAddon.mounts) are available side by side, as long as they are documented somewhere.

Also, at some point, jupyterlite-xeus may be actually generalized to a library for creating emscripten-based environments, hence the renaming of the library I suggested in other threads. I think this package could very well install a pyodide-based kernel that makes use of the emscripten env it provides. If that happens, making a special case for jupyterlite-xeus (or whatever it's called at that point) is not so surprising anymore.

Yeah it would make sense to support Emscripten kernels by default in the JupyterLite CLI.

https://github.com/voila-dashboards/voici-demo/issues/19#issuecomment-1895323085 also had a point about using this feature in jupyterlite-sphinx and the Notebook 7 interface.

I agree that it would make Voici less "pure" in the sense that we are making a special case for jupyterlite-xeus, but it makes it way more usable without making the CLI complicated.

Is there something preventing doing that in the XeusAddon directly? The addon should have access to the default contents as well, since all addons are at the same level. Also because this is a feature of jupyterlite-xeus, it could make sense to keep this new logic in the same place for the time being.

I would even make voici depending on jupyterlite-xeus, so that users always have a kernel in hand.

Note that jupyterlite-core does not depend on any kernel (but the jupyterlite metapackage does for convenience and historical reasons). It would be nice for voici to follow the same idea. Or make a voici-core?

martinRenou commented 5 months ago

Is there something preventing doing that in the XeusAddon directly?

That's a good point. We should probably implement this behavior in the xeus addon directly.

jtpio commented 5 months ago

Also I would like to say that jupyter-xeus (the org) is an official jupyter project. So it's probably sensible that jupyterlite-xeus (the lib) goes into core jupyterlite at some point in the future?

That sounds fine. Currently some other official Jupyter subprojects still treat ipykernel as special, by installing it automatically (JupyterLab does it, for historical reasons since users expect to have a kernel available by default). Although if we have the choice here, maybe the use of metapackages could make more sense in the long run (similar to the separation between jupyterlite-core and jupyterlite).

Or as discussed above make it so that it works with Emscripten kernels in general.

trungleduc commented 5 months ago

Note that jupyterlite-core does not depend on any kernel (but the jupyterlite metapackage does for convenience and historical reasons). It would be nice for voici to follow the same idea. Or make a voici-core?

voici is equivalent to jupyterlite-core, since it provides a CLI to generate Voici dashboards. Unlike JupyterLite, where you might want to run an empty distribution, there is no point in creating an empty Voici page.

Reading again about jupyterlite-core and jupyterlite, the same split actually makes sense for voici.

trungleduc commented 5 months ago

For the mount feature, do I need to have jupyterlite-xeus in my build env or only in the run env?

martinRenou commented 5 months ago

Only in the build env (where you build jupyterlite/voici), you don't need it in the run env (emscripten).

trungleduc commented 5 months ago

I would like to suggest refactoring the contents CLI option in Voici so that it:

forwards the contents option to jupyterlite-core in order to fill the file browser (same behavior as before)

optionally passes this contents to the new XeusAddon.mounts option to mount the content in the kernel, so that files are directly available for the kernel (bypassing the service worker approach). This would be enabled by default, in the case where jupyterlite-xeus addon is available.

It feels reasonable to combine this with splitting voici into voici/voici-core.

trungleduc commented 5 months ago

Closing as fixed in 0.6.0

voila-dashboards / voici

#104