sphinx-gallery / sphinx-gallery

Sphinx extension for automatic generation of an example gallery
https://sphinx-gallery.github.io
BSD 3-Clause "New" or "Revised" License

Outsource the "pure python notebook" lib #933

Open smarie opened 2 years ago

smarie commented 2 years ago

Today, while working on #930 and realizing that I was about to duplicate the fix for mkdocs-gallery (#895), I asked myself what the scope of a common core lib supporting both projects could be.

At least one thing would make sense to share: the "pure python notebook engine", which currently

We could call such a shared lib "pynotebook", but the name is already taken, so maybe something like "pure_notebook"? What do you think?

larsoner commented 2 years ago

Instead of requiring a separate lib, could mkdocs-gallery just depend on sphinx-gallery, with the interface living in this lib?

smarie commented 2 years ago

Not really, as mkdocs-gallery has no dependency on any sphinx lib, just mkdocs. The engine could be completely independent of any external dependency (thus solving https://github.com/sphinx-gallery/sphinx-gallery/issues/421 by creating an auditable, independent lib that could at least be compared with the Jupyter one using common tests, for example).

EDIT: actually, #421 was more about rich output (scrapers); my proposal is much more focused on simple cell-by-cell execution and results collection. So the two tickets are not that closely connected.
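To make "simple cell-by-cell execution and results collection" concrete, here is a minimal sketch of what the core loop of such an independent engine might look like, using only the standard library. The function name `run_cells` and its result format are hypothetical, not taken from sphinx-gallery or mkdocs-gallery; a real engine would also need things like per-block configuration and capturing the repr of a block's last expression.

```python
import contextlib
import io
import traceback


def run_cells(cells):
    """Run code cells one by one in a shared namespace, collecting output.

    Hypothetical helper for illustration only; not an existing API.
    """
    # One namespace shared by all cells, so variables persist between cells.
    namespace = {"__name__": "__main__"}
    results = []
    for index, source in enumerate(cells):
        captured = io.StringIO()
        error = None
        try:
            # Redirect stdout/stderr so each cell's prints are recorded separately.
            with contextlib.redirect_stdout(captured), contextlib.redirect_stderr(captured):
                exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception:
            error = traceback.format_exc()
        results.append({"cell": index, "output": captured.getvalue(), "error": error})
        if error is not None:
            break  # one possible policy: stop at the first failing cell
    return results
```

For example, `run_cells(["x = 2", "print(x * 21)"])` records an empty output for the first cell and `"42\n"` for the second, because both cells share one namespace.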

larsoner commented 2 years ago

> Not really, as mkdocs-gallery has not dependency to any sphinx lib, just mkdocs.

Makes sense then!

> EDIT: actually https://github.com/sphinx-gallery/sphinx-gallery/issues/421 was more about rich output (scrapers), my proposal is really much more focused on simple cell-by-cell execution and results collection

In some sense, "rich scraping" could be considered a form of "results collection" / "output" capture. It really depends on what we mean by "output". If by "output" you mean stdout/stderr and nothing graphical, then scrapers would be excluded and would stay in the sphinx-gallery scope. But would this really be optimal for mkdocs-gallery? I would assume that getting execution + prints + figures/images would be much more useful, no?
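If scrapers are kept optional, one way to keep the core engine independent is to treat them as plug-ins that run after each block. The sketch below assumes a hypothetical interface where a scraper is any callable that receives the execution namespace and returns a list of collected outputs; `NewFilesScraper` and `run_block` are illustrative names, and this is not sphinx-gallery's actual scraper interface.

```python
import os
from typing import Callable, Dict, List

# Hypothetical scraper type: called after a block runs, returns rich outputs.
Scraper = Callable[[dict], List[str]]


class NewFilesScraper:
    """Toy scraper: report files that appeared in `directory` since the last
    call, e.g. images saved to disk by the example code."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        self.seen = set(os.listdir(directory))

    def __call__(self, namespace: dict) -> List[str]:
        current = set(os.listdir(self.directory))
        new_files = sorted(current - self.seen)
        self.seen = current  # remember what was already reported
        return new_files


def run_block(source: str, namespace: dict,
              scrapers: Dict[str, Scraper]) -> Dict[str, List[str]]:
    """Execute one block, then let every optional scraper collect its output."""
    exec(source, namespace)
    return {name: scrape(namespace) for name, scrape in scrapers.items()}
```

With an empty `scrapers` dict, the engine only executes code. A figure scraper could follow the same pattern, e.g. by saving newly opened figures and returning their paths.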

smarie commented 2 years ago

Yes, you are right. I guess we could start with pure doc extraction + block-wise execution + config + gathering of raw output variables, and then add an optional scrapers engine.
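For the "pure doc extraction" step, here is a simplified sketch of splitting a gallery-style script into text and code blocks. It assumes only the `# %%` separator convention, which sphinx-gallery supports. The real parser also handles the leading module docstring, `####`-style separator lines and per-file configuration comments, all of which this hypothetical `split_blocks` helper ignores.

```python
def split_blocks(script):
    """Split a gallery-style script into ("text", ...) and ("code", ...) blocks.

    Hypothetical, simplified sketch: a line starting with "# %%" opens a text
    block made of the comment lines that follow it; the first non-comment
    line switches back to code until the next separator.
    """
    blocks = []
    kind, lines = "code", []
    for line in script.splitlines():
        if line.startswith("# %%"):
            # A separator closes the current block and opens a text block.
            _append_block(blocks, kind, lines)
            kind, lines = "text", []
        elif kind == "text" and not line.startswith("#"):
            # The first non-comment line ends the text and starts code.
            _append_block(blocks, kind, lines)
            kind, lines = "code", [line]
        elif kind == "text":
            # Strip the leading "#" and one optional space from text lines.
            lines.append(line[2:] if line.startswith("# ") else line[1:])
        else:
            lines.append(line)
    _append_block(blocks, kind, lines)
    return blocks


def _append_block(blocks, kind, lines):
    """Append a block unless it is empty or whitespace-only."""
    content = "\n".join(lines).strip("\n")
    if content.strip():
        blocks.append((kind, content))
```

For example, a script containing `import math`, then `# %%` and `# Compute a square root.`, then `print(math.sqrt(16))` yields a code block, a text block and a second code block. The code blocks could then be passed to a block-wise executor.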

smarie commented 1 year ago

Hi @larsoner, long time no chat :) Unfortunately, I was caught up in many other industrial activities. I recently came across https://jupytext.readthedocs.io/en/latest/using-library.html

Do you have an opinion on whether this could be a candidate to replace the "common core" I mentioned above? Or is this lib, in your mind, "not independent enough" (from Jupyter) to be a suitable core engine for sphinx-gallery?

larsoner commented 1 year ago

I think that, at least 4 years ago, they used our code for the conversion?

https://github.com/sphinx-gallery/sphinx-gallery/issues/424#issuecomment-439605021

But maybe it's not that way today...

GaelVaroquaux commented 1 year ago

jupytext depends on nbformat, which in turn pulls in a lot of dependencies.

We would like sphinx-gallery to remain lightweight to set up. In my view, one of the current limitations of sphinx-gallery is that it is too tedious to set up (e.g. on a CI). Of course, this is not related to the dependencies, and there would be no absolute opposition to adding a small dependency if it simplifies things a lot.

Besides, the hardest part of converting to the notebook format is the move from rst to md, which would be needed anyhow to benefit from jupytext.

I think that, if we were to make major changes, one promising direction (though a major one) would be to look at the growing md support in Sphinx. But I'm not sure how we would make this work with the part that handles formatting the docstrings of the project we are documenting.

smarie commented 1 year ago

Very clear, @GaelVaroquaux, thanks! So it is better to have an independent lib in charge of managing the rst (and possibly md in the future) representation and execution of a notebook than to try to reuse jupytext, because of its dependency on nbformat. The "refactored" engine clone in mkdocs-gallery, or the original engine itself in sphinx-gallery, could be a good starting point for this, but someone still needs to extract it, define a clear API and publish it. Hopefully one of these days someone motivated enough will try that move. For now I do not see any available bandwidth in the coming months, but "you never know" :)