spatialaudio / nbsphinx

Sphinx source parser for Jupyter notebooks
https://nbsphinx.readthedocs.io/
MIT License

Hello from a similar project #420

choldgraf opened this issue 4 years ago

choldgraf commented 4 years ago

Hey there - I wanted to reach out and mention a project that we have recently started, and that has a lot of overlapping functionality with nbsphinx. It is called "MyST-NB" and it is also a notebook parser for the Sphinx ecosystem.

This repository is part of a recent project to extend a few publishing-based tools in the Python ecosystem (jupinx and jupyter book) for the Sphinx ecosystem.

We created a new project to parse notebooks, instead of just upstreaming things to nbsphinx, because we depend heavily on a brand new markdown parser in Sphinx (myst-parser) and also need to build a fair bit of complex functionality for executing and caching notebook outputs (using a project called jupyter-cache). Especially since these pieces were part of a broader publishing toolchain, it seemed too complex to try and fit in with the pre-existing ecosystem.

I don't have a specific goal in opening this issue, other than to just alert the nbsphinx devs of the existence of this tool and to say hello. Over time we are trying to upstream as much as we can (e.g. we have a few PRs open in jupyter-sphinx)...I'm not sure what exactly that means in the context of nbsphinx, and myst-nb is still only about a month old, but I wanted to reach out. Obviously we'd also love to hear what folks think about the MyST parser and MyST-NB...we'll also certainly mention nbsphinx in our docs as another core tool for "notebooks in Sphinx" πŸ‘

mgeier commented 4 years ago

Hello, thanks for the message!

I've already read about MyST-NB a few days ago in some other issue, and I've promptly added a link: #418.

I'm also aware of jupinx and Jupyter Book, but I'm probably not up-to-date with the latest developments.

I don't quite understand what jupyter-cache is all about. Is this something that could be relevant to nbsphinx at some point? Is this something that should be merged into nbconvert/nbclient in the future?

I've also thought about syntax extensions, e.g. "note"/"warning" boxes in https://github.com/jupyter/notebook/issues/1292. Since the time didn't feel ripe for syntax extensions (it still doesn't, BTW), I decided to go for an already somewhat available approach via manually parsed <div> tags: https://nbsphinx.readthedocs.io/en/0.5.1/markdown-cells.html#Info/Warning-Boxes

I think a syntax extension would only really make sense if it will potentially be implemented in JupyterLab (or the Classic Notebook).

Are you planning to propose your syntax extensions for JupyterLab?

Over time we are trying to upstream as much as we can

That's great, I'm looking forward to PRs!

we'd also love to hear what folks think about the MyST parser and MyST-NB

Well, as I mentioned I'd be a bit hesitant with syntax extensions.

My goal for nbsphinx has been to use, as much as possible, "valid" notebooks that can still be used in JupyterLab.

Of course nbsphinx adds some features that simply don't exist in JupyterLab, but I tried to make sure that each feature still somewhat makes sense without nbsphinx. For example, I've implemented links to .rst files in a way that they still work in JupyterLab (and e.g. on GitHub), where they just show the reST content. In nbsphinx they turn into links to a rendered HTML page.

Another example is the upcoming "gallery" feature (#392). This obviously doesn't generate a gallery in JupyterLab, but at least it will show valid links to the notebooks.

I have the feeling that with syntax extensions much of this is lost, and people are less motivated to actually open their notebooks in JupyterLab.

chrisjsewell commented 4 years ago

Heya,

I have the feeling that with syntax extensions much of this is lost, and people are less motivated to actually open their notebooks in JupyterLab.

Well, MyST is now an official format in jupytext: https://jupytext.readthedocs.io/en/latest/formats.html#myst-markdown. So in that respect it's easy to open the text-based document in JupyterLab to do work on the code.

I think a syntax extension would only really make sense if it will potentially be implemented in JupyterLab

I kind of disagree here: in terms of writing the documentation, in particular the "richer" documentation required for writing proper scientific books (citations, references, figure captions, ...), you would probably be less likely to actually want to do this in JupyterLab; for example, there is now a prototype VS Code extension with deeper language support. The idea is that, with jupytext, you can just switch between the two: JupyterLab and text editors.

Note, this kind of extension can/will probably also be created for JupyterLab at some point.

chrisjsewell commented 4 years ago

I don't quite understand what jupyter-cache is all about

See the very newly merged https://myst-nb.readthedocs.io/en/latest/use/execute.html

choldgraf commented 4 years ago

re: syntax extension, I think it's just a question of the goals of the project. As you say, nbsphinx is trying to play it conservatively when it comes to introducing new syntax etc. I think that makes sense. MyST-NB is intentionally trying to expand what is possible with notebooks, and so that project will get a bit more experimental. It's good to have both kinds of projects (it's also why MyST-NB is a separate repo, since I assume nbsphinx doesn't want to suddenly support a brand new flavor of markdown).

I think in the medium-long term, we should extend the syntax that Jupyter Notebooks support, because as @chrisjsewell mentions there are just a ton of things people want to do with notebooks that aren't supported by CommonMark. The plan had always been to do this whenever CommonMark extended itself to more complex syntax, but...that hasn't happened yet, and so I think we will need projects like MyST-NB to blaze a trail and see what works, and then when the time comes it will be easier to decide if and how to extend the "core Jupyter markdown syntax" with these use-cases in mind

chrisjsewell commented 4 years ago

πŸ‘

mgeier commented 4 years ago

@chrisjsewell

I have the feeling that with syntax extensions much of this is lost, and people are less motivated to actually open their notebooks in JupyterLab.

Well MyST is now an official format in jupytext: https://jupytext.readthedocs.io/en/latest/formats.html#myst-markdown.

Now I'm confused:

Are you talking about using MyST as alternative storage format (instead of JSON-based .ipynb)?

Or are you talking about using MyST as alternative format for Markdown cells within JSON-based notebooks?

Or both?

So in that respect it's easy to open the text-based document in JupyterLab to do work on the code.

But in this case the Markdown cells would still use extensions that are not supported by JupyterLab, right?

It might be easy to open the notebooks, but the Markdown cells will contain unrecognized markup code, right?

I think a syntax extension would only really make sense if it will potentially be implemented in JupyterLab

I kind of disagree here

That's totally fine, and it's important to acknowledge that we disagree here.

We should be aware of those different use cases.

Note, this kind of extension can/will probably also be created for JupyterLab at some point.

I guess that's a possibility. It would be interesting to know whether one of you is actually planning to push in this direction.

Would this happen through a "Jupyter Enhancement Proposal" (JEP)? Are those still a thing, BTW?

@choldgraf

As you say, nbsphinx is trying to play it conservatively when it comes to introducing new syntax etc.

I assume nbsphinx doesn't want to suddenly support a brand new flavor of markdown

Exactly. I would like to follow whatever is used by JupyterLab.

However, the Markdown parser could theoretically be factored out in order to play around with some experimental parser that supports new syntax extensions. I haven't looked into this, though, so I don't know whether it would be realistic to implement custom parser support in nbsphinx. I've wanted to get rid of pandoc for a long time, so re-factoring it into an exchangeable Markdown-parsing module would probably be a good start.
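
For illustration, here is a minimal sketch of the Sphinx hooks such an exchangeable parser could use; this is roughly the mechanism that myst-parser itself uses to register with Sphinx, and MyMarkdownParser is a hypothetical stand-in:

```python
# Sketch only: MyMarkdownParser is hypothetical; add_source_suffix() and
# add_source_parser() are the real Sphinx extension APIs.
from docutils import nodes
from sphinx.parsers import Parser


class MyMarkdownParser(Parser):
    supported = ("markdown",)

    def parse(self, inputstring, document):
        # A real implementation would translate Markdown into docutils nodes.
        document.append(nodes.paragraph(text=inputstring))


def setup(app):
    app.add_source_suffix(".md", "markdown")
    app.add_source_parser(MyMarkdownParser)
    return {"parallel_read_safe": True}
```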

I think in the medium-long term, we should extend the syntax that Jupyter Notebooks support

I agree. But I think we should try to not overdo it with the extensions. The overall complexity should remain reasonably low.

The plan had always been to do this whenever CommonMark extended itself to more complex syntax, but...that hasn't happened yet

Indeed!

... and then when the time comes it will be easier to decide if and how to extend the "core Jupyter markdown syntax" with these use-cases in mind

How would that happen? Through a JEP?

choldgraf commented 4 years ago

re: your points about JEPs - yep, I'd guess that this is where any broad markdown flavor extension would happen. It'll be easier to get community buy-in if there is prior art, and even better if there are users that have tested out various options first. I think that's part of what we're trying to accomplish with myst-nb. And as @chrisjsewell mentions, we can also build in functionality in jupyter via extensions before it needs to be an "official" part of the core markdown flavor.

(also worth noting that the major extra syntax pieces in myst-markdown are designed to degrade gracefully in a markdown renderer that doesn't understand MyST. Things like roles and directives will mostly just become "literal" blocks, and will still display)

Are you talking about using MyST as alternative storage format (instead of JSON-based .ipynb)?

Or are you talking about using MyST as alternative format for Markdown cells within JSON-based notebooks?

Or both?

Both:

chrisjsewell commented 4 years ago

Note that JupyterLab may (hopefully) eventually move over to using markdown-it as the core renderer: jupyterlab/jupyterlab/issues/272. At that point, given that MyST-Parser now uses markdown-it-py as the underlying markdown parser and myst-highlight uses markdown-it for the VS Code extension, implementing the render extensions will be "trivial".
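
For a quick impression of that shared core, a minimal markdown-it-py example (assuming the markdown-it-py package is installed):

```python
# Minimal demonstration of the markdown-it-py core that MyST-Parser builds on.
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")                 # strict CommonMark rules
tokens = md.parse("Some *emphasised* text")   # token stream for custom renderers
print(md.render("Some *emphasised* text"))    # <p>Some <em>emphasised</em> text</p>
# Syntax extensions are then added as plugins on top of this shared core.
```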

chrisjsewell commented 4 years ago

People can write notebooks in MyST markdown (using a {code-cell} directive) and this can be two-way synchronized with an ipynb file using Jupytext.
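
A sketch of that round trip using Jupytext's Python API (the file names are hypothetical; jupytext.read/jupytext.write are the real functions):

```python
import jupytext

# MyST markdown file -> in-memory notebook object
nb = jupytext.read("example.md")
# ... store it as a regular .ipynb notebook
jupytext.write(nb, "example.ipynb")

# ... and convert it back to MyST markdown again
nb = jupytext.read("example.ipynb")
jupytext.write(nb, "example.md", fmt="md:myst")
```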

Also note that these files are now integrated with execution and caching, so you have the option of never actually converting them to a notebook. When building the Sphinx documentation, code cell outputs are pulled straight from the cache, and the files are only re-executed when any of the code (not markdown) content changes.

mgeier commented 4 years ago

@choldgraf

Thanks for clarifying the different use cases. The situation is clearer to me now, but still not completely clear:

Is the {code-cell} directive part of the "normal" MyST syntax, or is it an extension only implemented in MyST-NB?

I guess the latter, because I could only find it in https://github.com/ExecutableBookProject/MyST-NB/blob/master/myst_nb/converter.py, and not in https://github.com/ExecutableBookProject/MyST-Parser.

So as I understand it now, there are actually two different things:

Is this correct?

If yes, that means there are these possible use cases:

It looks like there are two orthogonal things mixed together in MyST-NB.

That's of course totally fine if you like to organize the project this way, but it would make it less confusing for outsiders (like me) if you could make this clearer in the documentation.

It'll be easier to get community buy-in if there is prior art, and even better if there are users that have tested out various options first. I think that's part of what we're trying to accomplish with myst-nb.

OK, that sounds reasonable.

But wouldn't it be even better for users to test out the options in JupyterLab?

And as @chrisjsewell mentions, we can also build in functionality in jupyter via extensions before it needs to be an "official" part of the core markdown flavor.

What mention are you referring to? I couldn't find anything ...

Are you talking about JupyterLab extensions? Are there already JupyterLab extensions that allow Markdown syntax extensions?

also worth noting that the major extra syntax pieces in myst-markdown are designed to degrade gracefully in a markdown renderer that doesn't understand MyST. Things like roles and directives will mostly just become "literal" blocks, and will still display

I disagree that this is "graceful"!

It might be useful for some ad-hoc experiments, but I think it's a very bad idea to abuse code blocks for arbitrary (non-code!) uses. It's really annoying that there isn't more progress on the CommonMark front, but there are some discussions about a generic block syntax, and AFAICT it doesn't look like using code blocks as generic blocks is a likely outcome.

But anyway "degrading gracefully" shouldn't be such an important aspect when coming up with syntax extensions. This should only be relevant for temporary and/or ad-hoc solutions.

@chrisjsewell

Note that JupyterLab may (hopefully) eventually move over to using markdown-it as the core renderer: jupyterlab/jupyterlab/issues/272.

I'm aware of this issue, but I decided to not hold my breath. And it's quite a few years old by now ...

In fact, I already suggested (a few months earlier, nearly 4 years ago) to switch the Classic Notebook to CommonMark, mentioning markdown-it as a possible implementation: https://github.com/jupyter/notebook/issues/1371

Also note that these files are now integrated with execution and caching, so you have the option of never actually converting them to a notebook. When building the Sphinx documentation, code cell outputs are pulled straight from the cache, and the files are only re-executed when any of the code (not markdown) content changes.

That sounds great, regardless of whether the source file is an actual Jupyter notebook or some other storage format.

Are there plans to make this caching mechanism available for general use together with nbconvert --execute or nbclient?

More specifically, I think it would be interesting to use this in nbsphinx at some point!

But for that, the most important question for me would be: Does this look only at the code in cells, or also at its (transitive) dependencies?

If the latter, this would be really interesting, and it would probably also take care of https://github.com/spatialaudio/nbsphinx/issues/87.

chrisjsewell commented 4 years ago

That's totally fine, and it's important to acknowledge that we disagree here.

Absolutely, this is all just good discourse, even if I do come off a bit argumentative 😁

But wouldn't it be even better for users to test out the options in JupyterLab?

I think it's important to note that, from my perspective, our project is the Executable Book Project, not the "Documentable Notebook Project". JupyterLab and nbsphinx are great for creating some quick documentation from notebooks. But for anything more than simple documentation, JupyterLab is currently, and will probably remain, a far-from-ideal environment for writing proper scientific articles.

The way I would use MyST-NB would be to write exclusively in the .md format, within VS Code (or your favourite text editor), and only open the notebook in JupyterLab (using jupytext) when I actually need to write some code and generate some outputs. Essentially I would rarely write in markdown cells within JupyterLab.

In this respect, personally, I really don't care how JupyterLab displays the Markdown cells. But yes this is certainly not the only use case.

it's a very bad idea to abuse code blocks for arbitrary (non-code!) uses

disagree; from the CommonMark spec:

The content of a code fence is treated as literal text

i.e. there is nothing to say that a fenced block must contain code per se, and that is exactly what Sphinx directives are: "interpreted literal text"

but there are some discussions about a generic block syntax, and AFAICT it doesn't look like using code block as generic blocks is a likely outcome

I'd be very interested if you could point me towards this. In terms of CommonMark though, I am going to go out on a limb and say there will never be any changes now to the core syntax. The only thing the CommonMark forum is full of is 7-year-old discussions about adding different syntax that never actually happens, lol.

Are there plans to make this caching mechanism available for general use

Well it is already its own separate package jupyter-cache

Does this look only at the code in cells, or also at its (transitive) dependencies?

Maybe you could clarify what (transitive) dependencies you had in mind? But it does tentatively have the hooks in place (if not yet implemented) to handle assets (external files required by the notebook to execute) and artefacts (external files output by the notebook execution)

choldgraf commented 4 years ago

It looks like there are two orthogonal things mixed together in MyST-NB.

(and to your broader points about the use-cases)

I think you've generally got it right πŸ‘ and we appreciate the feedback on documentation. We've been more in coding mode than documenting mode lately, but I think it's time to take another pass through to improve explanations etc because it has been a while.

Technically, right now MyST-NB doesn't define the specification for a "Jupyter notebook written in markdown with MyST markdown"; it just knows how to use Jupytext to convert MyST notebooks into ipynb files at parse time. So the way I think about the division of labor:

But wouldn't it be even better for users to test out the options in JupyterLab?

Yes for sure - there are only so many hours in the day though :-) we need the core build system to be there first, and then we can start building out an ecosystem of plugins etc around this project.

Are there already JupyterLab extensions that allow Markdown syntax extensions?

I'm not sure about JupyterLab, but there were certainly extensions in the Jupyter Notebook that did this, in particular for things like adding references and citations. I believe Matthias also once had a plugin that would do variable injection into the markdown at rendering time.

chrisjsewell commented 4 years ago

Are there already JupyterLab extensions that allow Markdown syntax extensions?

This may be of relevance https://github.com/jupyterlab/jupyter-renderers

mgeier commented 4 years ago

@chrisjsewell

But wouldn't it be even better for users to test out the options in JupyterLab?

I think it's important to note that, from my perspective, our project is the Executable Book Project, not the "Documentable Notebook Project".

OK, that's good to know.

JupyterLab and nbsphinx are great for creating some quick documentation from notebooks. But for anything more than simple documentation, JupyterLab is currently, and will probably remain, a far-from-ideal environment for writing proper scientific articles.

I guess it depends on what you mean by "proper scientific articles". If you mean "traditional articles", I fully agree.

Do you have an example (probably a mock-up?) of an Executable Book that already reaches your goal of "proper scientific article"?

What's the most polished example that's currently existing?

The way I would use MyST-NB would be to write exclusively in the .md format, within VS Code (or your favourite text editor), and only open the notebook in JupyterLab (using jupytext) when I actually need to write some code and generate some outputs. Essentially I would rarely write in markdown cells within JupyterLab.

In this respect, personally, I really don't care how JupyterLab displays the Markdown cells.

OK, so you force people to use two different tools for Markdown and for code.

I guess the plan is to further enhance VS Code so that at some point JupyterLab isn't needed anymore?

it's a very bad idea to abuse code blocks for arbitrary (non-code!) uses

disagree; from the CommonMark spec:

The content of a code fence is treated as literal text

I guess we have a different understanding what "literal text" is.

For me, "literal text" means that it contains literal characters that are not interpreted as markup. It often (or always?) is displayed in a fixed-width font.

One of the most important features of "literal text" is that "newline" characters are displayed as actual new lines. There are typically no automatic line breaks.

All in all, I think this is a bad (i.e. not very graceful) fallback.

i.e. there is nothing to say that a fenced block must contain code per se, and that is exactly what Sphinx directives are: "interpreted literal text"

No, it's not.

There is only one Sphinx directive which fits this description: parsed-literal.

https://docutils.sourceforge.io/docs/ref/rst/directives.html#parsed-literal

IMHO "interpreted non-literal text" would be more appropriate as a fallback.

but there are some discussions about a generic block syntax, and AFAICT it doesn't look like using code blocks as generic blocks is a likely outcome

I'd be very interested if you could point me towards this.

Sure. There's a lot of discussions.

I have the feeling that the colon (:::) syntax is kinda the favorite right now, but I guess there is no official statement about this.

I don't yet have an opinion on what exactly is the most appropriate syntax, but I do know that backtick fences are not appropriate and kinda the worst possible option.

In terms of CommonMark though, I am going to go out on a limb and say there will never be any changes now to the core syntax.

I think so, too; I guess there will only be minor changes.

But that doesn't mean that there will never be additional syntax, e.g. for generic blocks.

The only thing the CommonMark forum is full of is 7-year-old discussions about adding different syntax that never actually happens, lol.

Yeah, it's funny.

But I can't complain, because I haven't yet done anything to solve the situation. Currently I don't really want to open that Pandora's box, but maybe I'll have more time and energy for that in the future.

Are there plans to make this caching mechanism available for general use

Well it is already its own separate package jupyter-cache

OK, that's cool.

I still haven't quite understood: is this supposed to be used together with nbconvert --execute or instead of it?

Does this look only at the code in cells, or also at its (transitive) dependencies?

Maybe you could clarify what (transitive) dependencies you had in mind?

Well all of them!

Imagine you have a notebook containing code which imports a local Python package which in turn imports, say, Matplotlib.

My question was whether all used source files will be considered for caching.

For example: I execute a notebook, then update my local Matplotlib installation, then execute the notebook again.

Will the second run use the cache or will it re-run the notebook (assuming some relevant files in the Matplotlib source code have changed)?

But it does tentatively have the hooks in place (if not yet implemented) to handle assets (external files required by the notebook to execute) and artefacts (external files output by the notebook execution)

OK, that sounds good. The question here would be: are those assets supposed to be discovered automatically or provided by the user?

Are there already JupyterLab extensions that allow Markdown syntax extensions?

This may be of relevance https://github.com/jupyterlab/jupyter-renderers

I don't see any Markdown renderers there?

chrisjsewell commented 4 years ago

I have the feeling that the colon (:::) syntax is kinda the favorite right now, but I guess there is no official statement about this.

The closest to an official use of ::: are pandoc fenced divs. These are container blocks for markdown which are annotated with classes. This is not the behaviour you want from an interpreted text block (i.e. a Sphinx directive), so I would doubt this would become a standard, given that it would be incompatible with pandoc markdown.

Literally the only thing you need for these blocks is to tell the parser not to automatically parse the content as markdown: store the content verbatim, and it will be interpreted at a later time. That is exactly what backticks do perfectly fine.

The only directives where non-literal/container blocks are appropriate are for admonitions like note, warning, etc. I do actually intend to add an (optional) extension to treat this subset of directives using ::: blocks, but this is not applicable to general directive blocks, which should most certainly not be treated as non-literal.

Also, the most widely used text-based notebook format, RMarkdown, uses backticks. So are you saying that it is a bad format?

humitos commented 4 years ago

I'm following the discussion quietly and just wanted to chime in to comment about this,

For example: I execute a notebook, then update my local Matplotlib installation, then execute the notebook again. Will the second run use the cache or will it re-run the notebook (assuming some relevant files in the Matplotlib source code have changed)?

Sphinx already has a mechanism to determine this (the -E option when calling it). If anything, I'd use its own internal mechanism to determine whether a file needs to be rebuilt/re-rendered or not, and keep compatibility with Sphinx. The extension could connect to this event to check what has changed and only rebuild the files that are needed: https://www.sphinx-doc.org/en/master/extdev/appapi.html#event-env-get-outdated
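
A minimal sketch of such a hook (the env-get-outdated event and its handler signature are real Sphinx API; the rebuild logic here is just a placeholder):

```python
def get_outdated_notebooks(app, env, added, changed, removed):
    # Return an iterable of extra docnames that should be re-read,
    # e.g. notebooks whose cached outputs have become stale.
    return []


def setup(app):
    app.connect("env-get-outdated", get_outdated_notebooks)
```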

My 2Β’

chrisjsewell commented 4 years ago

Sphinx already has a mechanism to determine this

I think you are confusing what Sphinx does; it's only checking if the mtime of a file has changed, not doing anything complex like checking if the local Python environment has changed. You could try doing this via pip/conda (recording the package/version list at time of execution), but it would be questionable if this would be robust. For now, you would just signal "manually" that you want to re-execute in this instance.

The extension could connect to this event

FYI this is already what MyST-NB does, but using jupyter-cache it does it in a "smarter" way when considering whether notebooks need to be re-executed: notebooks are only re-executed if code/kernel content changes, not text content. If only text is changed, the doctree is still rebuilt, but the existing code outputs are used.
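
To make the idea concrete, here is a conceptual sketch of hashing only the code cells plus the kernel, so that markdown edits don't invalidate cached outputs. This illustrates the principle only and is NOT the actual jupyter-cache API:

```python
import hashlib

import nbformat
from nbclient import NotebookClient


def execution_key(nb):
    # Hash only the code cells and the kernel name, ignoring markdown cells.
    code = "\n".join(c.source for c in nb.cells if c.cell_type == "code")
    kernel = nb.metadata.get("kernelspec", {}).get("name", "")
    return hashlib.sha256((kernel + "\0" + code).encode()).hexdigest()


cache = {}  # key -> executed notebook; a real cache would persist to disk


def execute_with_cache(path):
    nb = nbformat.read(path, as_version=4)
    key = execution_key(nb)
    if key not in cache:
        cache[key] = NotebookClient(nb).execute()
    return cache[key]
```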

humitos commented 4 years ago

I think you are confusing what Sphinx does; it's only checking if the mtime of a file has changed, not doing anything complex like checking if the local Python environment has changed.

That's what I tried to say, but maybe I didn't express myself properly. I'd follow the philosophy that Sphinx follows, not exactly the same mechanism. That is, if they are not considering the Python environment itself (all package versions) as part of their caching mechanism, I wouldn't consider it for the notebooks either.

You could try doing this via pip/conda (recording the package/version list at time of execution), but it would be questionable if this would be robust. For now, you would just signal "manually" that you want to re-execute in this instance.

I agree with you on this. I think this is hard to implement in a reliable manner, and the benefit probably isn't that big. I think you will end up calling Sphinx with -E anyway if you don't trust the caching mechanism 100%.

If only text is changed, the doctree is still rebuilt, but the existing code outputs are used

I read this in its docs and I think it's smart to follow this distinction :smile:

chrisjsewell commented 4 years ago

Yeah, absolutely; it's just about getting the balance right between automating the re-build logic and having manual control to force rebuilding parts of the build. -E is really the "nuclear" option, lol, and I wouldn't say it's much different from just deleting the build folder, so where possible I think you'd want to avoid it.

mgeier commented 4 years ago

@chrisjsewell

I have the feeling that the colon (:::) syntax is kinda the favorite right now, but I guess there is no official statement about this.

The closest to an official use of ::: are pandoc fenced divs. These are container blocks for markdown which are annotated with classes. This is not the behaviour you want from an interpreted text block (i.e. a Sphinx directive),

I guess there are two kinds of directives: ones which want to keep the literal text (e.g. code blocks) and ones whose contents are supposed to be further parsed.

I think the fenced code block syntax (```) would make sense for the first kind, and the colon syntax (:::) for the second, wouldn't it?

so I would doubt this would become a standard, given that it would be incompatible with pandoc markdown.

I don't understand what would become incompatible by what happening, can you please elaborate?

Literally the only thing you need for these blocks is to tell the parser not to automatically parse the content as markdown: store the content verbatim, and it will be interpreted at a later time. That is exactly what backticks do perfectly fine.

Exactly, for those kinds of directives that syntax is indeed perfectly fine.

The other kind is the problem.

For example, from https://myst-parser.readthedocs.io/en/latest/using/syntax.html#directives-a-block-level-extension-point:

```{admonition} My markdown link
Here is [markdown link syntax](https://jupyter.org)
```

The only directives where non-literal/container blocks are appropriate are for admonitions like `note`, `warning`, etc. I do actually intend to add an (optional) extension to treat this subset of directives using `:::` blocks,

Oh, I should probably have read this before answering above ...

but this is not applicable to general directive blocks, which should most certainly not be treated as non-literal.

What makes one kind more general than the other?

I think you will need support for both kinds.

Also, the most widely used text-based notebook format, RMarkdown, uses backticks. So are you saying that it is a bad format?

I'm not familiar with RMarkdown, so I don't know.
I doubt that it is plainly a "bad" format (except for people who consider Markdown itself "bad"), but it may have bad parts, I don't know.

If they use backtick fences for non-literal text, that part would certainly be bad, but I don't know whether they do.

@humitos 

For example: I execute a notebook, then update my local Matplotlib installation, then execute the notebook again. Will the second run use the cache or will it re-run the notebook (assuming some relevant files in the Matplotlib source code have changed)?

Sphinx already has a mechanism to determine this (the [`-E` option](https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-E) when calling it).

Yes, this mostly works well.

But when including executed Jupyter notebooks, whoever executes them (e.g. `nbconvert`) should check during execution which code files are used. This information could then be used by `nbsphinx` to tell Sphinx which code files are dependencies of the notebook.

This is what issue #87 is about.

And that's why I was asking about (transitive) dependencies, because such a feature would be really great.
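
A hypothetical sketch of the glue this would need on the Sphinx side, using the real note_dependency() API (the surrounding executor logic is assumed):

```python
def register_notebook_dependencies(app, used_files):
    # Called while the notebook's document is being read: every registered
    # file becomes a dependency, so the document is re-read when it changes.
    for path in used_files:
        app.env.note_dependency(path)
```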

The Sphinx re-build mechanism has a big flaw, though: Whenever an uncaught exception happens during the Sphinx build, this seems to invalidate the whole "environment" and the next build will re-build *everything*.

That's why an *additional* caching mechanism for notebooks may actually make sense.
Or do you know a way around the exception problem?

@chrisjsewell 

Sphinx already has a mechanism to determine this

I think you are confusing what Sphinx does; it's only checking if the `mtime` of a file has changed, not doing anything complex like checking if the local Python environment has changed.

Well, the local Python environment also consists of files which have an `mtime`, right?

The "notebook executor" would just need to watch all file accesses and store a list of dependent files.
This list of files could then easily be provided to Sphinx.

See also #87 for a related discussion.

You could try doing this via pip/conda (recording the package/version list at time of execution), but it would be questionable if this would be robust.

I agree. This doesn't seem like a feasible option.

FYI this is already what MyST-NB does, but using `jupyter-cache` it does it in a "smarter" way when considering whether notebooks need to be re-executed: notebooks are only re-executed if code/kernel content changes, not text content. If only text is changed, the doctree is still rebuilt, but the existing code outputs are used.

I think that's a great feature.

And I think this cannot be handled by the Sphinx mechanism, right?

I have the feeling that both mechanisms will be needed to get the "full" experience.
We will have to tell Sphinx the correct dependencies in order to start the re-building in the first place, but then we need also the notebook-specific features to not execute code unnecessarily.

@humitos 

I think you will end up calling Sphinx with `-E` anyway if you don't trust the caching mechanism 100%.

There will always be some remaining cases where `-E` will be used.
But I think it would still be great if we had a really trustworthy caching mechanism so those cases could become rare.

chrisjsewell commented 4 years ago

What makes one kind more general than the other?

The interplay between the markdown parser and the parsing of the content of a docutils directive. With backticks you are telling the markdown parser to do nothing to the text within, just store it as is; then that text can be passed directly to the docutils directive class for further processing. With ::: you are telling the markdown parser to parse all the content contained within as markdown (to parsed tokens). You can then no longer directly use the docutils directive classes and have to apply different (special) logic to replicate what that class does internally.

Well, the local Python environment also consists of files which have an mtime, right? The "notebook executor" would just need to watch all file accesses and store a list of dependent files.

I think the word "just" here is a bit generous, lol. Firstly, notebooks are not constrained to Python; the approach would need to be generic to any programming language. Secondly, this seems to imply that you would need to scan all the many 1000s of files related to the environment. I imagine that would have a significant impact on performance. Happy for you to point me towards an implementation of how this could be achieved though.

mgeier commented 4 years ago

What makes one kind more general than the other?

The interplay between the markdown parser and the parsing of the content of a docutils directive. With backticks you are telling the markdown parser to do nothing to the text within, just store it as is; then that text can be passed directly to the docutils directive class for further processing. With ::: you are telling the markdown parser to parse all the content contained within as markdown (to parsed tokens).

Exactly.

You can then no longer directly use the docutils directive classes and have to apply different (special) logic to replicate what that class does internally.

I don't understand.

You parse the contents of, say, ```{admonition} blocks with the MyST parser anyway, and not with docutils, right?

So you'll have to have special logic for that already.

Well, the local Python environment also consists of files which have an mtime, right? The "notebook executor" would just need to watch all file accesses and store a list of dependent files.

I think the word "just" here is a bit generous, lol. Firstly, notebooks are not constrained to Python; the approach would need to be generic to any programming language.

Yeah, it would be best if it could be generic. As long as you can monitor file accesses from a given process (which contains the kernel), this might work. I haven't implemented anything like this, though, so I don't have a clue, actually.

If it needs support from the kernel, this might become quite a bit more complicated. Then probably some protocols would have to be extended and it would have to be implemented in each kernel separately.

Secondly, this seems to imply that you would need to scan all the many 1000s of files related to the environment. I imagine that would have a significant impact on performance.

Well, when the Python interpreter runs the code, it reads all those files anyway; does that really take that long?

The caching mechanism wouldn't even have to read the files, only some of their metadata.

And 1000s of files doesn't sound like an impossibility. I wouldn't want to check them manually, but that's what we have computers for, isn't it?

Happy for you to point me towards an implementation of how this could be achieved though.

The one thing I was thinking of is https://github.com/GaretJax/sphinx-autobuild.

I hadn't looked into this previously, but I just had a quick look and it seems to use https://github.com/gorakhargosh/watchdog for watching files.

Of course we wouldn't need to "watch" the files in this sense; we would "just" need to get the list of accessed files. But that information must be in there somewhere ...
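
For a Python kernel specifically, a rough sketch of how such a list could be collected from the modules that were actually imported (an illustration under that assumption, not an existing feature; data files and non-Python kernels would still need something like watchdog or OS-level tracing):

```python
import sys


def imported_source_files():
    # After the notebook's code has run, the source file of every imported
    # module approximates the list of accessed code files.
    files = set()
    for module in list(sys.modules.values()):
        path = getattr(module, "__file__", None)
        if path:
            files.add(path)
    return sorted(files)
```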

chrisjsewell commented 4 years ago

You parse the contents of, say, ```{admonition} blocks with the MyST parser anyway, and not with docutils, right?

Not exactly, no; the content gets passed directly (as unparsed text) to the docutils Admonition directive, the same as with any other directive. It's just that the directives get initiated with a Markdown-specific state machine.

mgeier commented 4 years ago

It's just that the directives get initiated with a Markdown-specific state machine.

Ah, OK, that sounds magical!

So you can customize the way Sphinx parses a directive without changing the directive itself?

Just out of curiosity, could you please point me to the MyST-Parser code where this happens?

This will explain how the implementation works, but my criticism still stands: It doesn't make logical sense to pass literal text to an "admonition" block. An admonition (as I understand it) contains formatted text, not literal text.

chrisjsewell commented 4 years ago

So you can customize the way Sphinx parses a directive without changing the directive itself?

https://github.com/executablebooks/MyST-Parser/blob/a084975f02c0b4a9141f75878f78b48afa9f9b5a/myst_parser/docutils_renderer.py#L670: directives are initialised exactly the same as they are in docutils, except they are parsed by a different state machine which is "polymorphically identical" to the rST one, but runs nested parsing through a Markdown parser rather than an rST one. In this way you can directly use almost any directive originally written for docutils/Sphinx (unless it calls any particularly low-level docutils methods not yet supported by our state machine).

This code (as with docutils) does not discriminate between "admonition" type directives and any other type of directive; they are all just passed a literal block of text. To change this you would have to add special cases to docutils or myst-parser.

mgeier commented 4 years ago

Cool, thanks! I've seen the name "state machine" a few times in the Sphinx source code, but I've never really looked into it. It seems to be a very powerful extension mechanism.

I guess with this you get correct line numbers in case of parsing errors?

This is something that I don't like about nbsphinx, because if there is a parsing error, the line number corresponds to the temporary RST file and not to the actual notebook file.

This code (as with docutils) does not discriminate between "admonition" type directives and any other type of directive; they are all just passed a literal block of text. To change this you would have to add special cases to docutils or myst-parser.

Yes, it would be good to do that in order to achieve consistent syntax.

chrisjsewell commented 4 years ago

"State machine" is just the theoretical framework on which the parser is built; well, technically: https://en.wikipedia.org/wiki/Pushdown_automaton

I guess with this you get correct line numbers in case of parsing errors?

When using text-based notebooks (i.e. https://jupytext.readthedocs.io/en/latest/formats.html#myst-markdown) as the input source, yes, the line numbers relate directly to the correct lines. For .ipynb notebooks, though, there wasn't a "simple" way to incorporate the cell number as a separate thing, so my solution for now is to use <cell number> * 10000 + <line number>, see:

mgeier commented 4 years ago

That's a nice work-around!

In the long run though, I think it would be great to support line numbers for all formats.

But I guess for this to work for "normal" .ipynb notebooks, nbformat would have to provide that information in the first place.
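
For reference, the work-around described above as a tiny sketch (the function names are hypothetical):

```python
def encode_position(cell_number, line_number):
    return cell_number * 10000 + line_number


def decode_position(position):
    cell_number, line_number = divmod(position, 10000)
    return cell_number, line_number


assert decode_position(encode_position(3, 42)) == (3, 42)
```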

astrojuanlu commented 3 years ago

Hi folks πŸ‘‹πŸ½ I know this is an old & long conversation but I have been hesitating a long time to ask the question: what are nowadays the differences between myst-nb and nbsphinx in terms of functionality? In my head, they are largely equivalent, but I might be missing something. I looked at https://nbsphinx.readthedocs.io/en/0.8.0/links.html and https://myst-nb.readthedocs.io/en/latest/examples/custom-formats.html?highlight=nbsphinx and it's still not clear to me.

In poliastro, a personal project, I am using nbsphinx + jupytext to include MyST notebooks in Sphinx documentation, and it's working really well.

chrisjsewell commented 3 years ago

Well one of the key differences is that nbsphinx uses Pandoc to first convert Markdown text to RST text, then runs that through the RST parser (to convert to docutils AST), whereas myst-nb uses myst-parser to directly convert the (MyST) Markdown text to docutils AST.

This means that the Markdown you use for nbsphinx is mainly Pandoc-flavoured Markdown (https://pandoc.org/MANUAL.html), plus the syntax extensions detailed in the documentation, whereas for myst-nb you use MyST-flavoured Markdown (https://myst-parser.readthedocs.io/en/latest/syntax/syntax.html), including roles, directives, etc. (note also that any of the configurations/extensions for myst-parser are also applied to notebook files, since myst-nb just builds on top of myst-parser).

There are also differences in the execution engines: nbsphinx executes each notebook during the parsing phase, whereas, depending on the execution mode, myst-nb executes all notebooks up front and caches them with jupyter-cache.

(obviously @mgeier can correct me if I'm wrong on any of this)

mgeier commented 3 years ago

@astrojuanlu I'm not sure if you need to use the myst_parser extension in your conf.py, I hope https://github.com/poliastro/poliastro/pull/1260 will clear that up [UPDATE: it is actually needed, see the next comment below].

The MyST project is a bit confusing because there are two different things called "myst", see my comment above: https://github.com/spatialaudio/nbsphinx/issues/420#issuecomment-608399740.

AFAICT, you are using MyST only as a serialization format for "normal" Jupyter notebooks, i.e. notebooks with "normal" Markdown in their Markdown cells. Since Jupytext can handle those, you can use them with nbsphinx_custom_formats, as you do.
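
For example, the relevant conf.py setting could look something like this (the exact "fmt" value is an assumption that may depend on the Jupytext version):

```python
# conf.py: route MyST markdown files through Jupytext when nbsphinx reads them.
nbsphinx_custom_formats = {
    ".md": ["jupytext.reads", {"fmt": "myst"}],
}
```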

There would be a different possible use case though: you could use MyST syntax within the Markdown cells of your notebooks. This is currently not supported by nbsphinx, I guess you would need the myst_parser extension for that.

mgeier commented 3 years ago

I'm not sure if you need to use the myst_parser extension in your conf.py, I hope poliastro/poliastro#1260 will clear that up.

It looks like the myst_parser extension is not needed for the notebooks, but it is needed for some of the *.md files which contain MyST syntax.

Coming back to @astrojuanlu's question:

what are nowadays the differences between myst-nb and nbsphinx in terms of functionality?

You are using nbsphinx's gallery feature; I don't know whether that's possible with myst_parser. I only found https://executablebooks.org/en/latest/gallery.html; I don't know if it can be used in a similar situation.

Another difference is the visual appearance of code cells and their outputs, but this can be tuned by custom CSS, if desired.

Other than that, there are for sure many minor differences, but I'm not aware of any bigger differences that haven't yet been mentioned in this issue.

astrojuanlu commented 3 years ago

Thanks @chrisjsewell and @mgeier for your input! Yes, we are using both MySTs :) The MyST format for notebooks, and MyST for our narrative documentation. As you saw, we leverage nbsphinx's gallery feature, and we wrote it in MyST too.

I ask this not only because I was mildly confused myself, but because I'm in the process of writing some documentation about the whole "Jupyter in Sphinx" story and wanted to convey a coherent message. Your replies and experiments have been very useful.

astrojuanlu commented 3 years ago

For reference, that documentation I was writing is this: https://github.com/readthedocs/readthedocs.org/pull/8283