Open Waerden001 opened 4 years ago
HTML-style hyperlinks are currently not supported by nbsphinx.
You can use one of these instead:
https://google.com
<https://google.com>
[Google](https://google.com)
[Google][1]
[1]: https://google.com
[Google]
[Google]: https://google.com
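If a notebook already contains many HTML-style links, rewriting them into one of the supported forms above could be scripted. The following is only a minimal sketch, not part of nbsphinx: the function name `html_links_to_markdown` is made up for illustration, and the regular expression only handles simple anchors (it drops every attribute except `href` and does not cope with nested tags).

```python
import re

# Matches simple anchors such as <a href="https://google.com">Google</a>.
# Assumption: only the href attribute matters; class, id, etc. are discarded.
ANCHOR_RE = re.compile(
    r'<a\s+[^>]*?href=(["\'])(?P<url>.*?)\1[^>]*>(?P<text>.*?)</a>',
    flags=re.IGNORECASE | re.DOTALL,
)


def html_links_to_markdown(source: str) -> str:
    """Rewrite simple <a href="...">text</a> anchors as Markdown links."""

    def _replace(match: "re.Match") -> str:
        url = match.group("url")
        text = match.group("text").strip()
        # An anchor without link text becomes an autolink: <https://...>
        return f"[{text}]({url})" if text else f"<{url}>"

    return ANCHOR_RE.sub(_replace, source)


print(html_links_to_markdown('<a href="https://google.com">Google</a>'))
# [Google](https://google.com)
```

Since the class attribute is lost in this conversion, it does not help when the link needs a class.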
Why is this the case, when regular Markdown supports HTML tags like <a>?
This can be useful when, for example, wanting to use a hyperlink with a class.
I noticed many other tags are stripped as well, e.g. <em>, <strong>, <article>...
And writing <p>test</p> results in <p></p><p>test</p><p></p>, which is unexpected.
Is there a workaround for this other than using raw NBConvert html cells?
From the documentation https://nbsphinx.readthedocs.io/en/0.4.1/markdown-cells.html#HTML-Elements-(HTML-only) https://nbsphinx.readthedocs.io/en/0.4.1/raw-cells.html#HTML
Maybe these parts of the documentation should be clarified if certain html tags are stripped.
Why is this the case, when regular Markdown supports html tags like <a>?
Simply because nobody has implemented it yet. And until this very issue, nobody has requested it either.
This is quite easy to implement if you just want to simply convert Markdown to HTML and nothing else.
In the case of nbsphinx it is a bit more complicated, though.
The Markdown content is first converted (by pandoc plus some AST manipulations) to reStructuredText, which is then converted to the internal representation of Sphinx/docutils.
From this internal representation, Sphinx can generate HTML and LaTeX (and EPUB, and ...) output files (involving some further custom manipulations).
Raw HTML snippets which are just passed through will be missing in the LaTeX output.
There are already two special cases implemented which also work with LaTeX output: <img> and <div class="alert alert-...">.
Theoretically, a third special case for <a> could be added.
This can be useful when for example wanting to use a hyperlink with a class.
I guess this could be implemented. Do you want to make a PR?
I noticed many other tags are stripped as well, e.g. <em>, <strong>, <article>...
I guess they get lost in the conversion from Markdown to reStructuredText.
I think they are swallowed by pandoc. I don't know if it's possible to avoid that.
In the long term, I'd like to avoid the intermediate reStructuredText representation (and the use of pandoc), see #36 (but this might still take quite a while). But then it might be easier to fix this.
And writing <p>test</p> results in <p></p><p>test</p><p></p> which is unexpected.
OK, that's strange, that's probably an artifact caused by the use of the various tools mentioned above.
Is there a workaround for this other than using raw NBConvert html cells?
You can write something like this in your Markdown cell:
<div class="my-class">
[Google](https://google.com)
</div>
The <div> tags will survive the conversion and then you should be able to use a CSS selector like .my-class a to select the link.
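To make such a selector take effect, the Sphinx project needs to load a stylesheet. A minimal conf.py fragment might look like the sketch below; the file name `custom.css` is a hypothetical choice, and this only affects HTML output, not LaTeX.

```python
# conf.py (fragment): a sketch, adapt to your project layout
extensions = ["nbsphinx"]

# Directory (relative to conf.py) that holds extra static files.
html_static_path = ["_static"]

# Hypothetical stylesheet stored as _static/custom.css. It could contain
# a rule such as:
#     .my-class a { font-weight: bold; }
html_css_files = ["custom.css"]
```

`html_css_files` requires Sphinx 1.8 or newer.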
Alternatively, you could try if MyST-NB handles this situation more to your liking.
You can also try RunNotebook (which uses a more direct Markdown-to-HTML conversion) or any of the alternatives mentioned in https://nbsphinx.readthedocs.io/en/0.7.0/links.html.
Maybe these parts of the documentation should be clarified if certain html tags are stripped.
Yes, definitely, the documentation is missing some important information here!
Would you like to make a PR to fix this?
I will look into pandoc and see if there are options for converting HTML tags; that might be the cleanest solution.
Regarding the documentation: not just div seems to be supported, but audio and some others as well. If you know by any chance where these special HTML tags are converted to rst, that would be a great help. Happy to make a PR for the docs, not sure if making an exception for just a tags would be worth it though.
Not sure if that might be out-of-scope for this issue, but my original use-case for <a> tags was that I wanted to replicate automatically linking to classes generated with autodoc, as is possible in rst, e.g.:
:class:`.SomeClass`
And my specific problem was that I could not replicate the html the above line would generate in markdown. Long story short, pandoc actually extends markdown and accommodates this case with
`.SomeClass`{.interpreted-text role="class"}
This won't be nicely displayed in a notebook, but that would have been a long shot either way.
I think the documentation should more clearly say that Markdown cells are treated as pandoc markdown; I will submit a PR for that later.
Interestingly enough, any pandoc markdown that involves div does not seem to be supported by nbsphinx (maybe there is custom code for div in place?)
If I'm not mistaken, one could even add autodoc using the following:
<div class="automodule" data-members="" data-undoc-members="" data-show-inheritance="">
some_module.submodule
</div>
Regarding the documentation: not just div seems to be supported, but audio and some others as well.
Yes, I think <audio> and <video> are the most relevant, that's why I'm showing them in https://nbsphinx.readthedocs.io/en/0.7.0/markdown-cells.html#HTML-Elements-(HTML-only).
If you know by any chance where these special html tags are converted to rst that would be a great help.
The pandoc options are here:
The +raw_html setting passes some HTML tags (but apparently not all?) through.
Then there is some special handling for citations and <img> tags, but <audio> and <video> don't need special handling.
You can check pandoc's behavior like this:

    $ pandoc -f markdown-native_divs -t rst
    <div>bla</div>
    ^D
    .. raw:: html

       <div>

    bla

    .. raw:: html

       </div>

Note that for (future) CommonMark compatibility, blank lines should be used inside the <div> tags:

    $ pandoc -f commonmark -t rst
    <div>bla</div>
    ^D
    .. raw:: html

       <div>bla</div>

vs.

    $ pandoc -f commonmark -t rst
    <div>

    bla

    </div>
    ^D
    .. raw:: html

       <div>

    bla

    .. raw:: html

       </div>
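The same check can be scripted to try several snippets at once. This is only a sketch that shells out to the pandoc command line shown above; the helper names are made up, and it assumes pandoc is on the PATH (it returns None otherwise). It does not reproduce the additional processing that nbsphinx applies.

```python
import shutil
import subprocess
from typing import List, Optional


def pandoc_command(input_format: str = "markdown-native_divs") -> List[str]:
    """Build the pandoc invocation used in the manual check above."""
    return ["pandoc", "-f", input_format, "-t", "rst"]


def markdown_to_rst(markdown: str,
                    input_format: str = "markdown-native_divs") -> Optional[str]:
    """Convert a Markdown snippet to reST, or return None if pandoc is missing."""
    if shutil.which("pandoc") is None:
        return None
    result = subprocess.run(
        pandoc_command(input_format),
        input=markdown,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    snippets = [
        "<div>bla</div>",
        "<div>\n\nbla\n\n</div>",
        '<a href="https://google.com">Google</a>',
    ]
    for snippet in snippets:
        print(repr(snippet))
        print(markdown_to_rst(snippet))
```

Comparing the output for different tags and input formats shows which raw HTML pandoc passes through and which it drops.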
Happy to make a PR for the docs,
That would be great!
not sure if making an exception for just a tags would be worth it though.
I don't know, probably not.
Not sure if that might be out-of-scope for this issue, but my original use-case for <a> tags was that I wanted to replicate automatically linking to classes generated with autodoc as is possible in rst, e.g.: :class:`.SomeClass`
My work-around for autodoc links is https://nbsphinx.readthedocs.io/en/0.7.0/markdown-cells.html#Links-to-Domain-Objects.
This is of course not as simple as :class:`SomeClass`, but the advantage is that the links also look somewhat reasonable in JupyterLab/nbviewer/GitHub.
I think the documentation should more clearly say that Markdown cells are treated as pandoc markdown, I will submit a PR for that later.
I would prefer not mentioning pandoc, because it is just an implementation detail which will be removed in the (rather far) future.
I think it would be better to mention a few tags that work (e.g. <div>, <audio>) and vaguely mention that not all tags work.
This way we are open for future changes in behavior.
Interestingly enough, any pandoc markdown that involves div, does not seem be supported by nbsphinx (maybe there is custom code for div in place?)
nbsphinx uses the -native_divs option, maybe that's the culprit?
The raw <div> tags are parsed in the ReplaceAlertDivs transform, in order to find "alert" divs which are turned into "notes"/"warnings".
But all other <div> elements should be passed through?
If I'm not mistaken, one could even add autodoc using the following [...]
You mean instead of using the automodule directive?
Why not just use a raw reST cell (or a separate reST source file) for that?
Regarding the documentation: not just div seems to be supported, but audio and some others as well.
Yes, I think <audio> and <video> are the most relevant, that's why I'm showing them in https://nbsphinx.readthedocs.io/en/0.7.0/markdown-cells.html#HTML-Elements-(HTML-only).
If you know by any chance where these special html tags are converted to rst that would be a great help.
The pandoc options are here:
The +raw_html setting passes some HTML tags (but apparently not all?) through.
My use of a markdown cell is usually just a mixture of plain text, HTML tags, images and LaTeX code. nbsphinx + sphinx handle everything smoothly except those tiny HTML tags, so is it possible to handle more HTML tags like <a> by just modifying the +raw_html settings a little bit?
I don't know. Probably. How would you modify them?
My work-around for autodoc links is https://nbsphinx.readthedocs.io/en/0.7.0/markdown-cells.html#Links-to-Domain-Objects.
I saw that workaround, unfortunately it does not replicate the styling that is applied when linking to domain objects in sphinx. Functionally it does the same though, so it is a solution.
I saw that workaround, unfortunately it does not replicate the styling that is applied when linking to domain objects in sphinx.
Yeah, I know, the problem is that reST doesn't allow nested markup, see #301. This will hopefully become possible when #36 is solved, but this might take some more time ...
I use sphinx with nbsphinx to generate HTML files from Jupyter Notebook files, but hyperlinks in the notebook don't show up in the converted HTML file. More precisely:
I have an index.ipynb which contains a cell with an HTML-style hyperlink, <a href="https://google.com">Google</a>, to the Google website.
When I run the sphinx command make html with nbsphinx as an extension to generate the documentation index.html, the hyperlink turns into unformatted text: only the plain text Google appears in the source code of index.html, and the hyperlink <a href="https://www.google.com"></a> part just disappears.
Does nbsphinx keep the hyperlinks in the notebook when used in sphinx as an extension?