sphinx-contrib / restbuilder

A Sphinx builder/writer to output reStructuredText (rst) files
BSD 2-Clause "Simplified" License
31 stars 27 forks source link

Images associated with substitution references are lost #25

Open altendky opened 3 years ago

altendky commented 3 years ago

The following bits are snipped a few different ways. See links for actual text. I am no reStructuredText expert so hopefully my terms aren't too incorrect, but it seems that while the substitution reference does get processed enough to result in the substitution definition's target being presented as the link, it does not process the definition's image.

https://github.com/altendky/qtrio/blob/eddc9446fda97567dd4e44971ff088bd0380743b/docs/source/README.rst

|documentation badge|
<snip>
.. _documentation: https://qtrio.readthedocs.io
.. |documentation badge| image:: https://img.shields.io/badge/docs-read%20now-blue.svg?color=royalblue&logo=Read-the-Docs&logoColor=whitesmoke
   :target: `documentation`_
   :alt: Documentation

becomes https://github.com/altendky/qtrio/blame/eddc9446fda97567dd4e44971ff088bd0380743b/README.rst#L9-L12

`documentation <https://qtrio.readthedocs.io>`_   

The full scenario is that I am trying to render my .rst readme including .. literalinclude:: as well as badge images since GitHub won't process literalincludes.

https://github.com/altendky/qtrio/pull/233

.rst result as rendered on GitHub. image

.html result as rendered on Read the Docs. image

macfreek commented 3 years ago

Thanks for the report. I haven't found the time to dive into this issue, but will try to do so.

altendky commented 3 years ago

@macfreek thanks for checking in. I understand there are many things to be done. If I didn't have so many I would spend more time on this myself. :]

macfreek commented 3 years ago

It turns out that this is a lot more difficult than I anticipated.

The short version is:

I've posted a plea for feedback to sphinx-dev.

altendky commented 3 years ago

Awesome... :] Thank you for digging into this.

While I'd rather do it with 'proper' .rst tooling, I could 'solve' my underlying issue with a jinja template or even simpler since it is actually just about the lack of support on GitHub for literalinclude. Or I could do away with the ugly table that I got talked into a few months back. :] Anyways, I'm still itchin' to do it right and appreciate your effort. I look forward to a response providing you with a path forward to help us all.

I just realized I hadn't included the relevant GitHub issue where they rejected supporting includes. Anyways... https://github.com/github/markup/issues/172#issuecomment-33241601

macfreek commented 3 years ago

There are indeed two distinct issues here: support for substitutions (|reference|), and support for the .. literalinclude:: and .. include:: directives. I'll have a look in the second later. Some full working days coming up first. My above comment was mostly a summary of finding of the root cause, which (hopefully) allows me to start finding a solution when I find some spare time.

You just have to face the ugly table a little bit longer than I hoped ;)

altendky commented 3 years ago

Sorry, restbuilder's literalinclude support does work. That's the primary interest in restbuilder to begin with. :]

macfreek commented 3 years ago

I seem to have hit a few limitation of the reStructuredText syntax.

The approach that I took technically works, but seems too fragile, and I decided not to continue.

Rather than closing this as a wontfix, I'll try to write down why supporting both file includes and substitutions is non-trivial.

I would appreciate to hear a bit more about the desired use case, and what you can live without: substitutions or includes. Edit: while writing this post, I was thinking of another approach how this might be supported, as listed under Second Approach.

An example

Let me explain the (technical) challenge with an example:

In index.rst::

|test123|_ is the best test!

.. include:: substitutions.rst

and in substitutions.rst::

.. |test123| replace:: test number *123*
.. _test123: https://example.org/test/123

In two steps, this produces a perfectly valid doctree. First the include is transformed::

<paragraph><reference refname="test123"><substitution_reference refname="test123">test123</substitution_reference></reference> is the best test!</paragraph>
<substitution_definition names="test123">test number <emphasis>123</emphasis></substitution_definition>
<target ids="test123" names="test123" refuri="https://example.org/test/123"/>

And secondly, the substitution_reference and (hyperlink) reference are transformed::

<paragraph><reference refuri="https://example.org/test/123">test number <emphasis>123</emphasis></reference> is the best test!</paragraph>
<substitution_definition names="test123">test number <emphasis>123</emphasis></substitution_definition>
<target ids="test123" names="test123" refuri="https://example.org/test/123"></target>

Note that the substitution_definition and target definition remain, although there are both resolved.

Now this is perfectly fine doctree, which creates perfectly fine HTML:

<p><a href="https://example.org/test/123">test number <em>123</em></a> is the best test!</p>

Sadly, it does not simply create the desired reStructuredText::

`test number *123* <https://example.org/test/123>`_

This will not parse the emphasis, just <reference refuri="https://example.org/test/123">test number *123*</reference>.

This is very similar to what you see: an unparsed piece of reStructuredText in a place where you expected it to be rendered. In your case, it is a link or an image in a table. (Note: restbuilder incorrectly leaves some part out, but even if it would include it, you would not see the image you expect).

As noted before:

Attempt at a workaround

So my first attempt was to leave the substitutions intact. Technically, it seems simple. Sphinx and docutils run about 40 transformations, from (adding) smart quotes, toctree, to reference transforms. I just had to disable one of them: substitution transforms.

I managed to do that.

Sadly, this approach seems too fragile, and I have now decided not to continue. The reasons it is fragile are:

  1. Sphinx does not provide an API to do this, so I had to replicate a few hundred lines of code. it works, but will add overhead to ensure that the extension code and the Sphinx code remain compatible.
  2. It breaks the doctree caching of Sphinx. By disabling a particular transform, the resulting doctree is different. It now still contains these unresolved substitutions. When you first output rst, than output html, the HTML writer will receive the cached doctree, and is confronted with these unresolved substitutions that it doesn't understand. This requires more hackery to remove the caches.
  3. Resolving substitutions are sometimes useful. Sphinx has the powerful ability to keep track of referencing targets across different files. A substitution may not just contain a "replace" or "image" directive, but also some more advanced directive that may rely on these cross references. Perhaps that may continue to work, perhaps not. Normally Sphinx takes care of that. By keeping the original reference in place, I am not so sure if all that should be resolved, is resolved.

Second approach

The bottom line why your code is failing is because it has nested markup. In your case, an image and hyperlink in a table.

What restbuilder is doing now is simply ignoring all nested markup. Either break of the out markup structure, e.g. "close" the table, and add the image underneath the table, and start a new table. That will be a mess. What it does now is simple skip the inner markup, or display the inner markup as plain text.

One possible options is to get the parsed doctree, with all substitution done, and detect nested markup, and in those cases, try to create a custom substitution. It might fail in some cases, but at least I don't have to go the whole duplication-of-code and break-caching route, and may support a few common cases, like markup in hyperlinks and images in tables.

altendky commented 3 years ago

Thank you both for all the effort on the problem and the detailed explanations.

I think I had run into the limited nesting support in a couple small situations before without really internalizing it. This strikes me as a significant flaw, but it apparently hasn't actually caused me all that much trouble in other cases. I think it is time for me to take this limitation to heart and just avoid nesting.

Then it seems I can use restbuilder to compensate for GitHub not supporting literalinclude...

I feel a bit bad walking away after you have done so much and also before you have totally given up, but it seems like the robust path for me to take in the future and frees you from the responsibility we place on ourselves when someone is waiting on our efforts. If you do continue, I can certainly test things out.

macfreek commented 3 years ago

I feel a bit bad walking away after you have done so much and also before you have totally given up, [...]

No worries. As far as I'm concerned I don't have an obligation to offer this support, and you certainly don't have an obligation to use it 🙂

I'm more concerned that you assume I will give up 😛

As for the effort, it took a bit of time, but it did help me understand some of the internals of sphinx and docutils, which also help get this rst builder in the shape I like it to be. I appreciate your offer to test things, and may held you to it :)