Allow non-RST parsers to substitute the Locale transform

jpmckinney commented 3 years ago

Describe the bug

https://github.com/sphinx-doc/sphinx/commit/79650f5827e146afe3ff50485f8a4bd6755f895d (#4938) added code that adds a line of hyphens under (for example) an admonition's title's text.

However, a line of hyphens is an RST-specific syntax for a heading. It does not indicate a heading in all parsers.

In Markdown, for example, a line of hyphens is a second-level heading https://spec.commonmark.org/0.29/#setext-headings. This can cause the heading level to jump from 0 to 2.

To Reproduce

I can create a test scenario, but the bug should be clear from reading the code, with the above context.

Expected behavior

For Markdown, we can just change the hyphen to an equals sign, which Markdown will interpret as a first-level heading.

For other parsers, I suppose there would need to be hooks for those parsers to convert the string to a heading.

tk0miya commented 3 years ago

I'm interested what is

jpmckinney commented 3 years ago

@tk0miya I think your comment got cut off?

Also, the general case isn’t fixed - #8853 just fixed one scenario.

tk0miya commented 3 years ago

Oops, Sorry. My understanding is current Locale transform is developed only for reST translation. So it does not work well with other parsers. I merged #8853 to Sphinx because it does not affect translating of the reST document. But it does not mean that Locale transform will support the markdown translation. If your goal is translating the whole of markdown document, we need to develop the mechanism to switch a translation-transform for each parser.

jpmckinney commented 3 years ago

Thanks, yes, that seems like the appropriate solution.

jpmckinney commented 3 years ago

I'm thinking the Locale transform is mostly generic. There are just a few places where it constructs RST strings. I'm wondering if those lines can be moved to methods (e.g. title, literal_block, etc.). That way, another parser can subclass the Locale transform and override those methods.

n-peugnet commented 8 months ago

If I understand correctly, all these issues come from the fact that in publish_msgstr() we parse the source message once again but without the context of the rest of the source file:

https://github.com/sphinx-doc/sphinx/blob/80d5396f9d434d7c71d6f7ac5fa727a972b69584/sphinx/transforms/i18n.py#L73-L78

From what it saw in the extracted messages from the gettext builder, it seems a message can never span over multiple "block elements", like for exemple multiple paragraphs. So a cleaner way to achieve the desired result would be to only parse for "inline elements". For reST it would be Inliner.parse(), for MyST it would be ParserInline.parse.

There could be a function in Sphinx's Parser base class like parse_inline() that children should implement. https://github.com/sphinx-doc/sphinx/blob/80d5396f9d434d7c71d6f7ac5fa727a972b69584/sphinx/parsers.py#L24-L31

It seems like it should allow to remove all the strange hacks in the i18n code like: https://github.com/sphinx-doc/sphinx/blob/80d5396f9d434d7c71d6f7ac5fa727a972b69584/sphinx/transforms/i18n.py#L459-L465

https://github.com/sphinx-doc/sphinx/blob/80d5396f9d434d7c71d6f7ac5fa727a972b69584/sphinx/transforms/i18n.py#L472-L477

Does this idea make sense to you? Do you think it could work?

jpmckinney commented 8 months ago

I'd like @chrisjsewell and @choldgraf from MyST Parser to reflect on your proposed parse_inline.

In general, removing those hacks would be ideal!

n-peugnet commented 5 months ago

@jpmckinney:

I'd like @chrisjsewell and @choldgraf from MyST Parser to reflect on your proposed parse_inline.

In general, removing those hacks would be ideal!

FYI, I made a proof of concept in #12238

jpmckinney commented 3 months ago

LGTM!

sphinx-doc / sphinx

Allow non-RST parsers to substitute the Locale transform #8852