mondeja / mdpo

Markdown files translation using GNU PO files
https://mondeja.github.io/mdpo/
BSD 3-Clause "New" or "Revised" License

Collapsed reference links converted in POT to shortcut reference links #221

Closed ivilata closed 2 years ago

ivilata commented 2 years ago

(After the last comments from #164.)

According to CommonMark link specs, a collapsed reference link like [foo][] and a shortcut reference link like [foo] should both be valid and equivalent to [foo][foo].
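As a rough illustration of that equivalence, here is a minimal Python sketch that classifies the three reference link forms and expands each to the full form. The names `REF_LINK`, `classify` and `to_full` are made up for this example, and the regular expression is a deliberate simplification (no nested brackets, no escapes), not a CommonMark-compliant parser.

```python
import re

# Simplified pattern for [text], [text][] and [text][label].
# Assumption: neither part contains brackets; real CommonMark rules are stricter.
REF_LINK = re.compile(r"\[([^\[\]]+)\](?:\[([^\[\]]*)\])?")


def classify(link: str) -> str:
    """Return 'shortcut', 'collapsed' or 'full' for a reference link."""
    match = REF_LINK.fullmatch(link)
    if match is None:
        raise ValueError(f"not a reference link: {link!r}")
    label = match.group(2)
    if label is None:  # no second pair of brackets at all
        return "shortcut"
    if label == "":  # empty second pair of brackets
        return "collapsed"
    return "full"


def to_full(link: str) -> str:
    """Expand any of the three forms to the full [text][label] form."""
    match = REF_LINK.fullmatch(link)
    if match is None:
        raise ValueError(f"not a reference link: {link!r}")
    text = match.group(1)
    label = match.group(2) or text  # missing or empty label falls back to the text
    return f"[{text}][{label}]"


for sample in ("[foo]", "[foo][]", "[foo][foo]"):
    print(sample, classify(sample), to_full(sample))
```

All three samples expand to the same full form, which is why a parser can treat them as the same link once the reference is resolved.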

However, md2po 0.3.84 converts collapsed links to shortcut ones in POT entries. For instance, given a test.md file with the following content:

A link to [example.com][].

Another link to [example.com].

[example.com]: http://example.com/

the command md2po test.md > test.po creates these entries (I added the missing translation):

#: test.md:block 1 (paragraph)
msgid "A link to [example.com]."
msgstr "A translated link to [example.com]."

#: test.md:block 2 (paragraph)
msgid "Another link to [example.com]."
msgstr "Another translated link to [example.com]."

#: test.md:block 2 (paragraph)
#, fuzzy
msgid "[example.com]: http://example.com/"
msgstr "[example.com]: http://example.com/"

Then po2md -p test.po -s /dev/stdout test.md produces:

A translated link to [example.com].

Another translated link to [example.com].

[example.com]: http://example.com/

The final output is correct; the issue is that the PO strings differ slightly from the original ones. I would expect both [foo][] and [foo] to keep their shape in the PO file.

Please note that the original Markdown contemplated [foo][] but not [foo] as valid links, so that may affect some editors or syntax highlighters (like GitHub's).

This started happening with mdpo 0.3.80 (while 0.3.79 would convert them to [example.com][example.com] instead, which confuses some translators).

Thank you!

mondeja commented 2 years ago

Thanks. The problem is essentially that the MD4C parser does not provide information about the formatting of the links, so I can't differentiate between [foo][] and [foo]. You can see the information that MD4C provides in link spans by appending the --debug option:

md2po --debug test.md

```
md2po[DEBUG]::2022-02-24 20:29:42.607483::enter_block:: DOC
md2po[DEBUG]::2022-02-24 20:29:42.607544::enter_block:: P
md2po[DEBUG]::2022-02-24 20:29:42.607569::text:: A link to
md2po[DEBUG]::2022-02-24 20:29:42.607603::enter_span:: A - {'href': [(, 'http://example.com/')], 'title': None}
md2po[DEBUG]::2022-02-24 20:29:42.607887::text:: example.com
md2po[DEBUG]::2022-02-24 20:29:42.607921::leave_span:: A - {'href': [(, 'http://example.com/')], 'title': None}
md2po[DEBUG]::2022-02-24 20:29:42.607941::text:: .
md2po[DEBUG]::2022-02-24 20:29:42.607956::leave_block:: P
md2po[DEBUG]::2022-02-24 20:29:42.607968::msgid:: msgid='A link to [example.com].'
md2po[DEBUG]::2022-02-24 20:29:42.608007::enter_block:: P
md2po[DEBUG]::2022-02-24 20:29:42.608026::text:: Another link to
md2po[DEBUG]::2022-02-24 20:29:42.608054::enter_span:: A - {'href': [(, 'http://example.com/')], 'title': None}
md2po[DEBUG]::2022-02-24 20:29:42.608069::text:: example.com
md2po[DEBUG]::2022-02-24 20:29:42.608087::leave_span:: A - {'href': [(, 'http://example.com/')], 'title': None}
md2po[DEBUG]::2022-02-24 20:29:42.608103::text:: .
md2po[DEBUG]::2022-02-24 20:29:42.608118::leave_block:: P
md2po[DEBUG]::2022-02-24 20:29:42.608128::msgid:: msgid='Another link to [example.com].'
md2po[DEBUG]::2022-02-24 20:29:42.608154::leave_block:: DOC
md2po[DEBUG]::2022-02-24 20:29:42.608167::msgid:: msgid=''
md2po[DEBUG]::2022-02-24 20:29:42.608181::link_reference:: target='example.com' - href='http://example.com/'
md2po[DEBUG]::2022-02-24 20:29:42.608199::msgid:: msgid='[example.com]: http://example.com/' - msgstr='[example.com]: http://example.com/' - flags='['fuzzy']'
#
msgid ""
msgstr ""

#: test.md:block 1 (paragraph)
msgid "A link to [example.com]."
msgstr ""

#: test.md:block 2 (paragraph)
msgid "Another link to [example.com]."
msgstr ""

#: test.md:block 2 (paragraph)
#, fuzzy
msgid "[example.com]: http://example.com/"
msgstr "[example.com]: http://example.com/"
```
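To make the limitation concrete, here is a small Python sketch of what a renderer can do when a link span carries only `href` and `title`, as in the debug output above. This is a simplified model, not mdpo's actual code: `render_link` is a hypothetical helper, and `href` is flattened to a plain string here although the real event holds a list.

```python
def render_link(text: str, span: dict, references: dict) -> str:
    """Rebuild Markdown for a link from the data available in a span event."""
    href = span["href"]
    # If the link text names a reference whose target matches, a reference
    # link can be emitted, but the span does not say which style the source
    # used, so a single style has to be chosen for every such link.
    if references.get(text.lower()) == href:
        return f"[{text}]"
    # Otherwise fall back to an inline link.
    return f"[{text}]({href})"


references = {"example.com": "http://example.com/"}

# The spans for "[example.com][]" and "[example.com]" are identical.
span_from_collapsed = {"href": "http://example.com/", "title": None}
span_from_shortcut = {"href": "http://example.com/", "title": None}

print(render_link("example.com", span_from_collapsed, references))
print(render_link("example.com", span_from_shortcut, references))
```

Both calls receive the same input, so they necessarily produce the same output; keeping the original shape would require the parser to expose the source syntax of the link.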

This was implemented in #204 (here). It could have used either format, as long as the same one is applied to all links, but I chose this one because it is simpler. As far as I can see, GitHub accepts all possible formats as valid links:

[example1]

[example1]: https://example1.com

[example2][example2]

[example2]: https://example2.com

[example3][]

[example3]: https://example3.com

example1

example2

example3

This issue is really a duplicate of #152 but, as I said, it is not possible to implement it, at least for now. Of course, PRs are welcome!

Do you know of other parsers that would not be able to parse links like [foo]?

ivilata commented 2 years ago

Thanks @mondeja for the clarification! Since I have usually followed John Gruber's original specs, I've never used [foo] myself, so I can't list other parsers beyond GitHub's md syntax highlighter and Emacs' markdown-mode.

So, since the parsing issue is out of your control, and the choice between [foo][] and [foo] seems mostly aesthetic, feel free to close this issue. Thanks!