google / mdbook-i18n-helpers

Translation support for mdbook. The plugins here give you a structured way to maintain a translated book.
Apache License 2.0

Filter out empty `msgid` entries #64

Closed mgeisler closed 12 months ago

mgeisler commented 1 year ago

I tried running mdbook-xgettext on the Rust Book. After removing formatting from the SUMMARY.md file, I end up with a messages.pot file which is almost correct:

% msgcat -o po/messages.pot po/messages.pot
po/messages.pot:1660: duplicate message definition...
po/messages.pot:3: ...this is the location of the first definition
msgcat: found 1 fatal error

The problem is this entry:

#: src/appendix-04-useful-development-tools.md:120
#: src/appendix-04-useful-development-tools.md:149
msgid ""
msgstr ""

which in turn originates from a non-empty HTML tag:

<span class="filename">Filename: src/main.rs</span>

The empty msgid is also produced by empty HTML tags:

<span id="ferris"></span>

We should handle such entries correctly. This would mean:

  1. Ensure we don't add duplicate entries to the PO file. This is perhaps something that needs fixing in polib.
  2. Perhaps we should extract the text inside inline HTML tags such as the spans above?
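
For context on the first point: in a PO file the entry with an empty `msgid` is reserved for the header, which is why `msgcat` reports the first definition at line 3. A minimal sketch of the filtering and de-duplication could look like the following. The `Extracted` and `Entry` types and the `build_entries` function are hypothetical names made up for illustration; this is not the polib API and not how mdbook-xgettext is actually structured.

```rust
use std::collections::HashMap;

/// One extracted message: where it came from and its text.
/// (Hypothetical type for illustration, not part of polib.)
#[derive(Debug, Clone, PartialEq)]
struct Extracted {
    source: String,
    msgid: String,
}

/// One catalog entry: a msgid plus every source location that produced it.
/// (Also hypothetical.)
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    msgid: String,
    sources: Vec<String>,
}

/// Build catalog entries, skipping empty msgids and merging duplicates.
fn build_entries(extracted: &[Extracted]) -> Vec<Entry> {
    let mut entries: Vec<Entry> = Vec::new();
    // Maps a msgid to its position in `entries`, so first-seen order is kept.
    let mut index: HashMap<String, usize> = HashMap::new();

    for item in extracted {
        // The empty msgid is reserved for the PO header, so never emit it.
        if item.msgid.is_empty() {
            continue;
        }
        match index.get(&item.msgid).copied() {
            // Already seen: record the extra source location only.
            Some(i) => entries[i].sources.push(item.source.clone()),
            // First occurrence: create a new entry.
            None => {
                index.insert(item.msgid.clone(), entries.len());
                entries.push(Entry {
                    msgid: item.msgid.clone(),
                    sources: vec![item.source.clone()],
                });
            }
        }
    }
    entries
}
```

With this approach, the two empty messages from `src/appendix-04-useful-development-tools.md` would be dropped before they reach the catalog, and a repeated non-empty message would yield a single entry listing both source locations. It does not address the second point: text inside inline HTML would still need to be extracted separately.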