google / mdbook-i18n-helpers

Translation support for mdbook. The plugins here give you a structured way to maintain a translated book.
Apache License 2.0
133 stars 25 forks source link

Add support for translation comments #63

Open mgeisler opened 1 year ago

mgeisler commented 1 year ago

We should teach mdbook-xgettext to react to comments in the Markdown. Comments could be used for:

dyoo commented 1 year ago

I'm interested in working on this one.

dyoo commented 1 year ago

Questions:

I'm assuming we'll be looking at HTML comments?

Do you already have some expectations on what the comment structure should look like for this operations?

I'm expecting that we'll want to expand the return type for extract_messages(), so that we can handle the translator comments or message contexts. Does that match your expectations as well?

mgeisler commented 1 year ago

Questions:

I'm assuming we'll be looking at HTML comments?

Yes, precisely! They are our universal back-channel :slightly_smiling_face:

Do you already have some expectations on what the comment structure should look like for this operations?

  • To mark the next paragraph as non-translatable, for example, maybe: <!-- SKIP -->

Yeah, but I would prefix it with mdbook-xgettext: to make it clear that this applies to this particular program:

<!-- mdbook-xgettext: skip -->

```rust,editable
fn do_not_translate_me() {}

> * To add a translator comment, maybe `<!-- TRANSLATOR: ... ->`

This actually matches [this SO answer](https://stackoverflow.com/a/75947520/110204) which suggests calling `xgettext` as

xgettext --add-comments=TRANSLATORS


to extract C comments like this:

```c
// TRANSLATORS: A translator comment
printf(_("Hello world"));

As you might have guessed, our mdbook-xgettext plugin plays the role of xgettext from the normal Gettext suite. Since xgettext is configurable, we might want to make our keywords configurable as well. The configuration would be read from the book.toml file, similar to how we read pot-file today.

  • I don't know what a message context is quite yet: I will need to read more about what "s Getttext feature" means. :)

The GNU project has documentation here, which also mentions the message context: https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html

I'm expecting that we'll want to expand the return type for extract_messages(), so that we can handle the translator comments or message contexts. Does that match your expectations as well?

That sounds like a good idea — I've not really thought about how this will affect the code. It might be time to expand the return type to a bigger type, perhaps our own or perhaps the polib Message type.

dyoo commented 1 year ago

Okay, I think I understand. I'm starting to hack to see if I can get a proof of concept going. Here's what I'm thinking of for handling skip, to do it at the place where we're gathering groups for translation:

https://github.com/google/mdbook-i18n-helpers/compare/main...dyoo:mdbook-i18n-helpers:skip

This is not heavily tested or quite right yet. But I hope it's in a good direction!

dyoo commented 1 year ago

Okay, I did a little bit more work on cleaning up the skip-handling code. https://github.com/google/mdbook-i18n-helpers/pull/69. I hope this looks good to you!

mgeisler commented 1 year ago

Your #69 looks great! I wanted to copy a comment from there to here since I think it can be useful for future PRs. Basically, we can take inspiration from what mdpo does.

I think we could use i18n as a shorthand for mdbook-i18n-helpers and implement these comments:

The i18n- prefix is hopefully special enough that people realizes that these comments are special.

An immediate use case of this feature is skipping large code blocks in Comprehensive Rust. A related feature would be skipping code blocks automatically by default. A twist on this would be skipping code blocks that don't contain certain (configurable) patterns: if we don't see // or " in a code block, then we should not copy it to the PO file — at least for Comprehensive Rust where we only translate strings and comments.

dyoo commented 1 year ago

Overall next plan sketch:

mgeisler commented 1 year ago

Overall next plan sketch:

The changes in your plan look great to me!

However, may I suggest a small detour: could we add a command line option to automatically skip code blocks which don't contain " or //? That would allow us to remove a lot of code blocks from the PO files today (thanks to the skip functionality already built).

  • I'm assuming that if there are multiple translator comment HTML comment directives, back-to-back, that we should concatenate, since polib::Message has room only for a single one.

That sounds very reasonable! There can be normal # comments anywhere in a PO file, but we're not trying to model those. We're only trying to capture special #. extracted comments as I just learnt they're called in the Gettext documentation. So it makes sense to concatenate multiple such extracted comments into a single comment (probably concatenate with \n\n?).

dyoo commented 1 year ago

However, may I suggest a small detour: could we add a command line option to automatically skip code blocks which don't contain " or //? That would allow us to remove a lot of code blocks from the PO files today (thanks to the skip functionality already built).

Your wish is my command. Draft: https://github.com/google/mdbook-i18n-helpers/pull/75

I need to revise it since it's being applied unconditionally here instead of as a command line option. Do you have suggestions on what that command line option should look like?

mgeisler commented 1 year ago

I need to revise it since it's being applied unconditionally here instead of as a command line option. Do you have suggestions on what that command line option should look like?

I added a bit about that to #75 — we can pass in options via book.toml and there we have free hands to create our own config keys. Something like

[output.xgettext]
skip-all-codeblocks = true
skip-codeblocks-except-contains = ["\"", "//"]

Perhaps the logic should be reversed? We could instead speak about "keeping code blocks" and use settings like

[output.xgettext]
keep-codeblocks = true
keep-codeblocks-matching = ["\"", "//"]

That way we have fewer double-negations in the mental model :smile:

The two options overlap, but I guess people will appreciate the clarity of writing keep-codeblocks = false over keep-codeblocks-matching []... Internally, one could translate keep-codeblocks = true into keep-codeblocks-matching = ["."].

dyoo commented 1 year ago

Okay, notes to self on implementation strategy here.

We want to read from the configuration object. I'm assuming that we want to extract from the Value section of the Config in an MDBook somehow.

It's a little unclear to me how the data flow works here, how we get that value threaded to the message extractor. Reading... okay, the main function of mdbook-xgettext.rs gets the RenderContext value.

I'm assuming that the places where we are calling extractMessages are all places where we can pass along a ctx: RenderContext. Let me check.

There are two places where we call extractMessages:

So I think I may need a little more guidance here since I'm unfamiliar with the usage of mdbook-i18n-normalize; is it meaningful to have the RenderContext available when using that binary?

mgeisler commented 1 year ago

So I think I may need a little more guidance here since I'm unfamiliar with the usage of mdbook-i18n-normalize; is it meaningful to have the RenderContext available when using that binary?

Good question... on one hand, mdbook-i18n-normalize probably doesn't have to know about translation comments: it's purpose it to do a PO -> PO transformation, in particular to pick up new breaking changes such as the one I made in version 0.2 (#25, #27). It is not needed to implement skipping messages: they will automatically be removed from the PO files when people regenerate the POT file and then run msgmerge.

On the other hand, it might be good to do the filtering after extract_messages. Let extract_messages be the canonical low-level function which turns Markdown into smaller pieces of Markdown and let another function filter and post-process these messages. That would avoid putting too much responsibility into this one function.

dyoo commented 1 year ago

Yeah, I was also thinking that I'm doing too much at the extract_message level there. Let me see if I can refactor things a bit so that we cleanly separate the filtering as a separate step.

dyoo commented 1 year ago

Apologies for not giving a status report for a few weeks; I caught a nasty bug a few weeks ago that has been sapping a lot of energy from me.

I've begun the work on refactoring the directive detection so that we can

https://github.com/google/mdbook-i18n-helpers/compare/main...dyoo:mdbook-i18n-helpers:refactor-messages?expand=1

Still in progress and being cleaned up, but hopefully I'll get this ready for review shortly. I Just wanted to give you a heads up @mgeisler and to let you know I'm still alive. :)

dyoo commented 1 year ago

https://github.com/google/mdbook-i18n-helpers/pull/107 is the second part to thread translation comments up to extract_messages. I need to teach mdbook-xgettext about these comments; doing this next.

mgeisler commented 1 year ago

Still in progress and being cleaned up, but hopefully I'll get this ready for review shortly. I Just wanted to give you a heads up @mgeisler and to let you know I'm still alive. :)

Yay, good to see you're still here :smile: Thanks for putting up the PRs!