Open mgeisler opened 1 year ago
I'm interested in working on this one.
Questions:
I'm assuming we'll be looking at HTML comments?
Do you already have some expectations on what the comment structure should look like for this operations?
<!-- SKIP -->
<!-- TRANSLATOR: ... ->
I'm expecting that we'll want to expand the return type for extract_messages()
, so that we can handle the translator comments or message contexts. Does that match your expectations as well?
Questions:
I'm assuming we'll be looking at HTML comments?
Yes, precisely! They are our universal back-channel :slightly_smiling_face:
Do you already have some expectations on what the comment structure should look like for this operations?
- To mark the next paragraph as non-translatable, for example, maybe:
<!-- SKIP -->
Yeah, but I would prefix it with mdbook-xgettext:
to make it clear that this applies to this particular program:
<!-- mdbook-xgettext: skip -->
```rust,editable
fn do_not_translate_me() {}
> * To add a translator comment, maybe `<!-- TRANSLATOR: ... ->`
This actually matches [this SO answer](https://stackoverflow.com/a/75947520/110204) which suggests calling `xgettext` as
xgettext --add-comments=TRANSLATORS
to extract C comments like this:
```c
// TRANSLATORS: A translator comment
printf(_("Hello world"));
As you might have guessed, our mdbook-xgettext
plugin plays the role of xgettext
from the normal Gettext suite. Since xgettext
is configurable, we might want to make our keywords configurable as well. The configuration would be read from the book.toml
file, similar to how we read pot-file
today.
- I don't know what a message context is quite yet: I will need to read more about what "s Getttext feature" means. :)
The GNU project has documentation here, which also mentions the message context: https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html
I'm expecting that we'll want to expand the return type for
extract_messages()
, so that we can handle the translator comments or message contexts. Does that match your expectations as well?
That sounds like a good idea — I've not really thought about how this will affect the code. It might be time to expand the return type to a bigger type, perhaps our own or perhaps the polib Message
type.
Okay, I think I understand. I'm starting to hack to see if I can get a proof of concept going. Here's what I'm thinking of for handling skip, to do it at the place where we're gathering groups for translation:
https://github.com/google/mdbook-i18n-helpers/compare/main...dyoo:mdbook-i18n-helpers:skip
This is not heavily tested or quite right yet. But I hope it's in a good direction!
Okay, I did a little bit more work on cleaning up the skip-handling code. https://github.com/google/mdbook-i18n-helpers/pull/69. I hope this looks good to you!
Your #69 looks great! I wanted to copy a comment from there to here since I think it can be useful for future PRs. Basically, we can take inspiration from what mdpo does.
I think we could use i18n
as a shorthand for mdbook-i18n-helpers
and implement these comments:
i18n-skip
: this PR, skips next event,i18n-disable
: starts skipping events,i18n-enable
: stops skipping,i18n-comment
: translator comment,i18n-context
and i18n-ctx
: sets message context for next (or following?) event.The i18n-
prefix is hopefully special enough that people realizes that these comments are special.
An immediate use case of this feature is skipping large code blocks in Comprehensive Rust. A related feature would be skipping code blocks automatically by default. A twist on this would be skipping code blocks that don't contain certain (configurable) patterns: if we don't see //
or "
in a code block, then we should not copy it to the PO file — at least for Comprehensive Rust where we only translate strings and comments.
Overall next plan sketch:
Group
enum from tuple to struct, so that Translate
can carry an optional translator comment along with the markdown events.
group_events
to watch for translator comments and to stuff them into the grouping context context.extract messages
to emit polib::messages.
Overall next plan sketch:
The changes in your plan look great to me!
However, may I suggest a small detour: could we add a command line option to automatically skip code blocks which don't contain "
or //
? That would allow us to remove a lot of code blocks from the PO files today (thanks to the skip functionality already built).
- I'm assuming that if there are multiple translator comment HTML comment directives, back-to-back, that we should concatenate, since polib::Message has room only for a single one.
That sounds very reasonable! There can be normal #
comments anywhere in a PO file, but we're not trying to model those. We're only trying to capture special #.
extracted comments as I just learnt they're called in the Gettext documentation. So it makes sense to concatenate multiple such extracted comments into a single comment (probably concatenate with \n\n
?).
However, may I suggest a small detour: could we add a command line option to automatically skip code blocks which don't contain
"
or//
? That would allow us to remove a lot of code blocks from the PO files today (thanks to the skip functionality already built).
Your wish is my command. Draft: https://github.com/google/mdbook-i18n-helpers/pull/75
I need to revise it since it's being applied unconditionally here instead of as a command line option. Do you have suggestions on what that command line option should look like?
I need to revise it since it's being applied unconditionally here instead of as a command line option. Do you have suggestions on what that command line option should look like?
I added a bit about that to #75 — we can pass in options via book.toml
and there we have free hands to create our own config keys. Something like
[output.xgettext]
skip-all-codeblocks = true
skip-codeblocks-except-contains = ["\"", "//"]
Perhaps the logic should be reversed? We could instead speak about "keeping code blocks" and use settings like
[output.xgettext]
keep-codeblocks = true
keep-codeblocks-matching = ["\"", "//"]
That way we have fewer double-negations in the mental model :smile:
The two options overlap, but I guess people will appreciate the clarity of writing keep-codeblocks = false
over keep-codeblocks-matching []
... Internally, one could translate keep-codeblocks = true
into keep-codeblocks-matching = ["."]
.
Okay, notes to self on implementation strategy here.
We want to read from the configuration object. I'm assuming that we want to extract from the Value
section of the Config
in an MDBook somehow.
It's a little unclear to me how the data flow works here, how we get that value threaded to the message extractor. Reading... okay, the main function of mdbook-xgettext.rs gets the RenderContext value.
I'm assuming that the places where we are calling extractMessages are all places where we can pass along a ctx: RenderContext
. Let me check.
There are two places where we call extractMessages:
So I think I may need a little more guidance here since I'm unfamiliar with the usage of mdbook-i18n-normalize; is it meaningful to have the RenderContext available when using that binary?
So I think I may need a little more guidance here since I'm unfamiliar with the usage of mdbook-i18n-normalize; is it meaningful to have the RenderContext available when using that binary?
Good question... on one hand, mdbook-i18n-normalize
probably doesn't have to know about translation comments: it's purpose it to do a PO -> PO transformation, in particular to pick up new breaking changes such as the one I made in version 0.2 (#25, #27). It is not needed to implement skipping messages: they will automatically be removed from the PO files when people regenerate the POT file and then run msgmerge
.
On the other hand, it might be good to do the filtering after extract_messages
. Let extract_messages
be the canonical low-level function which turns Markdown into smaller pieces of Markdown and let another function filter and post-process these messages. That would avoid putting too much responsibility into this one function.
Yeah, I was also thinking that I'm doing too much at the extract_message level there. Let me see if I can refactor things a bit so that we cleanly separate the filtering as a separate step.
Apologies for not giving a status report for a few weeks; I caught a nasty bug a few weeks ago that has been sapping a lot of energy from me.
I've begun the work on refactoring the directive detection so that we can
Still in progress and being cleaned up, but hopefully I'll get this ready for review shortly. I Just wanted to give you a heads up @mgeisler and to let you know I'm still alive. :)
https://github.com/google/mdbook-i18n-helpers/pull/107 is the second part to thread translation comments up to extract_messages. I need to teach mdbook-xgettext
about these comments; doing this next.
Still in progress and being cleaned up, but hopefully I'll get this ready for review shortly. I Just wanted to give you a heads up @mgeisler and to let you know I'm still alive. :)
Yay, good to see you're still here :smile: Thanks for putting up the PRs!
We should teach
mdbook-xgettext
to react to comments in the Markdown. Comments could be used for:xgettext
extracts them using a special syntax (which we should try to replicate).