Closed HidenoriKobayashi closed 3 months ago
Thanks @HidenoriKobayashi!
I think there might be two problems here. First, we have this test, which demonstrates that we chop up the text when we encounter a HTML tag (including a HTML comment):
There is also this test, which demonstrates that we lose text immediately following a (block-level?) HTML tag:
I think it's an open question if we could or should extract the HTML into the POT file. One concern would be if we should just extract foo
for something like <abbr title="foo">bar</abbr>
? That would probably require us to know that the title
attribute has translatable text (which is doable). We would still be chopping up the text — which I guess is nicer than asking everybody to deal with HTML in the translation files?
This is only about what gets extracted and put into the POT file. The other half of this is how we weave things together when we translate the Markdown later.
I just tried adding a little unit test for mdbook-gettext
:
#[test]
fn test_translate_html() {
let catalog = create_catalog(&[("foo ", "FOO "), ("bar", "BAR"), (" baz", " BAZ")]);
assert_eq!(
translate("foo <b>bar</b> baz", &catalog),
"FOO <b>BAR</b> BAZ"
);
}
it fails with
---- tests::test_translate_html stdout ----
thread 'tests::test_translate_html' panicked at i18n-helpers/src/bin/mdbook-gettext.rs:239:9:
assertion failed: `(left == right)`
Diff < left / right > :
<FOO <b>BAR</b>BAZ
>FOO <b>BAR</b> BAZ
which shows that we lose a ' '
immediately following the </b>
!
Further investigation shows that extract_events(" BAZ", state)
give a Text(Borrowed("BAZ"))
back! It should really give " BAZ"
back so this seems to be where the space is lost.
This is actually fair: Markdown does not distinguish between a line of text starting with " foo"
vs "foo"
. So when we extract the events, we lose the leading whitespace.
I'm not entirely sure where where we should account for this, but perhaps extract_events
could add it back in? Or perhaps the caller should account for being in an inline context and then add it? Note that we have a pulldown_cmark_to_cmark::State
object available and it knows that last_was_html
... so that might be helpful here?
So I think these two tests should work:
diff --git a/i18n-helpers/src/bin/mdbook-gettext.rs b/i18n-helpers/src/bin/mdbook-gettext.rs
index b686be4..291bb2c 100644
--- a/i18n-helpers/src/bin/mdbook-gettext.rs
+++ b/i18n-helpers/src/bin/mdbook-gettext.rs
@@ -233,6 +233,15 @@ mod tests {
);
}
+ #[test]
+ fn test_translate_html() {
+ let catalog = create_catalog(&[("foo ", "FOO "), ("bar", "BAR"), (" baz", " BAZ")]);
+ assert_eq!(
+ translate("foo <b>bar</b> baz", &catalog),
+ "FOO <b>BAR</b> BAZ"
+ );
+ }
+
#[test]
fn test_translate_table() {
let catalog = create_catalog(&[
diff --git a/i18n-helpers/src/lib.rs b/i18n-helpers/src/lib.rs
index 3bf2a5f..484d348 100644
--- a/i18n-helpers/src/lib.rs
+++ b/i18n-helpers/src/lib.rs
@@ -590,6 +590,30 @@ mod tests {
assert_eq!(extract_events("", None), vec![]);
}
+ #[test]
+ fn extract_events_leading_whitespace() {
+ assert_eq!(
+ extract_events(" foo", None),
+ vec![
+ (1, Start(Paragraph)),
+ (1, Text(" foo".into())),
+ (1, End(Paragraph)),
+ ]
+ );
+ }
+
+ #[test]
+ fn extract_events_trailing_whitespace() {
+ assert_eq!(
+ extract_events("foo ", None),
+ vec![
+ (1, Start(Paragraph)),
+ (1, Text("foo ".into())),
+ (1, End(Paragraph)),
+ ]
+ );
+ }
+
#[test]
fn extract_events_paragraph() {
assert_eq!(
@kdarkhan, the problem here is about how we lose a ' '
when the Markdown AST is a mix of Text
and Html
nodes. I believe #125 is related: it's about how breaking apart the text on HTML elements causes problems. I think we could solve both problems if we would let Text
and Html
nodes be grouped together.
I believe this should be resolved now with https://github.com/google/mdbook-i18n-helpers/pull/195 merged.
I think so too, let's close it!
I'm redirected to here from https://github.com/google/comprehensive-rust/issues/1284.
The source: https://github.com/google/comprehensive-rust/blob/a38a33c8fba58678d0a9127d9644242bce41ed94/src/bare-metal/microcontrollers/probe-rs.md
The po file (bit outdated but updating it does not resolve the issue): https://github.com/google/comprehensive-rust/blob/a38a33c8fba58678d0a9127d9644242bce41ed94/po/ja.po#L13420
Now, if I make this change (just for testing purpose)
I get a result like this:
The strange thing about this is that string
cargo-embed
got merged into the last item in the list.