Closed daroczig closed 7 months ago
Side note: we might disable this check:
It doesn't understand that different languages can use different "full stop" characters:
My thought for this is to leverage {potools}:
pot = potools:::get_po_messages("src/library/base/po/R-base.pot")
ja = potools:::get_po_messages("src/library/base/po/R-ja.po")
pot[, msgid_plural_str := sapply(msgid_plural, paste, collapse = "||")]
ja[, msgid_plural_str := sapply(msgid_plural, paste, collapse = "||")]
merge(
pot, ja,
by = c("message_source", "type", "msgid", "msgid_plural_str")
)[
grepl('^\\s+', msgstr.x) != grepl('^\\s+', msgstr.y),
.N,
by = message_source
]
# message_source N
# 1: R 502
Not sure why the number doesn't exactly match what Weblate sees, but we should be able to use this as a first pass & see what else shakes out.
@daroczig, what's the best way to proceed from here? Strip the whitespace in that data.frame, write out a fresh po file, copy that into the weblate mirror somehow?
we might disable this check [mismatched full stop]
I think it was only due to the extra trailing space, no? I've dropped the leading and trailing space and it resolved all 3 errors at https://translate.rx.studio/translate/r-project/base-r/ja/?q=%22%27%25s%27+is+defunct.%22&sort_by=-priority%2Cposition&offset=1#history
what's the best way to proceed from here? Strip the whitespace in that data.frame, write out a fresh po file, copy that into the weblate mirror somehow?
I'm not sure if we need R for this - couldn't we just sed -i
the ja.po
files to drop the leading and trailing spaces in the "msgstr" lines? That might be the least intrusive operation I think .. although we might face issues with multi-line strings. Thoughts?
I think it was only due to the extra trailing space, no?
Oh, great. Seems odd the check duplicates the logic, but glad we can keep this otherwise useful consistency check.
we might face issues with multi-line strings
Yes, exactly why I wanted heavier machinery. In principle we could use awk
for this. What I had in mind is a one-time fix, though, so the exact tooling is not so important. After that, the Weblate UI will show a warning, I hope that is enough, especially combined with checkPoTools()
being used.
Yeah, agreed it should be a one-time fix. And potools
seems like a perfect tool for this :)
Regarding how to get this into weblate, we have a few options:
But we need to be cautious with the fuzzy tags, and also to keep track of what is to be reviewed and what was approved. Maybe better to test in a playground/sandbox component first? :zipper_mouth_face:
Did the simple sed
approach here:
# leading ws
for FILE in $(grep -Elr '^msgstr " ' --include=*.po); do sed -Ei 's/^msgstr " /msgstr "/g' $FILE; done
# trailing ws
for FILE in $(grep -Elr '^msgstr.* "$' --include=*.po); do sed -Ei 's/^(msgstr.*) "$/\1"/g' $FILE; done
@daroczig PTAL.
We can address multiline issues later; looks like there's about 117 hits for the multiline case:
grep -Pzor '\n[^\n#]* "\n\n' --include=*.po
that looks great, thanks!! :bow:
Hmm, not sure I've done the right thing here to proceed.
git push weblate master
from my local copy with this commit applied; push
got rejected/var/lib/docker/volumes/weblate-docker_weblate-data/_data/vcs/r-project/base-r-gui
(needed root access first). I don't see any pending sync needed at https://translate.rx.studio/projects/r-project/#repository, and I still see some of the changed strings on Weblate with leading/trailing whitespace in the UI.app/data/vcs/r-project/base-r-gui
. Actually the commit from step 2 is already there.I also glanced at the approach you mentioned above (manually uploading the edited files through the UI) but there are way too many individual files for that to be sustainable.
Any idea how to proceed (maybe we just need to wait for my manual edit to get picked up by the UI?)
Oh! I see the changes on Weblate now :tada:
@daroczig you didn't do anything yet I'm assuming?
OK, I took a pass at handling the multi-line case manually now too. I think there are a few more types of case missed, but this gets us in a much better state / shouldn't be as much of nuisance as a translator anymore. Closing.
Thanks a ton to @reikookamoto for raising this:
It's annoying to fix all the strings manually in the UI, so let's look for a way to fix these stings in an automated/batch process.