r-devel / translations

subsite for translations
https://contributor.r-project.org/translations/
Creative Commons Attribution 4.0 International

Run tools::checkPoFile on translations as a custom check #8

Open: daroczig opened this issue 1 year ago

daroczig commented 1 year ago

From Michael Lawrence:

@mmaechler brought up that it would be helpful to check translations as they are committed.

We could implement a hook that runs these checks before translations are committed to Weblate's database: https://docs.weblate.org/en/latest/admin/checks.html#writing-own-checks

We will need either to trigger tools::checkPoFile from a Python helper using rpy2, or to set up an API instead.
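A minimal sketch of a lighter-weight variant, assuming R is installed wherever the helper runs and `Rscript` is on `PATH`: rather than embedding R through rpy2, the helper shells out to `Rscript` and captures the printed `tools::checkPoFile()` report. The helper names `build_check_command` and `run_check_po_file` are hypothetical.

```python
import subprocess
from typing import List

# R expression run by Rscript; the .po path is passed as a trailing
# argument, so the filename never has to be quoted inside R code.
R_EXPR = "print(tools::checkPoFile(commandArgs(trailingOnly = TRUE)[1]))"


def build_check_command(po_path: str, rscript: str = "Rscript") -> List[str]:
    """Hypothetical helper: argv that checks one .po file with R."""
    return [rscript, "-e", R_EXPR, po_path]


def run_check_po_file(po_path: str, timeout: float = 60.0) -> str:
    """Run tools::checkPoFile via Rscript and return its printed report.

    Assumes R is available; raises subprocess.CalledProcessError if
    Rscript exits with a non-zero status.
    """
    result = subprocess.run(
        build_check_command(po_path),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout
```

Compared with rpy2, this keeps R out of the Weblate worker process at the cost of one R start-up per call, so it probably suits a batch check better than a per-commit hook.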

daroczig commented 1 year ago

Looking at the docs and source of tools::checkPoFile, Weblate already performs most of these checks automatically (e.g. for strings flagged c-format), but the R-specific checks are not implemented. The problem is that the strings extracted from R code are not flagged, but we can solve that by enforcing the flag at the component level.

The easiest approach might be to rewrite the R-specific checks in Python and include them as a custom check. The R code has not been updated in ~10 years, and hopefully it only takes a few lines to make sure the translation provides the same number of conversions as the source.

daroczig commented 1 year ago
  1. Write a Python module (example) and place it in /var/lib/docker/volumes/weblate-docker_weblate-data/_data/python/ on the host machine (mounted as /app/data/python/ within the Docker container), as per the "Creating a Python module" docs
  2. Add the check to CHECK_LIST (as per the "Customize Weblate" docs), either as an env var in the Docker Compose override (to be tested) or via settings.py
  3. Enforce a custom flag for the R components (vs. e.g. the C components)
  4. Test
mmaechler commented 12 months ago

Actually, what I mentioned was that, last time, the *.po files that were "sent to R-core" to be added to the sources were partly deficient. So I don't think you need an action for every commit, but rather a "button" to run specific or "all" checks, which definitely need to be run before you extract the *.po files to be "sent over".

You may still run the checks periodically (once per week or month) if you want, and have a public landing page showing the results of the latest checks.

MichaelChirico commented 6 months ago

As Martin points out, at a minimum we need to do this as part of the wrap-up process for submitting patches to R-core.

OTOH, correcting these issues may sometimes require language-specific context that we translation maintainers lack, so surfacing them directly to translators at translation time would be preferable. The Arabic issues below are a good example.

For some context, here are the issues that checkPoFiles("*") (side note: https://bugs.r-project.org/show_bug.cgi?id=18643) finds when generating the patch as of today:

./src/library/base/po/ar.po:1528
too few entries, translation contains arabic percent sign U+066A
invalid value of %s
قيمة غير صالحة ل ٪س

./src/library/base/po/pt_BR.po:1835
src/main/coerce.c:2529 src/main/coerce.c:2590 src/main/coerce.c:2666
too many entries
default method not implemented for type '%s'
tipo não implementado '%s' em '%s'

./src/library/base/po/R-ar.po:1528
too few entries, translation contains arabic percent sign U+066A
invalid value of %s
قيمة غير صالحة ل ٪س

./src/library/methods/po/R-ar.po:18
too few entries, translation contains arabic percent sign U+066A
OOPS: something wrong with '.OldClassesPrototypes[[%d]]'
عذرا: شيء خاطئ في 'OldClassesPrototypes[[٪d]].'

./src/library/methods/po/R-zh_CN.po:200
too few entries
bad class specified for element %d (should be a single character string)bad class specified for element %d (should be a single character string)
为單元%d所设定的类别不正确(应当是条单字符字串)

./src/library/parallel/po/R-ar.po:18
too few entries, translation contains arabic percent sign U+066A
invalid value of %s
قيمة غير صالحة ل ٪س

./src/library/parallel/po/R-ja.po:47
differences in entries 1,2
node of a socket cluster on host %s with pid %d
PID %d を持つホスト %s 上のソケットクラスターのノード

./src/library/parallel/po/R-ja.po:117
differences in entries 1,2
socket cluster with %d nodes on host %s
ホスト %s 上に %d 個のノードを持つソケットクラスター

./src/library/stats/po/R-ar.po:477
too few entries, translation contains arabic percent sign U+066A
invalid value of %s
قيمة غير صالحة ل ٪س
MichaelChirico commented 5 months ago

@daroczig I don't quite understand where we'd put the .py file that implements the related checks, any idea?

e.g. we can very easily port over the checks for "bad" percent signs:

https://github.com/r-devel/r-svn/blob/26a1f60d38238e29efc5d6480924a21376587dee/src/library/tools/R/xgettext.R#L305-L310

I've noticed that the Arabic translations tend to use ٪ (U+066A); it's probably not very natural to switch keyboards while translating. The check is very simple, so it's really just a matter of wrapping it up in a .py file in the right place and editing CHECK_LIST as instructed:

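```python
from django.utils.translation import gettext_lazy
from weblate.checks.base import TargetCheck


class ArabPercentCheck(TargetCheck):
    # Identifier for the check; must be unique among Weblate checks
    check_id = "arabic-percent-check"
    # Short name shown next to a failing string
    name = gettext_lazy("Arabic % check")
    # Explanation shown to translators when the check fails
    description = gettext_lazy("Use % instead of the Arabic percent sign ٪ (U+066A)")

    # Real check code: returning True marks the translation as failing
    def check_single(self, source, target, unit):
        return "\u066a" in target  # U+066A ARABIC PERCENT SIGN
```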

(and similar for the other two)
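A Weblate-independent sketch covering all three look-alike signs, assuming "the other two" are the small percent sign U+FE6A and the fullwidth percent sign U+FF05 (our reading of the linked xgettext.R lines; worth double-checking there). `LOOKALIKE_PERCENT` and `lookalike_percent_problems` are hypothetical names; the Arabic message copies the checkPoFiles output above, and the other two labels are assumptions.

```python
from typing import Dict, List

# Characters that look like '%' but are not parsed as printf conversions.
# Assumed to mirror the signs R's checkPoFile warns about.
LOOKALIKE_PERCENT: Dict[str, str] = {
    "\u066a": "arabic percent sign U+066A",
    "\ufe6a": "small percent sign U+FE6A",
    "\uff05": "fullwidth percent sign U+FF05",
}


def lookalike_percent_problems(target: str) -> List[str]:
    """Return one message per look-alike percent sign found in target."""
    return [
        f"translation contains {label}"
        for char, label in LOOKALIKE_PERCENT.items()
        if char in target
    ]
```

Inside a `TargetCheck` subclass, `check_single` could then simply return `bool(lookalike_percent_problems(target))`, so one class covers all three signs.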


As for the other checks checkPoFile() performs, they may be a bit trickier, since they rely on parsing out the printf() templates and matching them. It would be great to steal R's logic here, but it's all defined within the body of checkPoFile(), which makes it hard to modularize.
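One way around that is to reimplement the matching rather than reuse R's code. A minimal sketch, assuming C printf-style conversions (as used by R's sprintf()/gettextf() templates) approximated with a regular expression; the names are hypothetical, and the messages only loosely mimic the checkPoFile() output quoted above rather than reproducing R's exact rules.

```python
import re
from typing import List, Optional

# Approximate C printf conversion spec: optional positional "n$", flags,
# width, precision, length modifier, then the conversion character.
# A literal "%%" also matches (conversion "%") and is filtered out below.
CONVERSION_RE = re.compile(
    r"%(?:(\d+)\$)?[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*)?)?"
    r"(?:hh|h|ll|l|L|j|z|t)?([diouxXeEfFgGaAcsp%])"
)


def conversions(s: str) -> List[str]:
    """Conversion characters in s, ordered by position if all are positional."""
    specs = [
        (m.group(1), m.group(2))
        for m in CONVERSION_RE.finditer(s)
        if m.group(2) != "%"
    ]
    if specs and all(pos is not None for pos, _ in specs):
        specs.sort(key=lambda spec: int(spec[0]))
    return [conv for _, conv in specs]


def conversion_problem(source: str, target: str) -> Optional[str]:
    """Describe a conversion mismatch between msgid and msgstr, or None."""
    src, tgt = conversions(source), conversions(target)
    if len(tgt) < len(src):
        return "too few entries"
    if len(tgt) > len(src):
        return "too many entries"
    # 1-based indices of conversions whose type differs
    diffs = [str(i) for i, (a, b) in enumerate(zip(src, tgt), start=1) if a != b]
    if diffs:
        return "differences in entries " + ",".join(diffs)
    return None
```

A translation that reorders arguments needs positional specifiers such as `%2$d`, which is why the sketch sorts by position when every conversion is positional; the R-ja.po entries above reorder `%s`/`%d` without them and are flagged.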

MichaelChirico commented 5 months ago

> I don't quite understand where we'd put the .py file that implements the related checks, any idea?

I should have re-read your third comment, sorry :)

I've had a go at implementing this; here's what /var/lib/docker/volumes/weblate-docker_weblate-data/_data/python looks like now:

ls -R
.:
custom_checks  customize

./custom_checks:
check_po_files  setup.py

./custom_checks/check_po_files:
__init__.py  percent.py

./customize:
__init__.py  models.py  __pycache__

./customize/__pycache__:
__init__.cpython-310.pyc  __init__.cpython-311.pyc  models.cpython-310.pyc  models.cpython-311.pyc

custom_checks is what I added; percent.py contains the classes outlined above.

I'm still not sure how to get this passed into the Weblate Docker container (will it happen automatically?) and, more importantly, how to register it in CHECK_LIST. I see CHECK_LIST defined in a few places, but they're all in the backups folder, which doesn't immediately sound right:

backups/settings-expanded.py:CHECK_LIST = ['weblate.checks.same.SameCheck', 'weblate.checks.chars.BeginNewlineCheck', 'weblate.checks.chars.EndNewlineCheck', 'weblate.checks.chars.BeginSpaceCheck', 'weblate.checks.chars.EndSpaceCheck', 'weblate.checks.chars.DoubleSpaceCheck', 'weblate.checks.chars.EndStopCheck', 'weblate.checks.chars.EndColonCheck', 'weblate.checks.chars.EndQuestionCheck', 'weblate.checks.chars.EndExclamationCheck', 'weblate.checks.chars.EndEllipsisCheck', 'weblate.checks.chars.EndSemicolonCheck', 'weblate.checks.chars.MaxLengthCheck', 'weblate.checks.chars.KashidaCheck', 'weblate.checks.chars.PunctuationSpacingCheck', 'weblate.checks.format.PythonFormatCheck', 'weblate.checks.format.PythonBraceFormatCheck', 'weblate.checks.format.PHPFormatCheck', 'weblate.checks.format.CFormatCheck', 'weblate.checks.format.PerlFormatCheck', 'weblate.checks.format.JavaScriptFormatCheck', 'weblate.checks.format.LuaFormatCheck', 'weblate.checks.format.ObjectPascalFormatCheck', 'weblate.checks.format.SchemeFormatCheck', 'weblate.checks.format.CSharpFormatCheck', 'weblate.checks.format.JavaFormatCheck', 'weblate.checks.format.JavaMessageFormatCheck', 'weblate.checks.format.PercentPlaceholdersCheck', 'weblate.checks.format.VueFormattingCheck', 'weblate.checks.format.I18NextInterpolationCheck', 'weblate.checks.format.ESTemplateLiteralsCheck', 'weblate.checks.angularjs.AngularJSInterpolationCheck', 'weblate.checks.icu.ICUMessageFormatCheck', 'weblate.checks.icu.ICUSourceCheck', 'weblate.checks.qt.QtFormatCheck', 'weblate.checks.qt.QtPluralCheck', 'weblate.checks.ruby.RubyFormatCheck', 'weblate.checks.consistency.PluralsCheck', 'weblate.checks.consistency.SamePluralsCheck', 'weblate.checks.consistency.ConsistencyCheck', 'weblate.checks.consistency.TranslatedCheck', 'weblate.checks.chars.EscapedNewlineCountingCheck', 'weblate.checks.chars.NewLineCountCheck', 'weblate.checks.markup.BBCodeCheck', 'weblate.checks.chars.ZeroWidthSpaceCheck', 
'weblate.checks.render.MaxSizeCheck', 'weblate.checks.markup.XMLValidityCheck', 'weblate.checks.markup.XMLTagsCheck', 'weblate.checks.markup.MarkdownRefLinkCheck', 'weblate.checks.markup.MarkdownLinkCheck', 'weblate.checks.markup.MarkdownSyntaxCheck', 'weblate.checks.markup.URLCheck', 'weblate.checks.markup.SafeHTMLCheck', 'weblate.checks.placeholders.PlaceholderCheck', 'weblate.checks.placeholders.RegexCheck', 'weblate.checks.duplicate.DuplicateCheck', 'weblate.checks.source.OptionalPluralCheck', 'weblate.checks.source.EllipsisCheck', 'weblate.checks.source.MultipleFailingCheck', 'weblate.checks.source.LongUntranslatedCheck', 'weblate.checks.format.MultipleUnnamedFormatsCheck', 'weblate.checks.glossary.GlossaryCheck']  ###
backups/settings.py:CHECK_LIST = [
backups/settings.py:modify_env_list(CHECK_LIST, "CHECK")
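The `modify_env_list(CHECK_LIST, "CHECK")` line suggests the Docker image lets the check list be extended from the environment rather than by editing settings.py. Our understanding, to be confirmed against the Weblate Docker environment-variable docs, is that it reads comma-separated dotted paths from `WEBLATE_ADD_CHECK` and `WEBLATE_REMOVE_CHECK`. The sketch below is a hypothetical re-implementation to illustrate the idea, not the image's actual code; the dotted path assumes /app/data/python is on `sys.path`, so `custom_checks` imports as a namespace package.

```python
import os
from typing import List, Mapping, Optional

# Dotted path for the class in custom_checks/check_po_files/percent.py
# (assumption: /app/data/python is importable inside the container).
ARABIC_PERCENT = "custom_checks.check_po_files.percent.ArabPercentCheck"


def env_list(name: str, environ: Mapping[str, str]) -> List[str]:
    """Split a comma-separated environment variable into a list."""
    value = environ.get(name, "")
    return [item.strip() for item in value.split(",") if item.strip()]


def modify_check_list(
    current: List[str], environ: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Hypothetical stand-in for modify_env_list(CHECK_LIST, "CHECK")."""
    environ = os.environ if environ is None else environ
    result = list(current)  # do not mutate the caller's list
    for item in env_list("WEBLATE_ADD_CHECK", environ):
        if item not in result:
            result.append(item)
    for item in env_list("WEBLATE_REMOVE_CHECK", environ):
        if item in result:
            result.remove(item)
    return result
```

If the image works this way, step 2 could be a single entry in the Compose override's environment, e.g. `WEBLATE_ADD_CHECK: custom_checks.check_po_files.percent.ArabPercentCheck`, without touching anything under backups/.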