sillsdev / l10nsharp

A .NET localization library for Windows Forms applications.
3 stars 10 forks source link

Add more fixup to CheckOrFixXliff #39

Closed StephenMcConnel closed 6 years ago

StephenMcConnel commented 6 years ago

Translators have made a few more error patterns


This change is Reviewable

JohnThomson commented 6 years ago

The changes are probably fine, but REs are very hard to read and be sure are right. In general, I would like to see a prose description in a comment of the problem each is supposed to solve. Is it important that they are done in a particular order? If so this would be good to comment. And dare I suggest unit tests?


Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion.


src/L10NSharp/LocalizedStringCache.cs, line 821 at r1 (raw file):

          var target7 = Regex.Replace(target6, " \"{\"{([0-9]+)\\. ", " \u200E\"{$1}\"\u200F. ", RegexOptions.CultureInvariant);
          var target8 = Regex.Replace(target7, " \"{\"{([0-9]+) ", " \u200E\"{$1}\"\u200F ", RegexOptions.CultureInvariant);
          var target9 = Regex.Replace(target8, " ([0-9]+)}{{. ", " \u200E{$1}\u200F. ", RegexOptions.CultureInvariant);

Surprises me to see an unescaped dot with no parens to capture it in the input, and a corresponding literal dot in the output. Do we really want to turn ANY character in this context into a dot? Also somewhat surprised to see the double open-brace in the pattern side.


Comments from Reviewable

StephenMcConnel commented 6 years ago

There are unit tests, but maybe I should add to them for the new patterns. I've tried to add some comments, about the RE syntax in general, and briefly what each group of changes is trying to accomplish.


Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions.


src/L10NSharp/LocalizedStringCache.cs, line 821 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Surprises me to see an unescaped dot with no parens to capture it in the input, and a corresponding literal dot in the output. Do we really want to turn ANY character in this context into a dot? Also somewhat surprised to see the double open-brace in the pattern side.

Good catch on the unescaped dot. The double open-brace actually occurs in one of the strings from crowdin. I'm not sure what the translator was thinking. :-) :-(


Comments from Reviewable

JohnThomson commented 6 years ago

:lgtm: will let you merge when other things done


Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable