sillsdev / l10nsharp

A .NET localization library for Windows Forms applications.
3 stars 10 forks source link

Create ExtractXliff command line program #26

Closed StephenMcConnel closed 6 years ago

StephenMcConnel commented 6 years ago

This change is Reviewable

JohnThomson commented 6 years ago

Feels like I'm being way too critical, especially when this is basically a simple utility program that probably won't get much maintenance. Or maybe I just don't understand xliff well enough to be reviewing this. Feel free to push back on changes that don't seem worth the effort.


Review status: 0 of 10 files reviewed at latest revision, 14 unresolved discussions.


src/ExtractXliff/Program.cs, line 21 at r1 (raw file):

  {
      static List<string> _namespaces = new List<string>();
      static string _xliffFilename;

This is in fact I believe the OUTPUT xliff filename. Unfortunately it's inherent in the nature of what this program does that we have both a 'new' and an 'output' xliff file; the two are easily confused, I find, since 'new' strongly suggests an output. I think it would be helpful to have a class comment which clearly identifies the three xliff files involved in the process and a consistent naming convention for them which is followed throughout (including using a name here that makes it clear which of the three it is).


src/ExtractXliff/Program.cs, line 23 at r1 (raw file):

      static string _xliffFilename;
      static string _datatype;
      static string _original;

If I'm understanding right, _original is (a) the value of the -o argument; (b) the name of the 'new' .xlf file (c) the id and name of the LocalizationManager, and (d) an attribute value of the file element in the new .xlf file. It's not at all obvious why all of those should be the same, or why the variable for the third of the xliff files involved in the process isn't xliffFilename. I think at least a comment is indicated, especially since 'original' strongly suggests to me the OLD file (what you're calling base).


src/ExtractXliff/Program.cs, line 34 at r1 (raw file):

      const string kDefaultAmpersandReplacement = "|amp|";

      static void Main(string[] args)

This method is basically three steps: parse the options, check the options, do the work. Two of the steps are encapsulated in methods while one is spelled out here, making it seem as if the main point of the program is to check the options. It's not vital, but it would feel neater to make CheckOptions a method too.


src/ExtractXliff/Program.cs, line 53 at r1 (raw file):

              extractor.ExternalAssembliesToScan = assemblies.ToArray();
              IEnumerable<LocalizingInfo> localizedStrings = extractor.DoExtractingWork(_namespaces.ToArray(), null);
              var lm = new LocalizationManager(_original, _original, _productVersion);

This will presumably crash if they didn't provide the _orginal argument, but it would be more consistent to give a nice message with Usage, wouldn't it? Also it's not clear from Usage that original is used for more than just an attribute of an element in the output file


src/ExtractXliff/Program.cs, line 62 at r1 (raw file):

              {
                  baseDoc = XLiffDocument.Read(_baseXliffFilename);
                  if (baseDoc.File.SourceLang != newDoc.File.SourceLang && baseDoc.File.SourceLang != kDefaultLangId)

Is it really OK that, say, the base document maps English to French and the new one maps French to Spanish? (Not likely, of course...)


src/ExtractXliff/Program.cs, line 68 at r1 (raw file):

                      return;
                  }
                  if (baseDoc.File.Original != newDoc.File.Original && baseDoc.File.Original != _original)

I'm not clear what original is, but it is it really OK that they are both the same but different from what the output file will use?


src/ExtractXliff/Program.cs, line 73 at r1 (raw file):

                          baseDoc.File.Original, newDoc.File.Original);
                  }
                  if (baseDoc.File.DataType != newDoc.File.DataType && baseDoc.File.DataType != _datatype)

Is it really OK that they both have the same DataType which is different from the datatype that will be applied to the merged file?


src/ExtractXliff/Program.cs, line 78 at r1 (raw file):

                          baseDoc.File.DataType, newDoc.File.DataType);
                  }
                  if (baseDoc.File.ProductVersion != newDoc.File.ProductVersion && baseDoc.File.ProductVersion != _productVersion)

Again, it's OK if the old and new files agree on the product version, even though it's different from the one we will write to the merged file?


src/ExtractXliff/Program.cs, line 94 at r1 (raw file):

          if (!String.IsNullOrEmpty(_productVersion))
              xliffOutput.File.ProductVersion = _productVersion;
          xliffOutput.File.HardLineBreakReplacement = kDefaultNewlineReplacement;

Is it possible/a problem if one or both of the input files use some other replacement strings?


src/ExtractXliff/Program.cs, line 132 at r1 (raw file):

                      if (tuOld.Dynamic)
                      {
                          ++wrongDynamicFlagCount;

How do we know old dynamic was wrong? Are we assuming the new xliff is the output of a process that won't find dynamic strings?


src/ExtractXliff/Program.cs, line 255 at r1 (raw file):

          if (error)
              return false;
          if (args[i] == "-n" || "--namespace".StartsWith(args[i]))

Is it intentional that an argument that is simply -- or even just a single hyphen will be taken as a namespace argument? (Same with other methods, except they won't get the chance). Is it intentional to be case-sensitive?


src/ExtractXliff/Program.cs, line 369 at r1 (raw file):

          }
          return false;
      }

Seems nearly all these methods do much the same thing and could be a single method taking a short name, long name, and an Action that does something with the following argument.


src/L10NSharp/LocalizationManager.cs, line 244 at r1 (raw file):

          Name = appName;
          AppVersion = appVersion;
      }

It appears the only caller of this is the new call above which passes _original for the first two arguments. I'm not sure what Id and Name are used for, but it seems some comment might be called for here or where that call happens to explain why it is a good value for both. Or maybe in this special usage the Id and Name just don't matter much? Anyway some clarification would be nice.


src/L10NSharp/CodeReader/StringExtractor.cs, line 180 at r1 (raw file):

      /// ------------------------------------------------------------------------------------
      /// <summary>
      /// Gets a list of all assemblies referenced by the entry assembly

Seems some new commenting is needed here and/or on ExternalAssembliesToScan to make it clear that if that is used we don't scan the entry assembly or other loaded assemblies.


Comments from Reviewable

JohnThomson commented 6 years ago

Reviewed 10 of 10 files at r1. Review status: all files reviewed at latest revision, 14 unresolved discussions.


Comments from Reviewable

StephenMcConnel commented 6 years ago

Review status: all files reviewed at latest revision, 14 unresolved discussions.


src/ExtractXliff/Program.cs, line 23 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
If I'm understanding right, _original is (a) the value of the -o argument; (b) the name of the 'new' .xlf file (c) the id and name of the LocalizationManager, and (d) an attribute value of the file element in the new .xlf file. It's not at all obvious why all of those should be the same, or why the variable for the third of the xliff files involved in the process isn't xliffFilename. I think at least a comment is indicated, especially since 'original' strongly suggests to me the OLD file (what you're calling base).

_original is not the name of the "new" .xlf file. I've renamed the class variables, and put comments on each of them to try to clarify things. _original is used as the appId and appName of the localization manager and as the XLIFF file element attribute value. The use in the localization manager is perhaps pragmatic, but it fits in with the initialization calls in the Bloom code.


src/ExtractXliff/Program.cs, line 34 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
This method is basically three steps: parse the options, check the options, do the work. Two of the steps are encapsulated in methods while one is spelled out here, making it seem as if the main point of the program is to check the options. It's not vital, but it would feel neater to make CheckOptions a method too.

The program structure obviously isn't clear enough. I've add some comments and split off another method from Main. I've also added a class comment and method comments throughout.


src/ExtractXliff/Program.cs, line 53 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
This will presumably crash if they didn't provide the _orginal argument, but it would be more consistent to give a nice message with Usage, wouldn't it? Also it's not clear from Usage that original is used for more than just an attribute of an element in the output file

Providing the _original argument is enforced by ParseOptions. The program will display the Usage() message and exit if it isn't provided.


src/ExtractXliff/Program.cs, line 62 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Is it really OK that, say, the base document maps English to French and the new one maps French to Spanish? (Not likely, of course...)

No, the check really is enforcing English as the only valid source-language attribute value.


src/ExtractXliff/Program.cs, line 68 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
I'm not clear what original is, but it is it really OK that they are both the same but different from what the output file will use?

I've simplified the checks to better mirror what will actually appear in the output file.


src/ExtractXliff/Program.cs, line 73 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Is it really OK that they both have the same DataType which is different from the datatype that will be applied to the merged file?

I've simplified the checks to better mirror what will actually appear in the output file.


src/ExtractXliff/Program.cs, line 78 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Again, it's OK if the old and new files agree on the product version, even though it's different from the one we will write to the merged file?

I've simplified the checks to better mirror what will actually appear in the output file.


src/ExtractXliff/Program.cs, line 94 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Is it possible/a problem if one or both of the input files use some other replacement strings?

no. At the moment, there's no way to modify these replacement strings anywhere, and I'm not sure they're even being used in practice. They may be relics of the old TMX code, but I didn't want to change everything at once.


src/ExtractXliff/Program.cs, line 132 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
How do we know old dynamic was wrong? Are we assuming the new xliff is the output of a process that won't find dynamic strings?

Yes, the new values (from newDoc) are found by the static scan process implemented by L10NSharp. So nothing it finds can be "dynamic" (unless of course I have no clue what "dynamic" means in the L10NSharp domain).


src/ExtractXliff/Program.cs, line 255 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Is it intentional that an argument that is simply -- or even just a single hyphen will be taken as a namespace argument? (Same with other methods, except they won't get the chance). Is it intentional to be case-sensitive?

It is intentional to be case-sensitve. If the user types garbage on the command line that passes the minimal syntax checking of the code, then the output is likely to be garbage and possibly empty. I'm not trying to prevent all errors, just ones that will crash the program immediately.


src/ExtractXliff/Program.cs, line 369 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Seems nearly all these methods do much the same thing and could be a single method taking a short name, long name, and an Action that does something with the following argument.

done.


src/L10NSharp/LocalizationManager.cs, line 244 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
It appears the only caller of this is the new call above which passes _original for the first two arguments. I'm not sure what Id and Name are used for, but it seems some comment might be called for here or where that call happens to explain why it is a good value for both. Or maybe in this special usage the Id and Name just don't matter much? Anyway some clarification would be nice.

done.


src/L10NSharp/CodeReader/StringExtractor.cs, line 180 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Seems some new commenting is needed here and/or on ExternalAssembliesToScan to make it clear that if that is used we don't scan the entry assembly or other loaded assemblies.

done.


Comments from Reviewable

JohnThomson commented 6 years ago

Much better. Two minor points still. Feel free to merge if I'm off track with them.


Review status: 6 of 10 files reviewed at latest revision, 7 unresolved discussions.


src/ExtractXliff/Program.cs, line 23 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
_original is not the name of the "new" .xlf file. I've renamed the class variables, and put comments on each of them to try to clarify things. _original is used as the appId and appName of the localization manager and as the XLIFF file element attribute value. The use in the localization manager is perhaps pragmatic, but it fits in with the initialization calls in the Bloom code.

My bad. Somewhere I got the idea that the new data came from an xliff file, but of course it's really from the C# assemblies.


src/ExtractXliff/Program.cs, line 34 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
The program structure obviously isn't clear enough. I've add some comments and split off another method from Main. I've also added a class comment and method comments throughout.

Makes much more sense now (especially once I got straight that there was only one input xliff file).


src/ExtractXliff/Program.cs, line 53 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
Providing the _original argument is enforced by ParseOptions. The program will display the Usage() message and exit if it isn't provided.

I see that now I look closely.


src/ExtractXliff/Program.cs, line 132 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
Yes, the new values (from newDoc) are found by the static scan process implemented by L10NSharp. So nothing it finds can be "dynamic" (unless of course I have no clue what "dynamic" means in the L10NSharp domain).

Of course, it's not an assumption, you do the process here! I must have been really tired that evening.


src/ExtractXliff/Program.cs, line 255 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
It is intentional to be case-sensitve. If the user types garbage on the command line that passes the minimal syntax checking of the code, then the output is likely to be garbage and possibly empty. I'm not trying to prevent all errors, just ones that will crash the program immediately.

I still don't think it's OK that "--", which has a specific purpose of terminating the arguments list, will instead be taken as matching any argument not already looked for (since e.g., "--namespace".startsWith("--") is true)


src/ExtractXliff/Program.cs, line 443 at r2 (raw file):

                          baseDoc.File.ProductVersion, _fileProductVersion);
                  }
              }

Seems to me it's hardly worth a warning if baseDoc's product version is different from newDoc and _fileProductVersion...that just means (hopefully) we're updating to a new version of the product. But if newDoc's version is different from _fileProductVersion, then the command line arguments are telling us to lie about what version of the program this xliff matches...THAT deserves a warning. (Actually I'm wondering why we even have a command line argument for this...when would we NOT want to write the correct product version determined from the assemblies we processed into the new xliff?)


Comments from Reviewable

StephenMcConnel commented 6 years ago

Review status: 6 of 10 files reviewed at latest revision, 7 unresolved discussions.


src/ExtractXliff/Program.cs, line 255 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
I still don't think it's OK that "--", which has a specific purpose of terminating the arguments list, will instead be taken as matching any argument not already looked for (since e.g., "--namespace".startsWith("--") is true)

I added a check for "--" in the two places where it might be checked.


src/ExtractXliff/Program.cs, line 443 at r2 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Seems to me it's hardly worth a warning if baseDoc's product version is different from newDoc and _fileProductVersion...that just means (hopefully) we're updating to a new version of the product. But if newDoc's version is different from _fileProductVersion, then the command line arguments are telling us to lie about what version of the program this xliff matches...THAT deserves a warning. (Actually I'm wondering why we even have a command line argument for this...when would we NOT want to write the correct product version determined from the assemblies we processed into the new xliff?)

One reason for the file element attribute options is to preserve existing values so that files can be updated and show only real differences in strings to localize. Changing the product -version does have some affect in crowdin as I recall, so it's not clear we want to change it in the xliff even when the program changes it. Although ideally it would stay in sync with the program all else being equal.

And I'm inclined to leave the check in. Without the warning, a file diff could show up and a developer decide to change the checked-in xliff file without being aware of the ramifications on crowdin of changing some of these attribute values. (See DistFiles/localization/README.md for gory details.)


Comments from Reviewable

JohnThomson commented 6 years ago
:lgtm:

Reviewed 3 of 4 files at r2, 1 of 1 files at r3. Review status: all files reviewed at latest revision, 5 unresolved discussions.


Comments from Reviewable