sillsdev / l10nsharp

A .NET localization library for Windows Forms applications.
3 stars 10 forks source link

Get xliff branch working with Bloom and remove warnings #16

Closed StephenMcConnel closed 7 years ago

StephenMcConnel commented 7 years ago

This change is Reviewable

JohnThomson commented 7 years ago

Review status: 0 of 21 files reviewed at latest revision, 15 unresolved discussions.


src/L10NSharp/LocalizedStringCache.cs, line 246 at r1 (raw file):

          xliffDoc.File.HardLineBreakReplacement = s_literalNewline;
          if (OwningManager != null && OwningManager.Name != null)
              xliffDoc.File.Original = OwningManager.Name + ".dll";

Not sure how much control we have here, but it seems to me that an xliffDoc's unqualified File ought to be the XML file it lives in; we need a more suitable name for the DLL it provides translations for.


src/L10NSharp/TransUnitUpdater.cs, line 132 at r1 (raw file):

                  _updated = true;
                  tu.RemoveVariant(tuv);
                  if ((tu.Source == null || tu.Source.Value == null) &&

I think you can now use tu.Source?.Value == null


src/L10NSharp/CodeReader/StringExtractor.cs, line 89 at r1 (raw file):

                  //Debug.WriteLine(String.Format("DEBUG: StringExtractor.DoExtractingWork() loaded resources for {0}", type.FullName));
              }
              #pragma warning disable CS0168

May be worth commenting what warning you are disabling?


src/L10NSharp/XLiffUtils/TransUnit.cs, line 45 at r1 (raw file):

      /// ------------------------------------------------------------------------------------
      /// <summary>
      /// Gets the type of translation unit.

Comment wants update


src/L10NSharp/XLiffUtils/TransUnit.cs, line 53 at r1 (raw file):

      /// ------------------------------------------------------------------------------------
      /// <summary>
      /// Gets the source of the translation unit.

Saying this about a property called Source in a class called TransUnit doesn't add much. If it deserves a comment at all, it should indicate what sort of source (a file path? Name of the human translator? Name of a language? Text of what was translated?)


src/L10NSharp/XLiffUtils/XLiffDocument.cs, line 89 at r1 (raw file):

          if(isTargetNeeded)
              return File.Body.TransUnits.Select(tu => tu.Source).Where(s => (s != null && s.Lang != null)).Select(s => s.Lang)
                  .Union(File.Body.TransUnits.Select(tu => tu.Target).Where(t => (t != null && t.Lang != null)).Select(t => t.Lang)).Distinct();

This logic doesn't seem to correspond to "isTargetNeeded". I expected this version to select the languages where language and target are BOTH not-null, not where EITHER is not null. Maybe 'targetWanted' to suggest that we want targets as well as sources? At a higher level, I wonder why this is useful. Seems we're mainly interested in the target languages, and maybe sometimes might also care about the source...if both of those mean what I think (i.e., source is just the English, and target is the actual localization data this xliff is providing). Then there's the issue you discovered yesterday, where code is trying to enumerate languages in one xliff where it should be enumerating languages each represented by a separate xliff...this method may go away altogether? Feel free to just ignore improving this for this PR if it's all going away in the next :-)


src/L10NSharp/XLiffUtils/XLiffFile.cs, line 76 at r1 (raw file):

      /// ------------------------------------------------------------------------------------
      /// <summary>
      /// Gets or sets the source language found in the XLiff file.

we seem to be contrasting source and target, so why is the source language property called TargetLang?


src/L10NSharp/XLiffUtils/XLiffFile.cs, line 108 at r1 (raw file):

      /// ------------------------------------------------------------------------------------
      /// <summary>
      /// Gets or sets the hard linebreak replacement string.

Is there something more useful to say? For example, is this something that will occur in the output when there's a real hard line break in the file, or something that occurs in the file and will be changed to a hard line break in the strings returned to the client?


src/L10NSharp/XLiffUtils/XLiffNote.cs, line 88 at r1 (raw file):

        public bool IsEmpty
        {
            get { return (string.IsNullOrEmpty(NoteLang) && string.IsNullOrEmpty(Text)); }

I find it surprising that a Note with no text is non-empty, even if it has a language. (I could imagine that is it empty, or at least useless, if EITHER is missing.)


src/L10NSharp/XLiffUtils/XLiffXmlSerializationHelper.cs, line 117 at r1 (raw file):

              using (StringWriter writer = new StringWriter(output))
              {
                  XmlSerializerNamespaces nameSpace = new XmlSerializerNamespaces();

Should the name be plural?


src/L10NSharpTests/XLiffUtilsTests.cs, line 27 at r1 (raw file):


      [Test]
      public void TestGetAllVariantLanguagesFound()

Should this be enhanced to verify that it can find more than one of each? (Assuming we still need this method at all once you sort out the deeper logic errors :-)


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 14 at r1 (raw file):

  <!-- set language code variables according to the TMX header attributes -->
  <xsl:variable name="srclang"><xsl:value-of select="/tmx/header/@adminlang"/></xsl:variable>
  <xsl:variable name="targlang"><xsl:value-of select="/tmx/header/@srclang"/></xsl:variable>

Aaargh! So what tmx calls 'source' is what xliff calls 'target', and xliff calls something else (tmx 'admin') source? No wonder comments are getting confused! Not sure what you can do, but try to be sure that every place a 'Source' property is declared it's clear whether we mean the tmx or xliff sense.


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 30 at r1 (raw file):

  <xsl:template match="tmx">
      <xsl:element name="file">
          <xsl:attribute name="original">CHANGE-ME.dll</xsl:attribute>

Did you mean to change this before merging, or is code going to change this later?


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 34 at r1 (raw file):

          <xsl:if test="$srclang != $targlang">
              <xsl:attribute name="target-language"><xsl:value-of select="$targlang"/></xsl:attribute>
          </xsl:if>

These lines seem inconsistent with how you got values for srclang and targlang above.


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 84 at r1 (raw file):

      <xsl:if test="not(following-sibling::tuv)">
          <xsl:apply-templates select="preceding-sibling::note"/>
      </xsl:if>

Would it be simpler to just do xsl:apply-templates select ="note" following the tuv select at the parent level?


Comments from Reviewable

StephenMcConnel commented 7 years ago

Review status: 0 of 21 files reviewed at latest revision, 15 unresolved discussions.


src/L10NSharp/LocalizedStringCache.cs, line 246 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Not sure how much control we have here, but it seems to me that an xliffDoc's unqualified File ought to be the XML file it lives in; we need a more suitable name for the DLL it provides translations for.

Quoting from the xliff 1.2 standard, "The original attribute specifies the name of the original file from which the contents of a element has been extracted." This value doesn't really exist in our setup since we're extracting strings from a large number of C# (or Typescript) files or assemblies. We could complicate things by having multiple file elements in the xliff, one per actual source file with original pointing to that exact source file. However, some strings are used in multiple places in the program with identical id values, so that breaks down also. Changing from ".dll" to ".xlf" doesn't really fit the intention of the standard any better than what is being done currently, and will likely look like a circular reference to the current xliff file itself. (NB, the xliff standard does allow multiple file elements inside the xliff element, although our implementation has only one.)

Summary: I'm not inclined to change this at the moment. We could discuss whether to implement multiple file elements with more meaningful values for the "original" attribute, but I'm not eager to go there if we don't have to.


src/L10NSharp/TransUnitUpdater.cs, line 132 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
I think you can now use tu.Source?.Value == null

That's nice to know, but I'm not convinced it's as clear for a casual reader of the code. At least not until the syntax has been used a lot longer. And there's hundreds of places in our code that could changed in this way...


src/L10NSharp/CodeReader/StringExtractor.cs, line 89 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
May be worth commenting what warning you are disabling?

Done. for all of the #pragmas that I added.


src/L10NSharp/XLiffUtils/TransUnit.cs, line 45 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Comment wants update

Done.


src/L10NSharp/XLiffUtils/TransUnit.cs, line 53 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Saying this about a property called Source in a class called TransUnit doesn't add much. If it deserves a comment at all, it should indicate what sort of source (a file path? Name of the human translator? Name of a language? Text of what was translated?)

Done.


src/L10NSharp/XLiffUtils/XLiffDocument.cs, line 89 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
This logic doesn't seem to correspond to "isTargetNeeded". I expected this version to select the languages where language and target are BOTH not-null, not where EITHER is not null. Maybe 'targetWanted' to suggest that we want targets as well as sources? At a higher level, I wonder why this is useful. Seems we're mainly interested in the target languages, and maybe sometimes might also care about the source...if both of those mean what I think (i.e., source is just the English, and target is the actual localization data this xliff is providing). Then there's the issue you discovered yesterday, where code is trying to enumerate languages in one xliff where it should be enumerating languages each represented by a separate xliff...this method may go away altogether? Feel free to just ignore improving this for this PR if it's all going away in the next :-)

Although I got this method working more or less without crashing in Bloom, I'm not at all convinced it serves a useful purpose. The isTargetNeeded flag indicates that both source and target elements are to be checked. It makes more sense when multiple target files have been merged and some have actually contributed translation units. But I'm not sure that the merging operation ever makes sense. I'd rather leave any changes to this to the next PR since it might well be removed.


src/L10NSharp/XLiffUtils/XLiffFile.cs, line 76 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
we seem to be contrasting source and target, so why is the source language property called TargetLang?

Done. comment should have said target instead of source.


src/L10NSharp/XLiffUtils/XLiffFile.cs, line 108 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Is there something more useful to say? For example, is this something that will occur in the output when there's a real hard line break in the file, or something that occurs in the file and will be changed to a hard line break in the strings returned to the client?

I added a bit more to the comment to try to explain what this is used for.


src/L10NSharp/XLiffUtils/XLiffNote.cs, line 88 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
I find it surprising that a Note with no text is non-empty, even if it has a language. (I could imagine that is it empty, or at least useless, if EITHER is missing.)

changed to check only the text. The NoteLang value is often empty, assumed to be either English or whatever the default source language is.


src/L10NSharp/XLiffUtils/XLiffXmlSerializationHelper.cs, line 117 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Should the name be plural?

Done.


src/L10NSharpTests/XLiffUtilsTests.cs, line 27 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Should this be enhanced to verify that it can find more than one of each? (Assuming we still need this method at all once you sort out the deeper logic errors :-)

The final block does test for 2 languages being found. With the method being passed false, I would hope that only 1 language would ever be found. So I think the test is good enough for now, especially with the future of the method being somewhat in doubt. :-)


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 14 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Aaargh! So what tmx calls 'source' is what xliff calls 'target', and xliff calls something else (tmx 'admin') source? No wonder comments are getting confused! Not sure what you can do, but try to be sure that every place a 'Source' property is declared it's clear whether we mean the tmx or xliff sense.

I'm not sure whether what we were doing in TMX made any sense, but it seemed to be consistent in all of the files.


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 30 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Did you mean to change this before merging, or is code going to change this later?

There's no way to get a reasonable value inside the file, so I made it something that would attract attention. :-) I think it will have to be manually edited in the xliff files created from tmx files. There isn't any value in the tmx files that seems suitable to put in this attribute. And the name of the input tmx file isn't available inside the xslt processing as far as I could determine. Which might make a plausible (even if technically wrong) value.


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 34 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
These lines seem inconsistent with how you got values for srclang and targlang above.

why? what xliff calls the source-language appears to marked adminlang in our tmx files, and the target-language in xliff terms is marked srclang in our tmx files. In passing, the xml:lang attributes of all source elements is optional because it's supposed to match the file's source-language attribute in xliff. Similarly, the xml:lang attribute of all target elements is optional and supposed to match the file's target-language attribute (especially if you follow the recommendation to have only bilingual xliff files).


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 84 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Would it be simpler to just do xsl:apply-templates select ="note" following the tuv select at the parent level?

good point. :-)


Comments from Reviewable

JohnThomson commented 7 years ago

Looks much better. Few more minor suggestions.


Review status: 0 of 21 files reviewed at latest revision, 11 unresolved discussions.


src/L10NSharp/LocalizedStringCache.cs, line 246 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
Quoting from the xliff 1.2 standard, "The original attribute specifies the name of the original file from which the contents of a element has been extracted." This value doesn't really exist in our setup since we're extracting strings from a large number of C# (or Typescript) files or assemblies. We could complicate things by having multiple file elements in the xliff, one per actual source file with original pointing to that exact source file. However, some strings are used in multiple places in the program with identical id values, so that breaks down also. Changing from ".dll" to ".xlf" doesn't really fit the intention of the standard any better than what is being done currently, and will likely look like a circular reference to the current xliff file itself. (NB, the xliff standard does allow multiple file elements inside the xliff element, although our implementation has only one.) Summary: I'm not inclined to change this at the moment. We could discuss whether to implement multiple file elements with more meaningful values for the "original" attribute, but I'm not eager to go there if we don't have to.

Not being sure what this value was even supposed to mean, I was not sure what if anything should be changed. Now that I know, I think using the name of the DLL for which it holds translations is a reasonable VALUE. I would not have called such a thing File.Original, but it sounds as though that name is part of the standard, so I agree, let's keep it.


src/L10NSharp/UI/CustomDropDownComboBox.cs, line 16 at r2 (raw file):


      /// ------------------------------------------------------------------------------------
      #pragma warning disable CS0649  // field declaration is never assigned a value

So why have it at all?


src/L10NSharp/UI/L10NExtender.cs, line 37 at r2 (raw file):


      #pragma warning disable CS0414  // field is assigned a value but its value is never used
      private Container components = null;

Likewise


src/L10NSharp/XLiffUtils/TransUnit.cs, line 45 at r2 (raw file):

      /// ------------------------------------------------------------------------------------
      /// <summary>
      /// Gets whether the unit is "dynamic" (discovered dynamically).

Better, but "discovered dynamically" still assumes a lot of knowledge. I always have trouble remembering whether dynamic are the ones discovered by scanning the files or (I think) the ones that get missed by that and only found while running the program. Maybe that's explained in depth somewhere else...could be cross-ref? Or just leave it...maybe I'm overdoing things.


src/L10NSharp/XLiffUtils/XLiffDocument.cs, line 89 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
Although I got this method working more or less without crashing in Bloom, I'm not at all convinced it serves a useful purpose. The isTargetNeeded flag indicates that both source and target elements are to be checked. It makes more sense when multiple target files have been merged and some have actually contributed translation units. But I'm not sure that the merging operation ever makes sense. I'd rather leave any changes to this to the next PR since it might well be removed.

Fair enough.


src/L10NSharp/XLiffUtils/XLiffFile.cs, line 108 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
I added a bit more to the comment to try to explain what this is used for.

Very helpful. That meaning had not even occurred to me as a possibility!


src/L10NSharp/XLiffUtils/XLiffXmlSerializationHelper.cs, line 127 at r2 (raw file):

              return (output.Length == 0 ? null : output.ToString());
          }
          #pragma warning disable CS0168, CS0162  // variable is declared but not used, unreachable code detected 

As things stand, we could remove the whole try/catch wrapper, unless possibly we often want to put a breakpoint on the throw.


src/L10NSharp/XLiffUtils/XLiffXmlSerializationHelper.cs, line 158 at r2 (raw file):

              }
          }
          #pragma warning disable CS0168, CS0162  // variable is declared but not used, unreachable code detected

and again


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 14 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
I'm not sure whether what we were doing in TMX made any sense, but it seemed to be consistent in all of the files.

I'm just concerned about how confusing it will be if all our method and property names are written as if 'source' means the translation, once xliff becomes the main file format and in it 'source' is the language translated FROM (which actually makes better sense to me). Seems to call for at least comments where we declare properties called Source and especially where we load a target attribute into a property called source. But the main thing is that in all our C# etc Source consistently means the translation.


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 30 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
There's no way to get a reasonable value inside the file, so I made it something that would attract attention. :-) I think it will have to be manually edited in the xliff files created from tmx files. There isn't any value in the tmx files that seems suitable to put in this attribute. And the name of the input tmx file isn't available inside the xslt processing as far as I could determine. Which might make a plausible (even if technically wrong) value.

xslt 2 has functions base-uri and document-uri which I think should retrieve the tmx file name if applied to a node from that file. Try . May need to strip off unwanted path elements.


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 34 at r1 (raw file):

Previously, StephenMcConnel (Steve McConnel) wrote…
why? what xliff calls the source-language appears to marked adminlang in our tmx files, and the target-language in xliff terms is marked srclang in our tmx files. In passing, the xml:lang attributes of all source elements is optional because it's supposed to match the file's source-language attribute in xliff. Similarly, the xml:lang attribute of all target elements is optional and supposed to match the file's target-language attribute (especially if you follow the recommendation to have only bilingual xliff files).

You're right, not sure what I was thinking.


Comments from Reviewable

JohnThomson commented 7 years ago

Reviewed 12 of 21 files at r1, 9 of 9 files at r2. Review status: all files reviewed at latest revision, 11 unresolved discussions.


Comments from Reviewable

StephenMcConnel commented 7 years ago

Review status: 15 of 22 files reviewed at latest revision, 11 unresolved discussions.


src/L10NSharp/UI/CustomDropDownComboBox.cs, line 16 at r2 (raw file):

Previously, JohnThomson (John Thomson) wrote…
So why have it at all?

Okay, I've removed this even though it had a ripple effect that removed two methods and simplified a property definition greatly. I've removed all the item requiring a #pragma that appeared to be reasonable to remove.


src/L10NSharp/UI/L10NExtender.cs, line 37 at r2 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Likewise

This one is needed for Designer support apparently. I copied the comment from where it's given a value.


src/L10NSharp/XLiffUtils/TransUnit.cs, line 45 at r2 (raw file):

Previously, JohnThomson (John Thomson) wrote…
Better, but "discovered dynamically" still assumes a lot of knowledge. I always have trouble remembering whether dynamic are the ones discovered by scanning the files or (I think) the ones that get missed by that and only found while running the program. Maybe that's explained in depth somewhere else...could be cross-ref? Or just leave it...maybe I'm overdoing things.

added "while the program is running" to the comment


src/L10NSharp/XLiffUtils/XLiffXmlSerializationHelper.cs, line 127 at r2 (raw file):

Previously, JohnThomson (John Thomson) wrote…
As things stand, we could remove the whole try/catch wrapper, unless possibly we often want to put a breakpoint on the throw.

Done.


src/L10NSharp/XLiffUtils/XLiffXmlSerializationHelper.cs, line 158 at r2 (raw file):

Previously, JohnThomson (John Thomson) wrote…
and again

Done.


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 30 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
xslt 2 has functions base-uri and document-uri which I think should retrieve the tmx file name if applied to a node from that file. Try . May need to strip off unwanted path elements.

Unfortunately, the xslt processor I'm using only handles version 1.0 XSL.


Comments from Reviewable

StephenMcConnel commented 7 years ago

Review status: 15 of 22 files reviewed at latest revision, 11 unresolved discussions.


src/xslt/l10nSharpTmxToXliff1-2.xsl, line 14 at r1 (raw file):

Previously, JohnThomson (John Thomson) wrote…
I'm just concerned about how confusing it will be if all our method and property names are written as if 'source' means the translation, once xliff becomes the main file format and in it 'source' is the language translated FROM (which actually makes better sense to me). Seems to call for at least comments where we declare properties called Source and especially where we load a target attribute into a property called source. But the main thing is that in all our C# etc Source consistently means the translation.

I'm not sure I want to fully address this concern in this PR. But it is something to think about as I continue on work on the code.


Comments from Reviewable

JohnThomson commented 7 years ago
:lgtm:

Reviewed 7 of 7 files at r3. Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

JohnThomson commented 7 years ago

I don't seem to have permission to merge this. @hatton?

hatton commented 7 years ago

@JohnThomson please try now.