iLib-js / ilib-loctool-ghfm

Ilib loctool plugin to parse and localize github-flavored markdown
Apache License 2.0

localising the title in frontmatter #28

Open peterfarrant opened 3 years ago

peterfarrant commented 3 years ago

I may be missing an obvious solution here but can't find anything in release notes or in general web search about frontmatter translation.

I have a few cases where the title: in my frontmatter needs localising, as this is what appears in the menu (with Hugo). When I move from English (British and American) to French as well, this will be much worse, so I think I am missing something obvious. Apologies if that is the case.

Sample page:

---
title:  Summarized & Aggregated Data
weight: 30
---
# Summarised & Aggregated Data

This section reviews how call data is aggregated/summarised when viewing reports based on segmented/non-segmented data.

So in British English it should be (s not z)

title: Summarised & Aggregated Data

But the frontmatter items (normally correctly) do not appear in the xliff for translation. Is there a way to enable this?

ehoogerbeets commented 3 years ago

No way to enable it currently, but we can add it

peterfarrant commented 3 years ago

Yes please - one item is an inconvenience currently - once we go to do French as well it will be horrible

As far as I can see, it is only the 'title' item that needs including; however, it is possible to put custom variables in frontmatter (I don't). Because of this, would it be best to have an option in the project.json that lists the frontmatter variables to include in the translation? I could add title as one, and anyone else who used a custom variable that needed this could add it to the list as well.

ehoogerbeets commented 3 years ago

That was my idea as well -- put the configuration into the project.json.

I was also in the process of adding mapping support like in a few of the other plugins so far (like json or po files). The frontmatter to translate would be part of the configuration in each mapping:

{
  "settings": {
    "markdown": {
      "mappings": {
        "**/foobar.md": {
          "template": "[dir]/[base]_[locale].[extension]",
          "frontmatter": ["Title", "Description"]
        }
      }
    }
  }
}

That way, different files in your project can have different settings.
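The idea of a per-mapping "frontmatter" list can be sketched as follows. This is a minimal illustration, not the plugin's code: the function name `pickFrontmatter` is hypothetical, the parser only handles simple `key: value` lines rather than full YAML, and the case-sensitive key comparison is a choice made for this sketch.

```javascript
// Hypothetical helper (not part of ilib-loctool-ghfm): return only the
// frontmatter entries whose keys appear in the configured list.
function pickFrontmatter(markdown, keys) {
    // Frontmatter is the block between the first two "---" lines.
    const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n/.exec(markdown);
    const picked = {};
    if (!match) {
        return picked;
    }
    for (const line of match[1].split(/\r?\n/)) {
        const sep = line.indexOf(":");
        if (sep < 0) {
            continue; // not a simple "key: value" line
        }
        const key = line.slice(0, sep).trim();
        // Case-sensitive in this sketch, so "Title" does not match "title".
        if (keys.includes(key)) {
            picked[key] = line.slice(sep + 1).trim();
        }
    }
    return picked;
}

const page = [
    "---",
    "title:  Summarized & Aggregated Data",
    "weight: 30",
    "---",
    "# Summarised & Aggregated Data",
    ""
].join("\n");

console.log(pickFrontmatter(page, ["title"]));
console.log(pickFrontmatter(page, ["Title", "Description"]));
```

With the sample page from this issue, the first call returns the title entry and the second returns an empty object, because the page uses a lowercase `title` key.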

ehoogerbeets commented 3 years ago

https://github.com/iLib-js/ilib-loctool-ghfm/pull/32

ehoogerbeets commented 3 years ago

Okay, the above was implemented as v1.8.0. Please check it out and see if it works for you.

ehoogerbeets commented 3 years ago

Whoops, didn't mean to close the issue until you've had a chance to check out.

peterfarrant commented 3 years ago

I'm sure it is me, but I cannot get it to work. If I take my existing setup and project.json file and switch from 1.7.2 to 1.8.0, I get the following errors just trying to run without any new settings changes. I would expect the new settings to be optional, and for it to work without them, minus the new functionality (switch back to 1.7.2 or 1.7.1 and it works fine).

09:20:27 ERROR loctool.loctool: caught exception: TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received undefined
09:20:27 ERROR loctool.loctool: TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received undefined
    at validateString (internal/validators.js:124:11)
    at Object.join (path.js:375:7)
    at CustomProject.loadPlugin (C:\Code\Documentation\loctool\lib\CustomProject.js:119:32)
    at CustomProject.<anonymous> (C:\Code\Documentation\loctool\lib\CustomProject.js:197:27)
    at Array.forEach (<anonymous>)
    at CustomProject.defineFileTypes (C:\Code\Documentation\loctool\lib\CustomProject.js:196:18)
    at CustomProject.Project.init (C:\Code\Documentation\loctool\lib\Project.js:211:10)
    at processNextProject (C:\Code\Documentation\loctool\loctool.js:393:17)
    at Object.<anonymous> (C:\Code\Documentation\loctool\loctool.js:566:9)
    at Module._compile (internal/modules/cjs/loader.js:1063:30)
09:20:27 INFO loctool.loctool: Done

I am then unsure where to put the new settings

I tried the following in the project.json

{
    "name": "enterprise",
    "id": "enterprise",
    "sourceLocale": "en-US",
    "pseudoLocale": "qps-ploc",
    "resourceDirs": {
        "md": "target"
    },
    "includes": ["content/en-us"],
    "excludes": [
        ".git",
        ".github",
        "**.*",
        "*"
    ],
    "settings": {
        "markdown": {
            "mappings": {
                "**/foobar.md": {
                    "template": "[dir]/[base]_[locale].[extension]",
                    "frontmatter": ["Title", "Description"]
                }
            }
        },
        "locales": [
            "en-GB"
        ],
        "targetDir": "output",
        "xliffsDir": "xliffs",
        "xliffsOut": "xliffs"
    },
    "projectType": "custom",
    "plugins": [
        "ghfm"
    ]
}

ehoogerbeets commented 3 years ago

What OS are you working on?

ehoogerbeets commented 3 years ago

BTW - your project.json looks correct

peterfarrant commented 3 years ago

Windows 10 - latest

ehoogerbeets commented 3 years ago

I suspect Windows node has different environment variables than mac and linux. If you go into the node command-line and you enter process.env.PWD what do you see? (I have a mac and an ubuntu linux machine that I develop with, so I can't try it myself.)

ehoogerbeets commented 3 years ago

I think I will try switching to process.cwd() which should work on all platforms...

peterfarrant commented 3 years ago

C:\Code\Documentation\enterprise-portal>node
Welcome to Node.js v14.16.0.
Type ".help" for more information.
> process.env.PWD
undefined
>

peterfarrant commented 3 years ago

> process.cwd()
'C:\\Code\\Documentation\\enterprise-portal'
>

ehoogerbeets commented 3 years ago

As I suspected! I just pushed a new branch fixPWD of the loctool project with the process.cwd() fix in it. Are you able to check that one out? I'll have to investigate further tomorrow as it is past 2am here in California!
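The failure mode discussed above can be reproduced in a few lines. This is a sketch, not loctool's code: the helper `resolveFrom` and the `node_modules` layout are assumptions made for illustration. `process.env.PWD` is set by POSIX shells and is typically undefined under a Windows command prompt, whereas `process.cwd()` is part of Node's API on every platform.

```javascript
const path = require("path");

// Hypothetical helper: build a plugin directory under a given root.
function resolveFrom(root, name) {
    return path.join(root, "node_modules", name);
}

// Simulate Windows, where process.env.PWD is undefined:
// path.join rejects any argument that is not a string.
let failure = null;
try {
    resolveFrom(undefined, "ilib-loctool-ghfm");
} catch (e) {
    failure = e.code;
}
console.log(failure); // ERR_INVALID_ARG_TYPE

// process.cwd() always returns a string, so the join succeeds.
console.log(resolveFrom(process.cwd(), "ilib-loctool-ghfm"));
```

The first call fails with the same error code as the stack trace earlier in this thread; the second works regardless of which shell started Node.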

peterfarrant commented 3 years ago

Just made the two code changes in CustomProject.js from the branch to quickly test. I will have a look at my path stuff and see if I am causing it.

10:20:31 ERROR loctool.loctool: caught exception: Error: Could not load plugin ghfm
10:20:31 ERROR loctool.loctool: Error: Could not load plugin ghfm
    at CustomProject.loadPlugin (C:\Code\Documentation\loctool\lib\CustomProject.js:124:21)
    at CustomProject.<anonymous> (C:\Code\Documentation\loctool\lib\CustomProject.js:197:27)
    at Array.forEach (<anonymous>)
    at CustomProject.defineFileTypes (C:\Code\Documentation\loctool\lib\CustomProject.js:196:18)
    at CustomProject.Project.init (C:\Code\Documentation\loctool\lib\Project.js:211:10)
    at processNextProject (C:\Code\Documentation\loctool\loctool.js:393:17)
    at Object.<anonymous> (C:\Code\Documentation\loctool\loctool.js:566:9)
    at Module._compile (internal/modules/cjs/loader.js:1063:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1092:10)
    at Module.load (internal/modules/cjs/loader.js:928:32)
10:20:31 INFO loctool.loctool: Done

peterfarrant commented 3 years ago

Sorry, that was my fault really; it does work with that change to process.cwd(). I did not have a copy of ilib-loctool-yaml, which is now required by ghfm, so it was not failing to find ghfm but the plugin it requires. Now running it, I get an issue with the new functionality.

If I remove this from the project.json it works:

        "markdown": {
            "mappings": {
                "**/*.md": {
                    "template": "[dir]/[base]_[locale].[extension]",
                    "frontmatter": ["title"]
                }
            }
        },

With it in, I get the following errors:

12:00:34 ERROR loctool.lib.Project: Error while extracting from file content\en-us\users\usersettings.md. Skipping...
12:00:34 ERROR loctool.lib.Project: TypeError: Cannot read property 'regex' of undefined
    at Object.module.exports.getLocaleFromPath (C:\Code\Documentation\loctool\lib\utils.js:2069:50)
    at MarkdownFileType.handles (C:\Code\node_modules\ilib-loctool-ghfm\MarkdownFileType.js:148:53)
    at CustomProject.<anonymous> (C:\Code\Documentation\loctool\lib\Project.js:472:38)
    at LocalRepository.getBy (C:\Code\Documentation\loctool\lib\LocalRepository.js:130:5)
    at CustomProject.Project.extract (C:\Code\Documentation\loctool\lib\Project.js:453:13)
    at CustomProject.extract (C:\Code\Documentation\loctool\lib\CustomProject.js:232:35)
    at C:\Code\Documentation\loctool\loctool.js:394:21
    at CustomProject.<anonymous> (C:\Code\Documentation\loctool\lib\Project.js:255:17)
    at LocalRepository.getLocales (C:\Code\Documentation\loctool\lib\LocalRepository.js:219:5)
    at CustomProject.<anonymous> (C:\Code\Documentation\loctool\lib\Project.js:242:21)
12:00:34 ERROR loctool.lib.Project: Error while extracting from file content\en-us\users\_index.md. Skipping...
12:00:34 ERROR loctool.lib.Project: TypeError: Cannot read property 'regex' of undefined

Sample .md file

---
title:  Dimensions Enterprise Portal
weight: 10
---
#  Dimensions Enterprise Portal

Welcome to the Dimensions Enterprise Portal. This portal is designed for service providers to provision and manage their resellers on the Dimensions Call Analytics platform.

The portal includes the following areas of access:

peterfarrant commented 3 years ago

I think the problem may be in lib/utils.js at line 2065: the code has case 'basename': but in the settings you have "template": "[dir]/[base]_[locale].[extension]".

I altered my settings slightly to process all .md files, not a single named one:

    "settings": {
        "markdown": {
            "mappings": {
                "**/*.md": {
                    "template": "[dir]/[base]_[locale].[extension]",
                    "frontmatter": ["title"]
                }
            }
        },

But I may be barking up the wrong tree as I didn't write it

ehoogerbeets commented 3 years ago

Try using [basename] instead of [base]. I will make it give a better error.

ehoogerbeets commented 3 years ago

Also, I'll fix the README in ghfm, which says [base] instead of [basename].

peterfarrant commented 3 years ago

Well, it runs now, but 1) it does not seem to provide the title in the xliff file for translation, and 2) it is changing the file names in the output: instead of index.md it has index.md_en-GB.md.

I cannot get it to produce an xliff for en-GB. I have updated https://github.com/iLib-js/loctool/issues/126 for this.

peterfarrant commented 3 years ago

I noticed it seems to have pseudo-translated the title in the md file but not the rest of the file; there it has done a dictionary substitution (later in the page, customize has become customise).

---
title: Ðímëñšíõñš Ëñţëŕþŕíšë Põŕţàľ ţõ ţŕàñšľàţë76543210
weight: 10
---
# Dimensions Enterprise Portal

Welcome to the Dimensions Enterprise Portal. This portal is designed for service providers to provision and manage their resellers on the Dimensions Call Analytics platform.

ehoogerbeets commented 3 years ago

Oh I see what is happening. If the title is pseudo-translated, then it is not added to the list of resources to translate. I have not yet implemented the ability to pre-translate with pseudo first and then have the results appear in the xliff file, as you had previously asked for.

I see what is happening with the index.md_en-GB.md file name. The code that does the output filename template calculations originally came from the json plugin. I moved it from there into the main loctool so all the plugins could take advantage of it. I missed a part because the [basename] substitution is looking specifically to strip the extension ".json" from the file name. I'll make new unit tests to guarantee that it can support any extension.
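An extension-agnostic version of the template substitution described above might look like the following. This is a sketch under stated assumptions, not loctool's implementation: the function name `expandTemplate` is hypothetical, and only the four keywords that appear in this thread ([dir], [basename], [extension], [locale]) are handled.

```javascript
const path = require("path");

// Hypothetical sketch of output-path template expansion.
function expandTemplate(template, sourcePath, locale) {
    // path.posix keeps the result identical on every platform.
    const parsed = path.posix.parse(sourcePath);
    const values = {
        dir: parsed.dir,
        basename: parsed.name,           // file name with whatever extension removed
        extension: parsed.ext.slice(1),  // extension without the leading dot
        locale: locale
    };
    return template.replace(/\[(\w+)\]/g, (whole, key) => {
        if (!Object.prototype.hasOwnProperty.call(values, key)) {
            // Fail loudly on unknown keywords such as [base].
            throw new Error("Unknown template keyword: [" + key + "]");
        }
        return values[key];
    });
}

console.log(expandTemplate(
    "[dir]/[basename]_[locale].[extension]",
    "content/en-us/index.md",
    "en-GB"
)); // content/en-us/index_en-GB.md
```

Because the extension is taken from `path.parse` rather than matched against a fixed ".json" suffix, the same code yields `index_en-GB.md` for Markdown files instead of `index.md_en-GB.md`.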

peterfarrant commented 3 years ago

I grabbed those changes and I now get index_en-GB.md and index_fr-CA.md.

But it did not use to add the language extension (i.e. just index.md). It used to create a separate folder for the language, which is what I was still expecting (and wanted).

I have no way to get an en-GB xliff, as my previous fix of returning ispseudo=false in PseudoFactory.js does not work now that it has been reworked. Is this pending?

peterfarrant commented 3 years ago

Using 2.14.1. Found how to get a separate language dir and no locale extension: change the template line in settings.

            "template": "[locale]/[dir]/[basename].[extension]",

So this was just my not understanding the settings

But I am still seeing a mismatch in the localising of frontmatter

I noticed it seems to have pseudo-translated the title in the md file but not the rest of the file; there it has done a dictionary substitution (later in the page, customize has become customise).

---
title: Ðímëñšíõñš Ëñţëŕþŕíšë Põŕţàľ ţõ ţŕàñšľàţë76543210
weight: 10
---
# Dimensions Enterprise Portal

Welcome to the Dimensions Enterprise Portal. This portal is designed for service providers to provision and manage their resellers and customise the Dimensions Call Analytics platform.

ehoogerbeets commented 3 years ago

New version of ilib-loctool-ghfm on npm now (v1.8.2) that fixes a few more issues such as the above pseudo localization of the frontmatter, amongst a few other things.

peterfarrant commented 3 years ago

Still not working for me with loctool 2.14.1 and ghfm 1.8.2 (I ran it in VS Code with breakpoints to check I was definitely running these versions). If there is anything more useful than "it's not working" that I can provide, let me know.

  1. I can see an entry in the extracted.xliff for the title but it looks funny with a dubiously nice resname of "title" and I think it is only doing title on the first file, not all files (but not definitely proved that yet)

    <?xml version="1.0" encoding="utf-8"?>
    <xliff version="1.2">
    <file original="content\en-us\_index.md" source-language="en-US" product-name="enterprise">
    <body>
      <trans-unit id="1" resname="title" restype="string" datatype="x-yaml">
        <source>Dimensions Enterprise Portal Customize</source>
      </trans-unit>
      <trans-unit id="24" resname="r710860725" restype="string" datatype="markdown">
        <source>Dimensions Enterprise Portal</source>
      </trans-unit>
     <trans-unit id="49" resname="r567205350" restype="string" datatype="markdown">
        <source>Welcome to the Dimensions Enterprise Portal. This portal is designed for service providers to provision and manage their resellers on the Dimensions Call Analytics platform.</source>
      </trans-unit>
  2. I still cannot get an en-GB xliff

  3. The title element does not seem to go in the country xliff file (I added fr-CA locale as well to prove this as I can get an xliff for that)

  4. If I run with a --pseudo switch I still get the issue of a pseudo title and a substituted rest of the page

peterfarrant commented 3 years ago

@ehoogerbeets any progress on these issues? I see a couple of new versions, but they seem to be about a different issue.

peterfarrant commented 3 years ago

I have done a bit more debugging. I can see that in the extracted strings (test-extracted.xliff) it has got the title but it does not seem to include these in the translating xliff file.

If I use Title, Description or description they appear in the translating xliff.

It appears there is some issue with the keyword title (all lowercase), which is unfortunately the main frontmatter I need to use. Is title a keyword or something, and being excluded? Which function parses the text to extract the keywords? I can then debug a bit more.

test-extracted.xliff

  <file original="content\en-us\_index.md" source-language="en-US" product-name="test">
    <body>
      <trans-unit id="1" resname="title" restype="string" datatype="x-yaml">
        <source>~.UcClient.~ Mobile Topic</source>
      </trans-unit>
      <trans-unit id="5" resname="r425056357" restype="string" datatype="markdown">
        <source>~.UcClient.~ - Unified Communications Clients</source>
      </trans-unit>
      <trans-unit id="9" resname="Title" restype="string" datatype="x-yaml">
        <source>A smaller topic</source>
      </trans-unit>
      <trans-unit id="10" resname="r1046955961" restype="string" datatype="markdown">
        <source>~.UcClient.~ is the unified communications solution for the ~.Dimensions.~ platform. There are various ways to access the ~.UcClient.~ client feature set depending on your requirements.</source>
      </trans-unit>
      <trans-unit id="13" resname="Description" restype="string" datatype="x-yaml">
        <source>A little description</source>
      </trans-unit>

test-new-locale.xliff

  <file original="content\en-us\_index.md" source-language="en-US" target-language="fr-CA" product-name="test">
    <body>
      <trans-unit id="2" resname="r425056357" restype="string" datatype="markdown">
        <source>~.UcClient.~ - Unified Communications Clients</source>
        <target state="new">~.UcClient.~ - Unified Communications Clients</target>
      </trans-unit>
      <trans-unit id="6" resname="Title" restype="string" datatype="x-yaml">
        <source>A smaller topic</source>
        <target state="new">A smaller topic</target>
      </trans-unit>
      <trans-unit id="7" resname="r1046955961" restype="string" datatype="markdown">
        <source>~.UcClient.~ is the unified communications solution for the ~.Dimensions.~ platform. There are various ways to access the ~.UcClient.~ client feature set depending on your requirements.</source>
        <target state="new">~.UcClient.~ is the unified communications solution for the ~.Dimensions.~ platform. There are various ways to access the ~.UcClient.~ client feature set depending on your requirements.</target>
      </trans-unit>
      <trans-unit id="11" resname="Description" restype="string" datatype="x-yaml">
        <source>A little description</source>
        <target state="new">A little description</target>
      </trans-unit>

peterfarrant commented 3 years ago

@ehoogerbeets found the issue with lowercase title - I had somehow got some different versions of some required modules in a node_modules subdirectory of the ilib-loctool-yaml folder so it was using different versions for the yaml :)

So I have it working great if there is a single page.

When I have multiple pages, in test-extracted.xliff I only have the title for the first file it processes, and in test-new-en-GB.xliff I only have the title for the last file it processes (see attached files, renamed to .txt to upload).

A trace run of loctool.plugin.YamlFile shows it finding the localised json for all 4 pages (I put 4 pages for my test). I don't think it is me this time? Is it related to it getting a resname="title" instead of a unique number like all the others, e.g. resname="r1056710649"?

test-extracted.xliff.txt test-new-en-GB.xliff.txt

loctool v 2.14.1 ilib-loctool-ghfm v1.9.1 ilib-loctool-yaml v 1.2.0

peterfarrant commented 3 years ago

@ehoogerbeets just retested with the latest loctool v2.15.1 and ghfm v1.10.0, and I am still getting the issue where only the last file's title is available for translation, not the title from all of the files. I think it is related to the resname being title rather than a unique number, whereas all the normal non-YAML fields have a unique resname like "r217916575". See the extract below:

  <trans-unit id="1" resname="title" restype="string" datatype="x-yaml">

Any chance you could review and see if you can spot why?
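The suspicion above, that a non-unique resname lets one file's title replace another's, can be illustrated in isolation. This sketch does not reproduce loctool's data structures and the thread does not confirm the root cause; the unit list, the source strings, and the `indexBy` helper are all invented for the example.

```javascript
// Invented sample units: two files, each with a frontmatter "title".
const units = [
    { file: "content/en-us/_index.md", resname: "title", source: "First page title" },
    { file: "content/en-us/users/_index.md", resname: "title", source: "Second page title" }
];

// Hypothetical helper: index units by a caller-supplied key.
// A later unit with the same key silently replaces the earlier one.
function indexBy(list, keyOf) {
    const map = new Map();
    for (const unit of list) {
        map.set(keyOf(unit), unit);
    }
    return map;
}

// Keyed by resname alone, only the last file's title survives.
const byResname = indexBy(units, (u) => u.resname);
console.log(byResname.size, byResname.get("title").source);

// Including the file path in the key keeps every title.
const byFileAndResname = indexBy(units, (u) => u.file + "#" + u.resname);
console.log(byFileAndResname.size);
```

Under these assumptions the first index holds one entry (the second page's title) and the second holds two, which matches the "only the last file's title" symptom without proving that this is what loctool does internally.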

ehoogerbeets commented 2 years ago

Okay, this took a long time, but take a look at this one: https://github.com/iLib-js/ilib-loctool-ghfm/pull/42