Add API docs for fluent-syntax

pdehaan commented 5 years ago

The docs at http://projectfluent.org/fluent.js/fluent-syntax/ say:

The API reference is available at http://projectfluent.org/fluent.js/fluent-syntax.

But I'm not seeing any API reference. Not sure if it is a JSDoc issue or something else, but was looking for API docs to use the library since I'm trying to parse everything manually (see https://github.com/mozilla/blurts-server/issues/601#issuecomment-439141053) and I'm sure I'm doing it incorrectly.

pdehaan commented 5 years ago

Specifically I'm interested in parsing an *.ftl file, and converting it into a JSON blob and verifying that each one of the values/variations doesn't contain invalid HTML using an HTML linter. Currently I'm having to try and inspect each node and see if it's a Placeable or TextElement or VariantList or VariantExpression, etc and add special handling for each type.

I was hoping for some method where I could just parse a Resource or Message and have it return a constructed string (with placeholders or not) which I could pass along to my HTML linter, instead of having to build the strings myself from Placeables and TextElements, etc.

zbraniecki commented 5 years ago

I was hoping for some method where I could just parse a Resource or Message and have it return a constructed string (with placeholders or not) which I could pass along to my HTML linter, instead of having to build the strings myself from Placeables and TextElements, etc.

How would you do that for variants? By definition in case of select expression you have multiple variants and which variant will get selected depends on environmental variables resolved at runtime.

So you either have to walk through the AST and verify each variant, or you need to "resolve" everything to the default variant and only that will get tested.

pdehaan commented 5 years ago

Here's one of our hypothetical *.ftl files:

sign-up = Registrarse
-brand-HIBP =
    {
        [nominative] Сервис Have I Been Pwned
        [genitive] Сервисом Have I Been Pwned
       *[dative] Сервису Have I Been Pwned
    }
...
featured-breach-results =
    { $breachCount ->
        [0] Tu cuenta aparece en la violación <span class="bold"> { $featuredBreach } </span>, pero no aparece en ninguna otra violación de datos conocida.
        [one] Tu cuenta apareció en la violación de <span class="bold"> así como en otra violación.
       *[other] Tu cuenta apareció en la violación de <span class="bold"> { $featuredBreach } </span>, así como en otras { $breachCount } violaciones.
    }

I'm just converting the *.ftl to JSON using parse():

function ftlToJson(ftlPath) {
  const ftl = readFile(ftlPath).toString();
  const body = flSyn.parse(ftl, {withSpans:false}).body;
  return body.map(entry => extract(entry)).filter(entry => entry);
}

And then looping through each entry and trying to convert it to a simple:

{ name: "sign-up", value: "Registrarse" }

Or in the case of these variants, it got a bit more complex and I returned an Array for the .value:

{ name: "featured-breach-results", value: [
  { variant: "0", value: 'Tu cuenta aparece en la violación <span class="bold"> { $featuredBreach } </span>, pero no aparece en ninguna otra violación de datos conocida.' },
  { variant: "one", value: 'Tu cuenta apareció en la violación de <span class="bold"> así como en otra violación.' },
  { variant: "other", value: 'Tu cuenta apareció en la violación de <span class="bold"> { $featuredBreach } </span>, así como en otras { $breachCount } violaciones.' }
]}

Then I'm just looping over my big array and checking if .value is an Array or simple value, and calling the HTML linter on each value to make sure it looks valid. In the case of the featured-breach-results[one] value above, you'll notice that it's missing a closing </span> tag, which is a bit suspicious.

My code is pretty terrible, but seems to be working. I was just wondering if there was an easier way for me to convert from something like this:

  {
    "type": "Term",
    "id": {
      "type": "Identifier",
      "name": "-product-name-nowrap"
    },
    "value": {
      "type": "Pattern",
      "elements": [
        {
          "type": "TextElement",
          "value": "<span class=\"nowrap\">"
        },
        {
          "type": "Placeable",
          "expression": {
            "type": "TermReference",
            "id": {
              "type": "Identifier",
              "name": "-product-name"
            }
          }
        },
        {
          "type": "TextElement",
          "value": "</span>"
        }
      ]
    },
    "attributes": [],
    "comment": null
  },

To a stringified version of the elements[], like this (I don't even care if it tries to substitute that TermReference back in, I'm just trying to lint any potential HTML):

<span class="nowrap"> -product-name </span>

Clicko-expando!

```js const readFile = require("fs").readFileSync; const promisify = require("util").promisify; const flSyn = require("fluent-syntax"); const _glob = require("glob"); const lint = require("htmllint"); const glob = promisify(_glob); run("./locales/*/*.ftl"); /** * Lint a glob of files. * @param {string} g A glob of file(s) to lint. */ async function run(g) { try { for (const locale of await glob(g)) { await lintLocale(locale); } } catch (err) { console.error(err); process.exit(1); } } /** * Load a specific *.ftl file, convert it to JSON, then run each name/value pair through the HTML linter. * @param {string} l A path to a *.ftl file to parse. */ async function lintLocale(l) { const entries = ftlToJson(l); for (const entry of entries) { await lintHtml(l, entry); } } /** * Convert an *.ftl file to JSON. * @param {string} ftlPath * @returns {array} */ function ftlToJson(ftlPath) { const ftl = readFile(ftlPath).toString(); const body = flSyn.parse(ftl, {withSpans:false}).body; return body.map(entry => extract(entry)).filter(entry => entry); } /** * ?? * @param {object} entry */ function extract(entry) { switch (entry.type) { case "GroupComment": // Ignore break; case "Message": case "Term": return _getString(entry); break; default: console.log("Unknown entry type:", entry.type); } } /** * Loop over translations and make sure each name/value pair passes the HTML linter. * @param {string} locale Path to the *.ftl file we're currently linting. * @param {object} data Object w/ name/value pairs to lint. Note that in some cases .value may be an array of variations. */ async function lintHtml(locale, data) { if (Array.isArray(data.value)) { data.value.forEach(v => { // Recursive magic... lintHtml(locale, {name: `${v.name}[${v.variant}]`, value: v.value}); }); } else { const results = await lint(data.value.trim() + "\n", { 'id-class-style': 'dash' }); if (results.length) { console.log(locale); results.forEach(err => { console.error(` - [${err.code}] ${err.rule}: "${data.name} = ${data.value}"`); }); } } } function _getPattern(name, value) { const variants = []; const str = value.elements.reduce((elements, element) => { switch (element.type) { case "Placeable": const res = _getPlaceable(name, element.expression); if (Array.isArray(res)) { res.forEach(v => variants.push(v)); } else { elements.push(res); } break; case "TextElement": elements.push(element.value); break; default: console.error("Unknown value element type:", element.type); } return elements; }, []).join(""); if (variants.length) { return variants; } return str; } function _getPlaceable(name, expression) { switch (expression.type) { case "MessageReference": return ` $${expression.id.name} `; break; case "SelectExpression": return _getVariants(name, expression.variants); break; case "TermReference": case "VariableReference": return expression.id.name; break; case "VariantExpression": return expression.ref.id.name; break; default: console.error("Unknown element expression type:", expression.type); } } function _getString(data) { return { name: data.id.name, value: _getValue(data) }; } function _getValue(data) { switch (data.value.type) { case "Pattern": return _getPattern(data.id && data.id.name, data.value); case "VariantList": return _getVariants(data.id.name, data.value.variants); default: console.error("Unknown value type:", data.value.type); } return value; } function _getVariants(name, variants) { return variants.map(variant => { return { name, variant: variant.key.name || variant.key.value, value: _getValue(variant) }; }); } ```

Pike commented 5 years ago

Peter, I'd love to have such a test in compare-locales. I think there's value in having a single repository for those checks, much rather than having different variants of them run in different projects. Integrating them into compare-locales also has the benefit of them being integrated directly into Pontoon, while localizers save their individual translation.

I've also pondered adding an API like you describe to our python-fluent implementation. The actual code will still need to handle Attributes on Messages, and we should think about how to map nested and chained select expressions:

nested = { $mowers ->
  [one] One mower mows { $lawns ->
      [one] a lawn
    *[many] ${lawns} lawns
   }.
 *[many] { $mowers } mowers mow { $lawns ->
      [one] a lawn
    *[many] ${lawns} lawns
   }.
}

chained = You got { $mails ->
   [one] one mail
  *[many] { $mails } mails
}. Click below to delete { $mails ->
   [one] one mail
  *[many] { $mails } mails
}.

Contrived examples, and something in the lawnmowers one doesn't work right, I think. But you get the idea.

The other challenge we should drill into is how to denote how to run this test. There might be parts that are legal in react-overlays and parts on dom-overlays, and the fact that overlays run at all is binding-dependent. A problem we need to figure out, and we'll probably say semantic comments a whole lot.

pdehaan commented 5 years ago

@Pike Yeah, excellent examples. I only knew about the nominative and genitive stuff because it occurred in the ru translations, but none of the other ones, so my script started failing when I modified my globs.

I think your other examples also show why having some API which just takes a parsed JSON Resource/Message/Term and returns a normalized string (or array of strings) to verify would be super useful.

The only potential issue I could see in baking this into compare-locales is that all the HTML syntax could be different, so we'd need a way to specify which rules/formats our HTML uses. You can see that sort-of in my code above where I have to pass the {'id-class-style': 'dash'} options object to the HTML linter since I guess the default class style in the linter is expecting underscores instead of dashes.

I'm also trying to remember one of the issues we had in FxA where we were only allowing certain HTML tags (like <a>, <strong>, etc), and another case where a locale was adding an additional link to a translation where there wasn't one in the source en-US locale. But not sure how we'd detect those yet, or if we'd need to.

pdehaan commented 5 years ago

Actually, is there a way for me to query the Pontoon GraphQL API and get a list of locales/strings?

I have https://github.com/pdehaan/pontoonql which lets me query a project and filter out locales that are at least 80% translated, but I'm not sure what the syntax is for retrieving the translated strings.

flodolo commented 5 years ago

That's not available with the current implementation https://wiki.mozilla.org/L10n:Pontoon/API

Pike commented 5 years ago

Validation that the DOM is OK is part of the DOM binding code in our overlay implementations. The implementation checks for a few things that are pretty much guaranteed to be OK, and drops everything else, unless explicitly allowed by the localizable element.

IDs for example are excluded from localizable content in general, and are only applied from the source element if there's an data-l10n-name to make the DOM node effectively an argument. Exposing IDs to the localizers is probably fishing for trouble.

I think that the check in Pontoon/compare-locales should check for well-formed-ness, to also catch issues like #188 early.

Re Pontoon API, I'd consider that to be out-of-scope, and probably a thing that's really just asking the pontoon server to spend compute dollars for you. Also, the pontoon server won't know what data you actually have.

stasm commented 5 years ago

Cleaning up old issues a bit. #357 switched JSDoc to ESDoc, which gives an acceptable output: http://projectfluent.org/fluent.js/fluent-syntax/. I'm sure it could still be improved, but it's a good start. I'm going to close this issue now.

projectfluent / fluent.js

Add API docs for fluent-syntax #306