Initial review by stas - Githubissues

zbraniecki commented 5 years ago

@stasm - This is codebase is now ready for a preliminary review.

It has a nice set of tests that it passes, both traditional ones taken from fluent-dom, and several new ones that cover the new functionality.

I'm pretty sure we need more tests and some algorithmic decisions are still in the air, but I do hope that this is a good starting point.

I don't want to work on performance yet, because I think it doesn't make sense to tighten the code before we get the logic ironed out, but I think we can get it to comparable perf to the current implementation.

zbraniecki commented 5 years ago

I documented the code and added benchmarking capabilities.

stasm commented 5 years ago

I started looking at this but didn't get to finish today. I'll wrap it up on Monday.

stasm commented 5 years ago

Hey @zbraniecki, thanks for starting working on this.

Would you mind providing a bit more detail on what the goals of this iteration are and how it differs from the current implemenation in fluent-dom? A short summary of the main features would be nice, too, to make the review easier for me. So far, I've seen that:

nested hierarchies are supported,
data-l10n-pos is now supported as an alternative to data-l10n-name,
nested elements can now have data-l10n-id which will exclude them from the overlay logic completely,
data-l10n-opaque can exclude the children of an element from the overlay logic, but attributes are still localized,
the code now keeps in the DOM the elements which are missing from the translation.

What else is new or different?

I'll need a bit more time to think about the consequences of allowing nested hierarchies. In particular, I'm not sure what should happen if the translation contains an extra level of nesting, e.g. <em><img></em> versus <img> in the source. We might be able to make some assumptions based on the fact that text-level elements are 1) all inline, so no block elements can be inserted into them and 2) they relate to text as the name suggests, so maybe the markup from my snippet should be discouraged in favor of <em></em><img> anyways.

I'm also concerned about the number of data-l10n-* attributes growing with this iteration. I understand what they are for, but the testing matrix grows exponentially and I'm having trouble wrapping my head around it right now. An element defined in the translations can now have four different attributes which control how it's localized. Plus, the element name and its position and depth in the DOM fragment also play a part. Perhaps this is simply how this must be; if so we'll need a lot of tests. Or maybe there are tradeoffs in functionality which we would be willing to accept, so that the space of possible outcomes is smaller.

Lastly, what's your plan regarding this code? Is this a prototype which you intend to throw out once we know what we're doing, or would you like to use it in fluent-dom eventually? Github doesn't make it easy to comment on code when it's not a PR, but before I start, I'd like to know what kind of code review you expect, if any.

Thanks again!

stasm commented 5 years ago

Ah, one more thing that I missed in my list above, which is related to the last item:

The identity of source children is now preserved.

We should take some time to think about the consequences that removing elements and inserting them back into DOM has on the layout, event handling and custom elements' lifecycle callbacks. For instance, this behavior might cause the connectedCallback to fire much more frequently than expected for some custom elements. Perhaps that's OK. I'm just taking a note of this here so that we don't forget about it.

stasm commented 5 years ago

I would also like to get a better understanding of the use-cases behind each of the new features. For instance, I'm not sure I agree that we need to support the data-l10n-id attribute inside of the translated markup. AFAIUI, data-l10n-opaque can serve as an alternative in the majority of situations. Nested data-l10n-ids might also encourage developers to unnecessarily split their UI over multiple messages where a single one would be a better choice.

Can data-l10n-opaque be also used on top-level elements with data-l10n-id to protect their children from being nuked by a buggy translation?

zbraniecki commented 5 years ago

What else is new or different?

I think you captured it pretty well. I started with the four goals we set in Orlando:

allow for source to provide (nested) structure that can be replicated in localization
allow for data-l10n-pos to be used as an alternative to data-l10n-name
allow for elements to be excluded from localization even if they're inside of the DOM Fragment by either a) not having any localization (data-l10n-opaque) or having a separate localization (data-l10n-id)
retain the elements from the source where possible to allow for any references/eventhandlers to continue working

I realize that this code is not complete yet. In particular two things I'd like to work on are:

Skip text level elements when matching (<em><p/></em> should match <p/>)
Verify that retranslation works (I'm afraid that the current algorithm may not in edge cases)

Perhaps this is simply how this must be; if so we'll need a lot of tests.

I'm currently on the side of that answer. My thinking is that we're offering a very sophisticated algorithm that solves a huge class of problems. Six attributes is really not that much to cover that functionality. And yes, part of my goal with this project is to write tons of tests once we agree on the logic.

Lastly, what's your plan regarding this code? Is this a prototype which you intend to throw out once we know what we're doing, or would you like to use it in fluent-dom eventually?

I think I'd like to release it as part of the fluent family of packages much like cached-iterable and use in fluent-dom.

Perhaps that's OK. I'm just taking a note of this here so that we don't forget about it.

Yep.

I'm not sure I agree that we need to support the data-l10n-id attribute inside of the translated markup.

I see it as necessary for scenarios such as:

<p data-l10n-id="foo">
  When closing the window
  <menu>
    <menuitem>select previous window</menuitem>
    <menuitem>close the application</menuitem>
    <menuitem>move focus to the next window</menuitem>
  </menu>
  .
</p>

we had cases like that already and we know that engineers would like to populate the <menu/> element from JS and make it opaque for the foo translation, and then populate each menuitem with data-l10n-id on its own.

I can try to assemble more examples, and flod may have a couple more, but I've definitely seen it multiple times already. I understand that we can tell people not to do that, but it seems like we can also make it work, and if we can make it work, and it is useful for UI localization, then I'd side with making our code more complex, so that theirs can be easier, rather than doing the opposite.

Can data-l10n-opaque be also used on top-level elements with data-l10n-id to protect their children from being nuked by a buggy translation?

I don't understand. Can you rephrase?

zbraniecki commented 5 years ago

I added a simple fuzzer to tests and ran them against the fuzzer. It allowed me to quickly discover some bugs, most of them small, and fix them. Yay! :)

I'm getting slightly more confident that retranslation will work, but that does not remove the necessity to add a lot of tests :)

One non-minor issue found by the fuzzer is that using data-l10n-pos alters the order of elements which would affect the retranslation result. I temporarily fixed it by setting a pos attribute on the element which indicates which position it had initially, but I believe we'll likely want to solve it differently. We could store the pos on the DOM using data-l10n-pos, but that would make the result DOM a bit more messy. On the other hand, since we only need it in cases where the translation alters the position, maybe it's ok?

stasm commented 5 years ago

Wrt. supporting the data-l10n-id attribute inside of the translated markup:

I see it as necessary for scenarios such as:

<p data-l10n-id="foo">
  When closing the window
  <menu>
    <menuitem>select previous window</menuitem>
    <menuitem>close the application</menuitem>
    <menuitem>move focus to the next window</menuitem>
  </menu>
  .
</p>

With the approach you're suggesting, each menuitem would get its own l10n-id and the Fluent file would looks like the following:

foo = When closing the window <menu/>.
foo-previous = select previous window
foo-close = close the application
foo-next = move focus to the next window

This goes counter to the good practice of not splitting translations into multiple strings. The interpolation is implicit and cannot be tracked by the Fluent tooling. fluent-react used to employ this pattern and it wasn't a good idea; the feature was dropped in 0.6.0.

Here's what I would like to see in the Fluent file instead:

foo =
    When closing the window,
    <menu>
        <menuitem>select previous window</menuitem>
        <menuitem>close the application</menuitem>
        <menuitem>move focus to the next window</menuitem>
    </menu>.

This gives all the important information to the localizer in a single message; it's also searchable and makes the foo message easier to work with from the tooling perspective.

I can try to assemble more examples, and flod may have a couple more, but I've definitely seen it multiple times already. I understand that we can tell people not to do that, but it seems like we can also make it work, and if we can make it work, and it is useful for UI localization, then I'd side with making our code more complex, so that theirs can be easier, rather than doing the opposite.

The fact that it's possible dooesn't mean that it's a good idea nor that we should do it :) How about we remove this feature from the scope of this iteration and discuss it separately?

stasm commented 5 years ago

Can data-l10n-opaque be also used on top-level elements with data-l10n-id to protect their children from being nuked by a buggy translation?

I don't understand. Can you rephrase?

I think the fact that the code now keeps the elements which are missing from the translation renders my question moot.

What I meant was the following scenario:

<window data-l10n-id="app-window">
    <div>The rest of the app.</div>
</window>

In the optimistic scenario, the app-window translation contains attributes exclusively:

app-window =
    .title = App Title

In the error scenario, the translation also defines the value:

app-window = New contents of <window>!
    .title = App Title

In the current revision of the overlay mechanic the latter translation would remove <div>The rest of the app.</div> from the source. IIUC, this iteration solves this by appending it to the end of the translated <window> element.

stasm commented 5 years ago

I realize that this code is not complete yet. In particular two things I'd like to work on are:

Skip text level elements when matching (<em><p/></em> should match <p/>)

I would suggest that we leave this for a future iteration. Block elements are invalid inside of text-level elements anyways, so the utility of this feature is very limited.

It might be useful for inline things like <em>Foo <img> Bar</em> or <cite><a>Author</a></cite>. If these use-cases prove necessary and common, we can discuss adding this feature in the future.

Pike commented 5 years ago

Here's what I would like to see in the Fluent file instead:
foo =
    When closing the window,
    <menu>
        <menuitem>select previous window</menuitem>
        <menuitem>close the application</menuitem>
        <menuitem>move focus to the next window</menuitem>
    </menu>.
This gives all the important information to the localizer in a single message; it's also searchable and makes the foo message easier to work with from the tooling perspective.

Jumping in in the middle here. I think this is a bad idea, because this implies that localizers can break the functionality of the code. Like, a, say, Romulan localizer can remove a privacy option from Firefox settings.

In such a case, I'd prefer functional correctness over the string-glueing.

Pike commented 5 years ago

I'm also stumbling over data-l10n-pos for the first time consciously, what does that do?

stasm commented 5 years ago

Jumping in in the middle here. I think this is a bad idea, because this implies that localizers can break the functionality of the code. Like, a, say, Romulan localizer can remove a privacy option from Firefox settings.

According to the design of this iteration, that wouldn't be possible. Missing functional children are appended to their parent.

zbraniecki commented 5 years ago

Stas asked me to document the use cases for the features I'm seeking to add in this iteration.

Here's my attempt to document them (and I may edit this comment later to add more):

1) Retain the elements from the source to allow for any references/eventhandlers to continue working.

This seems like the easiest one to rationalize. Break of the identity of an element in result of localization goes against avoiding surprises and is unexpected by all devs I talked to.

It shows up in some cases in form of devs trying to workaround this limitation with additional code that would be much simpler if we retained the identity. Retaining identity also allows for later items on this list, like #171.

Side benefit of the new model is that we're also doing less operation on the source, and in result we don't remove/re-add localizable attributes.

2) Allow for source to provide (nested) structure that can be replicated in localization

This has been a common woe when working with DOM Fragments. We allow developers to provide some elements (which we whitelisted), but not others. This has been reported as #288.

If a developer can provide an expected DOM structure in the source, we can use it to validate the localization and allow for nested trees to be localized within a single DOM Fragment.

3) Allow self-contained elements in localization.

This has been reported as #188. We can avoid that by switching from <template/> to DOMParser with text/xml mode.

4) allow for data-l10n-pos to be used as an alternative to data-l10n-name

One issue with data-l10n-name is that it requires the developer to name each element in the localization. This makes simple DOM Fragments with a single <img/> or ` harder to read both in the source and in the translation.

In the new model we have a chance to fix that by allowing matching elements (see (1)) in source and translation to be overlayed without the need to explicitly name each element in the fragment. The downside of lack of unique names is that localizer will need a different way to reorder elements if they need to. data-l10n-pos solves this by allowing a localizer to do:

<p data-l10n-id="foo">
  <a href="https://www.mozilla.org"/>
  <a href="https://www.firefox.com"/>
</p>

foo =
    Use <a>Mozilla</a> <a>Firefox</a>.

or

foo =
    Use <a data-l10n-pos="2">Firefox</a> by <a data-l10n-pos="1">Mozilla</a>.

This is a trivial addition because I need to calculate the pos anyway for (2), so allowing users to optionally override the order is just a fulfillment of the separation of concerns.

5) allow for elements to be excluded from localization even if they're inside of the DOM Fragment by either a) not having any localization (data-l10n-opaque) or having a separate localization (data-l10n-id)

a) #169 b) #171

6) always return append all elements that were not matched in the output

One of the challenges we encountered in allowing for nested translations is that we want to make sure that localizers cannot remove elements. In this iteration we solve it by appending not matched elements in order of their appearance at the end of the parent element. This allows them to remain usable, retain identity, and be reused in the retranslation scenario.

zbraniecki commented 5 years ago

We've talked with Stas today about this.

In particular, we'd like to focus on (1), (2) and (6) and consider moving (3), (4) and (5) to separate issues and discuss them one-by-one since those three are backward-compatible additions.

Pike commented 5 years ago

I think we also need to focus in #2.

zbraniecki commented 5 years ago

Closing this in favor of the wiki: https://github.com/zbraniecki/fluent-domoverlays-js/wiki/New-Features-(rev-3) and separate issues to discuss particular items.

zbraniecki / fluent-domoverlays-js

Initial review by stas #1

1) Retain the elements from the source to allow for any references/eventhandlers to continue working.

2) Allow for source to provide (nested) structure that can be replicated in localization

3) Allow self-contained elements in localization.

4) allow for data-l10n-pos to be used as an alternative to data-l10n-name

5) allow for elements to be excluded from localization even if they're inside of the DOM Fragment by either a) not having any localization (data-l10n-opaque) or having a separate localization (data-l10n-id)

6) always return append all elements that were not matched in the output