Closed zbraniecki closed 5 years ago
I documented the code and added benchmarking capabilities.
I started looking at this but didn't get to finish today. I'll wrap it up on Monday.
Hey @zbraniecki, thanks for starting working on this.
Would you mind providing a bit more detail on what the goals of this iteration are and how it differs from the current implemenation in fluent-dom
? A short summary of the main features would be nice, too, to make the review easier for me. So far, I've seen that:
data-l10n-pos
is now supported as an alternative to data-l10n-name
,data-l10n-id
which will exclude them from the overlay logic completely,data-l10n-opaque
can exclude the children of an element from the overlay logic, but attributes are still localized,What else is new or different?
I'll need a bit more time to think about the consequences of allowing nested hierarchies. In particular, I'm not sure what should happen if the translation contains an extra level of nesting, e.g. <em><img></em>
versus <img>
in the source. We might be able to make some assumptions based on the fact that text-level elements are 1) all inline, so no block elements can be inserted into them and 2) they relate to text as the name suggests, so maybe the markup from my snippet should be discouraged in favor of <em></em><img>
anyways.
I'm also concerned about the number of data-l10n-*
attributes growing with this iteration. I understand what they are for, but the testing matrix grows exponentially and I'm having trouble wrapping my head around it right now. An element defined in the translations can now have four different attributes which control how it's localized. Plus, the element name and its position and depth in the DOM fragment also play a part. Perhaps this is simply how this must be; if so we'll need a lot of tests. Or maybe there are tradeoffs in functionality which we would be willing to accept, so that the space of possible outcomes is smaller.
Lastly, what's your plan regarding this code? Is this a prototype which you intend to throw out once we know what we're doing, or would you like to use it in fluent-dom
eventually? Github doesn't make it easy to comment on code when it's not a PR, but before I start, I'd like to know what kind of code review you expect, if any.
Thanks again!
Ah, one more thing that I missed in my list above, which is related to the last item:
We should take some time to think about the consequences that removing elements and inserting them back into DOM has on the layout, event handling and custom elements' lifecycle callbacks. For instance, this behavior might cause the connectedCallback
to fire much more frequently than expected for some custom elements. Perhaps that's OK. I'm just taking a note of this here so that we don't forget about it.
I would also like to get a better understanding of the use-cases behind each of the new features. For instance, I'm not sure I agree that we need to support the data-l10n-id
attribute inside of the translated markup. AFAIUI, data-l10n-opaque
can serve as an alternative in the majority of situations. Nested data-l10n-ids
might also encourage developers to unnecessarily split their UI over multiple messages where a single one would be a better choice.
Can data-l10n-opaque
be also used on top-level elements with data-l10n-id
to protect their children from being nuked by a buggy translation?
What else is new or different?
I think you captured it pretty well. I started with the four goals we set in Orlando:
data-l10n-pos
to be used as an alternative to data-l10n-name
data-l10n-opaque
) or having a separate localization (data-l10n-id
)I realize that this code is not complete yet. In particular two things I'd like to work on are:
<em><p/></em>
should match <p/>
)Perhaps this is simply how this must be; if so we'll need a lot of tests.
I'm currently on the side of that answer. My thinking is that we're offering a very sophisticated algorithm that solves a huge class of problems. Six attributes is really not that much to cover that functionality. And yes, part of my goal with this project is to write tons of tests once we agree on the logic.
Lastly, what's your plan regarding this code? Is this a prototype which you intend to throw out once we know what we're doing, or would you like to use it in fluent-dom eventually?
I think I'd like to release it as part of the fluent family of packages much like cached-iterable
and use in fluent-dom
.
Perhaps that's OK. I'm just taking a note of this here so that we don't forget about it.
Yep.
I'm not sure I agree that we need to support the data-l10n-id attribute inside of the translated markup.
I see it as necessary for scenarios such as:
<p data-l10n-id="foo">
When closing the window
<menu>
<menuitem>select previous window</menuitem>
<menuitem>close the application</menuitem>
<menuitem>move focus to the next window</menuitem>
</menu>
.
</p>
we had cases like that already and we know that engineers would like to populate the <menu/>
element from JS and make it opaque
for the foo
translation, and then populate each menuitem with data-l10n-id
on its own.
I can try to assemble more examples, and flod may have a couple more, but I've definitely seen it multiple times already. I understand that we can tell people not to do that, but it seems like we can also make it work, and if we can make it work, and it is useful for UI localization, then I'd side with making our code more complex, so that theirs can be easier, rather than doing the opposite.
Can data-l10n-opaque be also used on top-level elements with data-l10n-id to protect their children from being nuked by a buggy translation?
I don't understand. Can you rephrase?
I added a simple fuzzer
to tests and ran them against the fuzzer. It allowed me to quickly discover some bugs, most of them small, and fix them. Yay! :)
I'm getting slightly more confident that retranslation will work, but that does not remove the necessity to add a lot of tests :)
One non-minor issue found by the fuzzer is that using data-l10n-pos
alters the order of elements which would affect the retranslation result. I temporarily fixed it by setting a pos
attribute on the element which indicates which position it had initially, but I believe we'll likely want to solve it differently. We could store the pos on the DOM using data-l10n-pos
, but that would make the result DOM a bit more messy.
On the other hand, since we only need it in cases where the translation alters the position, maybe it's ok?
Wrt. supporting the data-l10n-id
attribute inside of the translated markup:
I see it as necessary for scenarios such as:
<p data-l10n-id="foo">
When closing the window
<menu>
<menuitem>select previous window</menuitem>
<menuitem>close the application</menuitem>
<menuitem>move focus to the next window</menuitem>
</menu>
.
</p>
With the approach you're suggesting, each menuitem
would get its own l10n-id
and the Fluent file would looks like the following:
foo = When closing the window <menu/>.
foo-previous = select previous window
foo-close = close the application
foo-next = move focus to the next window
This goes counter to the good practice of not splitting translations into multiple strings. The interpolation is implicit and cannot be tracked by the Fluent tooling. fluent-react
used to employ this pattern and it wasn't a good idea; the feature was dropped in 0.6.0.
Here's what I would like to see in the Fluent file instead:
foo =
When closing the window,
<menu>
<menuitem>select previous window</menuitem>
<menuitem>close the application</menuitem>
<menuitem>move focus to the next window</menuitem>
</menu>.
This gives all the important information to the localizer in a single message; it's also searchable and makes the foo
message easier to work with from the tooling perspective.
I can try to assemble more examples, and flod may have a couple more, but I've definitely seen it multiple times already. I understand that we can tell people not to do that, but it seems like we can also make it work, and if we can make it work, and it is useful for UI localization, then I'd side with making our code more complex, so that theirs can be easier, rather than doing the opposite.
The fact that it's possible dooesn't mean that it's a good idea nor that we should do it :) How about we remove this feature from the scope of this iteration and discuss it separately?
Can data-l10n-opaque be also used on top-level elements with data-l10n-id to protect their children from being nuked by a buggy translation?
I don't understand. Can you rephrase?
I think the fact that the code now keeps the elements which are missing from the translation renders my question moot.
What I meant was the following scenario:
<window data-l10n-id="app-window">
<div>The rest of the app.</div>
</window>
In the optimistic scenario, the app-window
translation contains attributes exclusively:
app-window =
.title = App Title
In the error scenario, the translation also defines the value:
app-window = New contents of <window>!
.title = App Title
In the current revision of the overlay mechanic the latter translation would remove <div>The rest of the app.</div>
from the source. IIUC, this iteration solves this by appending it to the end of the translated <window>
element.
I realize that this code is not complete yet. In particular two things I'd like to work on are:
- Skip text level elements when matching (
<em><p/></em>
should match<p/>
)
I would suggest that we leave this for a future iteration. Block elements are invalid inside of text-level elements anyways, so the utility of this feature is very limited.
It might be useful for inline things like <em>Foo <img> Bar</em>
or <cite><a>Author</a></cite>
. If these use-cases prove necessary and common, we can discuss adding this feature in the future.
Here's what I would like to see in the Fluent file instead:
foo = When closing the window, <menu> <menuitem>select previous window</menuitem> <menuitem>close the application</menuitem> <menuitem>move focus to the next window</menuitem> </menu>.
This gives all the important information to the localizer in a single message; it's also searchable and makes the
foo
message easier to work with from the tooling perspective.
Jumping in in the middle here. I think this is a bad idea, because this implies that localizers can break the functionality of the code. Like, a, say, Romulan localizer can remove a privacy option from Firefox settings.
In such a case, I'd prefer functional correctness over the string-glueing.
I'm also stumbling over data-l10n-pos
for the first time consciously, what does that do?
Jumping in in the middle here. I think this is a bad idea, because this implies that localizers can break the functionality of the code. Like, a, say, Romulan localizer can remove a privacy option from Firefox settings.
According to the design of this iteration, that wouldn't be possible. Missing functional children are appended to their parent.
Stas asked me to document the use cases for the features I'm seeking to add in this iteration.
Here's my attempt to document them (and I may edit this comment later to add more):
This seems like the easiest one to rationalize. Break of the identity of an element in result of localization goes against avoiding surprises and is unexpected by all devs I talked to.
It shows up in some cases in form of devs trying to workaround this limitation with additional code that would be much simpler if we retained the identity. Retaining identity also allows for later items on this list, like #171.
Side benefit of the new model is that we're also doing less operation on the source, and in result we don't remove/re-add localizable attributes.
This has been a common woe when working with DOM Fragments. We allow developers to provide some elements (which we whitelisted), but not others. This has been reported as #288.
If a developer can provide an expected DOM structure in the source, we can use it to validate the localization and allow for nested trees to be localized within a single DOM Fragment.
This has been reported as #188. We can avoid that by switching from <template/>
to DOMParser
with text/xml
mode.
One issue with data-l10n-name
is that it requires the developer to name each element in the localization. This makes simple DOM Fragments with a single <img/>
or ` harder to read both in the source and in the translation.
In the new model we have a chance to fix that by allowing matching elements (see (1)) in source and translation to be overlayed without the need to explicitly name each element in the fragment.
The downside of lack of unique names is that localizer will need a different way to reorder elements if they need to.
data-l10n-pos
solves this by allowing a localizer to do:
<p data-l10n-id="foo">
<a href="https://www.mozilla.org"/>
<a href="https://www.firefox.com"/>
</p>
foo =
Use <a>Mozilla</a> <a>Firefox</a>.
or
foo =
Use <a data-l10n-pos="2">Firefox</a> by <a data-l10n-pos="1">Mozilla</a>.
This is a trivial addition because I need to calculate the pos
anyway for (2), so allowing users to optionally override the order is just a fulfillment of the separation of concerns.
One of the challenges we encountered in allowing for nested translations is that we want to make sure that localizers cannot remove elements. In this iteration we solve it by appending not matched elements in order of their appearance at the end of the parent element. This allows them to remain usable, retain identity, and be reused in the retranslation scenario.
We've talked with Stas today about this.
In particular, we'd like to focus on (1), (2) and (6) and consider moving (3), (4) and (5) to separate issues and discuss them one-by-one since those three are backward-compatible additions.
I think we also need to focus in #2.
Closing this in favor of the wiki: https://github.com/zbraniecki/fluent-domoverlays-js/wiki/New-Features-(rev-3) and separate issues to discuss particular items.
@stasm - This is codebase is now ready for a preliminary review.
It has a nice set of tests that it passes, both traditional ones taken from fluent-dom, and several new ones that cover the new functionality.
I'm pretty sure we need more tests and some algorithmic decisions are still in the air, but I do hope that this is a good starting point.
I don't want to work on performance yet, because I think it doesn't make sense to tighten the code before we get the logic ironed out, but I think we can get it to comparable perf to the current implementation.