stasm opened this issue 6 years ago
One way to fix this would be to use `DOMParser` with the `application/xml` type for parsing markup in translations. XML has stricter parsing rules in general, but it allows any element to be self-closing while still producing well-formed documents.
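A quick sketch of that idea in the browser console (illustrative only; `DOMParser` is a browser API, so the demo is guarded to be a no-op elsewhere, and `<username/>` is just an example tag from this thread):

```javascript
// Browser-only sketch: with the application/xml type, DOMParser accepts
// the self-closing form for any element, not just the HTML void elements.
function parseAsXml(source) {
  const parser = new DOMParser();
  const doc = parser.parseFromString(`<root>${source}</root>`, "application/xml");
  return Array.from(doc.documentElement.childNodes);
}

if (typeof DOMParser !== "undefined") {
  // The trailing "." stays a sibling text node instead of being
  // swallowed into <username>:
  parseAsXml("Signed in as <username/>.");
  // → [#text "Signed in as ", <username>, #text "."]
}
```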
We'd need to add a DTD for at least some HTML entities that localizers might expect to work. The full DTD is huge, clocking in at over 11 KB (preview). Perhaps we could cherry-pick and support only whitespace characters, like `&nbsp;` and `&shy;`? In any case, it would be useful to strictly define the list of entities that we support.
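One way to make that list explicit, sketched hypothetically (the names here are mine, not anything fluent ships): resolve only whitelisted entities before the string ever reaches the XML parser, and let anything else fail the XML parse loudly.

```javascript
// Hypothetical whitelist: the XML parser only knows the five predefined
// entities (&amp; &lt; &gt; &quot; &apos;), so we leave those alone and
// map just the extra HTML entities we decide to support.
const SUPPORTED_ENTITIES = new Map([
  ["&nbsp;", "\u00A0"], // no-break space
  ["&shy;", "\u00AD"],  // soft hyphen
]);

function resolveSupportedEntities(source) {
  return source.replace(
    /&[a-zA-Z]+;/g,
    (entity) => SUPPORTED_ENTITIES.get(entity) ?? entity
  );
}
```

Unsupported entities like `&copy;` pass through untouched and then trigger an XML parse error, which keeps the supported list strictly defined.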
I think the practical impact of not supporting this is very limited. From my experience reviewing sign-offs, I don't think I've ever seen localizers change the markup used in the reference string. If it's `<span></span>`, it remains unchanged. In some cases I've seen `<img id="foo" />` lose the space before the closing slash, but that's as far as changes go.
I think I ran into this. I tried to do this:
```jsx
<Localized
  id="comments-userBoxAuthenticated-signedInAs"
  username={<Username />}
>
  <Typography variant="bodyCopy" container="div">
    {"Signed in as <username />."}
  </Typography>
</Localized>
```

```fluent
comments-userBoxAuthenticated-signedInAs =
    Signed in as <username/>.
```
But the `.` at the end is swallowed, while this works:
```jsx
<Localized
  id="comments-userBoxAuthenticated-signedInAs"
  username={<Username />}
>
  <Typography variant="bodyCopy" container="div">
    {"Signed in as <username></username>."}
  </Typography>
</Localized>
```

```fluent
comments-userBoxAuthenticated-signedInAs =
    Signed in as <username></username>.
```
Yes, this looks familiar. I wrote a test to document the current (broken) behavior: https://github.com/projectfluent/fluent.js/blob/6e1ab3a4982ef6d5f385c77d15a16b27abc82a2e/fluent-react/test/localized_overlay_test.js#L618-L643
Do you think that using `<username></username>` is an acceptable work-around?
> Do you think that using `<username></username>` is an acceptable work-around?
It's not pretty, and it goes against user expectations. The workaround is simple and functional, though I'd like to see it as a temporary solution only.
I'm not fond of it either. Perhaps using `DOMParser` and treating translations as XML rather than HTML would work better? I'd need to check the performance impact, too. More importantly, `<template>` creates an inert `DocumentFragment`, which is also good for security.
Other solutions include using one of the available HTML-to-React parsers, like `html-react-parser` or `react-html-parser`. They add quite a lot to the size of `fluent-react`, however (20 KB and 70 KB, respectively).
I suspect they ship with so much code in order to support HTML's many edge cases. There might be a limited subset of safe markup that we could write our own parser for. I'll be away for a few weeks, but I can take a closer look when I'm back.
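For illustration, a parser for such a limited subset could be quite small. This is a hypothetical sketch, not anything fluent-react ships; it handles only plain text, `<name/>`, and flat `<name>text</name>` with no attributes or nesting:

```javascript
// Tokens: a self-closing element, or an element with text-only content.
// Anything that doesn't match stays literal text, which keeps the parser
// safe: no string is ever assigned to innerHTML.
const TOKEN = /<([a-z][a-z0-9]*)\s*\/>|<([a-z][a-z0-9]*)>([^<]*)<\/\2>/g;

function parseSimpleMarkup(source) {
  const nodes = [];
  let last = 0;
  for (const match of source.matchAll(TOKEN)) {
    if (match.index > last) {
      nodes.push({ type: "text", value: source.slice(last, match.index) });
    }
    if (match[1] !== undefined) {
      // Self-closing form: <name/>
      nodes.push({ type: "element", name: match[1], children: [] });
    } else {
      // Paired form: <name>text</name>
      nodes.push({
        type: "element",
        name: match[2],
        children: [{ type: "text", value: match[3] }],
      });
    }
    last = match.index + match[0].length;
  }
  if (last < source.length) {
    nodes.push({ type: "text", value: source.slice(last) });
  }
  return nodes;
}
```

A subset like this would sidestep both the bundle-size and the security concerns, at the cost of rejecting most of HTML.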
> I'm not fond of it either. Perhaps using `DOMParser` and treating translations as XML rather than HTML would work better? I'd need to check the performance impact, too.
I think that would be a nice solution.
We're using `fluent-react` in Firefox DevTools, where the document can be an XHTML one, in which case custom tags are ignored:
```js
temp = document.createElement("template");
temp.innerHTML = "<a>test</a>|<custom>testCustom</custom>";
Array.from(temp.content.childNodes); // [ a, #text, #text ]
```
whereas with `DOMParser` it works as expected:
```js
parser = new DOMParser();
doc = parser.parseFromString("<a>test</a>|<custom>testCustom</custom>", "text/html");
Array.from(doc.body.childNodes); // [ a, #text, custom ]
```
And sometimes we don't even have a `document` reference, which makes some panels fail to load when trying to update our version of `fluent-react` to 0.16.1.
Changing the parser logic can introduce subtle regressions; the HTML fixup is probably different between a document and an element. And it needs a security review, as Stas mentioned.
I still like the lack of XML complexity in our DOM overlay stories.
Both `fluent-dom` and `fluent-react` currently use the `<template>` element to safely parse markup in translations. The `<template>` element internally uses the HTML parser. This is similar in effect to using `DOMParser.parseFromString` with `text/html` as the supported type.

The HTML parser has rather strict rules on parsing void elements, unlike the XML parser. It only allows elements known to be void (e.g. `input`) to be self-closing (`<element/>`). Non-void elements are parsed as if they weren't properly closed. This means that markup using the self-closing form with a non-void element is parsed as if the element had been left open, with the rest of the content swallowed into it.

The consequence for overlays is that localizers can't use the self-closing form with non-void elements. Admittedly, it's not valid HTML markup, but the difference is rather subtle.
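The difference can be seen in the browser console (illustrative snippet, guarded for non-browser environments; `<username/>` is the example tag from this thread):

```javascript
// The HTML parser used by <template> treats <username/> as an opening
// tag, so the text after it is swallowed into the element, while a known
// void element like <input/> keeps the surrounding text as siblings.
function parseAsHtmlFragment(source) {
  const template = document.createElement("template");
  template.innerHTML = source;
  return template.content;
}

if (typeof document !== "undefined") {
  parseAsHtmlFragment("Signed in as <username/>.").lastChild.outerHTML;
  // → "<username>.</username>"
  parseAsHtmlFragment("Delete <input/> now?").childNodes.length;
  // → 3: #text "Delete ", <input>, #text " now?"
}
```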