lukasgeiter / gettext-extractor

A flexible and powerful Gettext message extractor with support for JavaScript, TypeScript, JSX and HTML.
MIT License

Why are escaped characters later unescaped? #36

Open chrisnicola opened 5 years ago

chrisnicola commented 5 years ago

https://github.com/lukasgeiter/gettext-extractor/blob/cea9a47216df062ec5acd95a3fe39fdfc4b7dab1/src/html/utils.ts#L20

I am trying to debug an issue where I basically can't match translation keys for HTML that contains HTML entities like &amp;. It brought me to the above line of code, which seems problematic.

In my use case I'm translating an element at runtime using element.innerHTML. Using innerText is not practical because some translations may actually require the HTML to be part of the translation, as with a hyperlink.

As a result, the innerHTML has the entity as &amp; but the key is forced to be & by the extractor, so they can never match.

Is this intended? Could it be made an optional capability instead?

lukasgeiter commented 5 years ago

Yes, this is indeed intended.

My goal for the extracted messages is to match the string in the source code as closely as possible. If a developer writes a string containing & in the code, I believe it should be extracted this way (at least by default). Translators will probably also prefer & over &amp;, which they might not understand.

That said, I just noticed that the current implementation doesn't handle strings that actually contain &amp; in the source nicely. That is, they will also get converted to &. But that's not really what this issue is about...

I will look into adding an option to escape HTML entities in the same way innerHTML does.

If you don't mind not having entities in your translations, you might also want to consider changing your runtime code to match the behavior of the extractor.

chrisnicola commented 5 years ago

@lukasgeiter yeah, I'm not sure there is any easy way to do this because of how parse5 works. It is possible parse5 has an option to leave the original text unchanged, and that is what would be needed.

Also, changing the runtime code to match the behaviour is not possible. Currently this completely breaks translating strings with & in them for me. Browsers automatically convert & into &amp; when innerHTML is read from the rendered HTML, so looking up the key with that value will always fail. In fact, the browser spec is the reason parse5 does this as well.

The bottom line is that I can't change the runtime behaviour of web browsers.
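
To make the mismatch concrete, here is a minimal sketch in TypeScript. It assumes the text-node escaping rules used when HTML is serialized for innerHTML (& becomes &amp;, a non-breaking space becomes &nbsp;, < and > become &lt; and &gt;). The helper name escapeTextLikeInnerHTML and the sample string are hypothetical and are not part of the browser, parse5, or gettext-extractor.

```typescript
// Hypothetical helper: approximates how text content is escaped when
// a browser serializes it for innerHTML (text nodes only, not attributes).
function escapeTextLikeInnerHTML(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/\u00a0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Key as the extractor would write it to the catalog (entity unescaped).
const extractedKey: string = "Terms & Conditions";

// What element.innerHTML would report for the same text at runtime.
const runtimeKey: string = escapeTextLikeInnerHTML("Terms & Conditions");

console.log(runtimeKey);                  // Terms &amp; Conditions
console.log(runtimeKey === extractedKey); // false
```

The two strings differ only in the entity, which is enough for an exact-match key lookup to miss.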

chrisnicola commented 5 years ago

I should have noted that my workaround for the time being will be to convert &amp; to & at the point where I do the key lookup at runtime. This will work, but it seems less than ideal because, from what I can tell, this looks like it would be a common problem.
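
A rough sketch of that workaround in TypeScript might look like the following. The names unescapeForLookup, translate, and catalog are hypothetical, and the set of entities handled (amp, lt, gt) is an assumption that may need adjusting to whatever the extractor actually unescapes.

```typescript
// Assumed set of entities to undo before lookup; extend as needed.
const ENTITY_MAP: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
};

// Hypothetical helper: undo the escaping in a single pass, so that
// "&amp;lt;" becomes "&lt;" rather than being unescaped twice into "<".
function unescapeForLookup(html: string): string {
  return html.replace(
    /&(amp|lt|gt);/g,
    (match: string, name: string) => ENTITY_MAP[name] ?? match
  );
}

// Hypothetical catalog keyed by the extractor's unescaped message IDs.
const catalog: Record<string, string> = {
  "Terms & Conditions": "Nutzungsbedingungen",
};

// Look up a translation using the value read from element.innerHTML.
function translate(innerHTML: string): string {
  const key = unescapeForLookup(innerHTML);
  return catalog[key] ?? innerHTML; // fall back to the original markup
}

console.log(translate("Terms &amp; Conditions")); // Nutzungsbedingungen
```

Doing the replacement in one regular-expression pass, rather than chaining several replace calls, avoids accidentally unescaping text that was escaped twice on purpose.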

lukasgeiter commented 5 years ago

The workaround you mention is precisely what I meant by changing your runtime code. I understand that this is not an optimal solution for you. I will definitely add an option for this in the future.

chrisnicola commented 5 years ago

Ok thanks, that makes sense.