khaledhosny commented 9 years ago

There seems to be a problem with escaping backslashes in translations: when I enter \, submit, and then re-open the unit, it gets converted into \\. I’m not sure if this is the intended behaviour, but if it is, it is confusing. I don’t think it is a good idea to expose escaping to translators; it should be handled transparently.

khaledhosny commented 9 years ago

See http://mozilla.locamotion.org/ar/firefox/translate/browser/chrome/browser/browser.properties.po#unit=13367660, for an example.

iafan commented 8 years ago

@julen, let's consider this bug a priority one. See this note for additional context.

julen commented 8 years ago

@iafan mind adding the additional info right here so the reference doesn't get lost in the future?

iafan commented 8 years ago

Ok, here you go:

This is how the string looks in the resources (single backslashes):

This is how it looks in the .po file (all backslashes are properly escaped as \\):

And this is how it looks in the Pootle UI (note that backslashes are displayed as \\, while they really should be displayed as \).

I changed the translation to: \' and submitted it, but when I open the unit, I see this (in the translation, the text is displayed as \\'):

julen commented 8 years ago

After investigation, I'd say we are facing two different bugs: one of data mangling (this report), and another of incorrect display of escapes.

For the issue concerning this report, a single submitted backslash gets doubled. I've been able to track this down: it gets doubled in Django's request cycle, where the QueryDict object for request.POST is built. It calls urllib.parse_qsl, which in turn uses urllib.unquote. Unquoting doubles the backslash, and this is then saved to the unit.

To be fair, I'm not entirely sure whether the front-end/back-end sides should be doing something else here; I was under the impression such cases were handled by the framework. Any thoughts on the matter?

For the issue about displaying escapes, I have filed #4165.

julen commented 8 years ago

I still haven't figured out the root cause; however, here's more specific data on the current state of things:

| user input | textarea field (Python) | DB |
|---|---|---|
| `\` | `\\` | `\` |
| `\\` | `\\` | `\` |
| `\\\` | `\\\\` | `\\` |
| `\\\\` | `\\\\` | `\\` |

dwaynebailey commented 8 years ago

I think with \ we've generally had a problem, which might or might not be related to the above issue. That is when a user enters a single \ we are not sure if that is their intention. I.e. should we take \ and escape it as \\. Or is the \ actually the start of an escape sequence \n, \t, etc. And in the case where a user only wants a single \ and they follow that with a letter that is part of a valid escape sequence, but which they don't want, what do they do?

My gut feeling would be that the DB should represent what we believe things should be and that serializing should sort things out. But I'm not sure if we get tripped up with the difference between \ and \\.

Looking at Virtaal as a use case, the main idea was to hide escaping from users. To replicate that in Pootle we shouldn't have people worrying about \ vs \\; they should just be able to type the character that they want, i.e. \.

The problem we hit when we don't escape is the potential for a user (however remote) to enter a valid escape unintentionally. The way Virtaal dealt with this was to use the ⏎ and → for return and tab so we were explicit. But I'm not sure if we can do that in a textarea without major coding changes.

I'm not sure if I'm helping to resolve this here though. Maybe the first step is to make the UI and DB consistent.

julen commented 8 years ago

You can entirely disregard my previous comments, as my understanding when debugging strings was incorrect (PEBKAC): I was interpreting the output of repr() as the actual string value.

>>> from urllib import unquote
>>> value = unquote('foo \ bar')
>>> value
'foo \\ bar'
>>> print repr(value)
'foo \\ bar'
>>> print value
foo \ bar
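
For readers on Python 3, the same pitfall can be reproduced with the sketch below. It assumes `urllib.parse.unquote` as the Python 3 counterpart of the `urllib.unquote` call above, and uses a raw string literal so the single backslash does not itself need escaping.

```python
from urllib.parse import unquote

# A raw string literal keeps the single backslash as typed.
value = unquote(r"foo \ bar")

# repr() shows the escaped, source-code form: the backslash looks doubled.
print(repr(value))

# Printing the string itself shows what is really stored: one backslash.
print(value)

# Counting confirms there is exactly one backslash character in the value.
print(value.count("\\"))
```

The doubled backslash only ever exists in the `repr()` output; the value that would be saved contains a single one.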

iafan commented 8 years ago

> [...] in Pootle we shouldn't have people worrying about \ vs \\ they should just be able to type the character that they want i.e. \.

Exactly. We need to always treat user input as is, without trying to escape/unescape it in the DB. What the user sees in the translation UI is what needs to be in the DB. Of course, we need to do proper escaping/unescaping when dealing with underlying files (e.g. .po), but that part needs to be completely transparent.

> The problem we hit when we don't escape is the potential for a user (however remote) to enter a valid escape unintentionally.

This is not Pootle's problem. Ideally, when dealing with a specific file format, the converter needs to deal with escaping/unescaping. But if for some reason it is desired to translate the raw (not unescaped) string, then the translation also needs to be treated as a raw one, and translators will be following the escaping rules exposed in the source string.

julen commented 8 years ago

The core of the issue described here comes from the fact that \ is doubled when the text value is provided for the input textarea. This also happens for other special characters such as \n, \r, and \t.

We were discussing this briefly with @iafan and came to something along the lines of what he already commented:

When outputting text (in the textarea or elsewhere in the UI), we should display the DB value as it is, without applying any fancy transformation on top of it.
The actual escaping should only happen when data is imported or serialized back to specific formats.
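
A minimal sketch of that boundary, assuming a PO-like format with backslash escapes. The helper names `po_escape` and `po_unescape` are hypothetical and are not the Translate Toolkit API; the point is only that the DB value stays plain and escaping happens on the way to and from the file.

```python
# Hypothetical helpers for illustration; not the Translate Toolkit API.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r", '"': '\\"'}
_UNESCAPES = {"\\": "\\", "n": "\n", "t": "\t", "r": "\r", '"': '"'}


def po_escape(text: str) -> str:
    """Serialize a plain DB value into its escaped on-disk form."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def po_unescape(text: str) -> str:
    """Parse an escaped on-disk value back into the plain DB value."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Unknown escapes are kept verbatim rather than guessed at.
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


# The translator typed a backslash followed by the letter n, not a newline.
db_value = "C:\\new folder"
on_disk = po_escape(db_value)

# The round trip returns exactly what was typed.
assert po_unescape(on_disk) == db_value
```

Because the backslash itself is escaped on export, a typed `\` followed by `n` survives the round trip as two characters instead of turning into a line break.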

Note that UI highlighting filters currently play a part here too, as they not only highlight special markers but also transform the output text (e.g. \n is converted to \\n), effectively creating an artificial gap between how translators see the string and what the actual text is.

We believe any transformations like these are unnecessary, and need to go away, so the UI only displays the actual DB value. (I must say I tried removing any filters before but the results were not satisfactory. I'll need to double-/triple-check that again.)

It'd be good if we can confirm we are all on the same page here. Ultimately, the goal is to make it simple for translators to input the actual text, reducing the chance of surprises or of accidentally introducing errors and, as this issue shows, we are not there yet.

dwaynebailey commented 8 years ago

@julen I think we're on the same page. Though I'd like to highlight some concerns.

I think the summary is this: Format <- any format specific escaping needs -> DB <-> UI

With regard to fancy transforms, I don't agree that doing the \n adds no value: it gives the translator a few more clues about how a string will flow. It worked in Virtaal to reduce confusion and add more value. But I think it adds a layer of confusion in trying to clean this up. So stripping these out and just showing the string as it is :+1:.

My concern is that the DB will have its own escaping concerns for strings, i.e. we never actually display the DB value; we unescape the real DB value. Most are not an issue and can be roundtripped quite easily. My concerns centre around \ and its potential confusion with an escape like \n, and how we handle those. But that concern doesn't impact my agreement with the general idea.

julen commented 8 years ago

The summary makes sense. Note we currently do some escaping for the DB value; however, this causes issues like the one we are trying to address here.

I've been playing around with the code and have something that provides the expected input/output in the textarea and DB, though it presents some other caveats and questions. I'll tidy it up a bit and open an RFC PR with it soon to see where we can go from there. Related to this, and talking about the fancy transform, I think it's fine to visually display a new line ( ) when there's an actual new line in the DB as well, but having a \\n as a marker, as we currently have, is just confusing. We could probably use any other type of marker, such as those showcased in #3350.

unho commented 8 years ago

Perhaps related #3869.

julen commented 8 years ago

I just put up some code for testing/comments in the PR linked above — it's not intended to be merged, so just treat it as a showcase of the taken direction.

There's still plenty of work to be done:

Display of special characters needs to be reworked. In the showcased code, new lines are displayed as such in source texts; however, there's nothing similar for tabs (I just added a simple <pre>TAB</pre> placeholder for now). It may be worth experimenting with visual indicators here (a potential option is something similar to #3350).
Highlighting logic needs to be unified so the end result is consistent (#4165).
Copying from source to target needs to be fixed.

iafan commented 8 years ago

I promised @julen to comment on the display of non-printable characters: since we want to display the raw data from the database, this data may contain some unprintable characters (hard line breaks are the most common ones, but there can also be tabulation symbols, non-breakable spaces and others). I think we need to display them in the source string, but in a way that doesn't prevent people from copy-pasting text from source to target.

#3350 is definitely something that comes to my mind. Here's the demo that implements the same rendering approach to non-printable symbols (since we're rendering this in the context of HTML, we don't need a special font here):

http://jsfiddle.net/iafan/w4tpu6q6/2/
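
As a rough server-side illustration of the same idea (not the code behind the jsfiddle demo), the sketch below wraps each non-printable character in a labelled element while keeping the real character as the element's content. The `render_source` function, the `np` class, the `data-label` attribute and the label set are all assumptions made for this example; a stylesheet would be expected to draw the label.

```python
from html import escape

# Hypothetical label set; the real one would follow whatever #3350 settles on.
LABELS = {"\t": "TAB", "\n": "LF", "\r": "CR", "\u00a0": "NBSP"}


def render_source(text: str) -> str:
    """Return HTML where non-printable characters carry a visible label."""
    parts = []
    for ch in text:
        label = LABELS.get(ch)
        if label is None:
            # Ordinary characters only need HTML escaping.
            parts.append(escape(ch))
        else:
            # The label lives in an attribute (to be drawn via CSS), while the
            # element content stays the real character.
            parts.append(f'<span class="np" data-label="{label}">{ch}</span>')
    return "".join(parts)


print(render_source("Name:\tvalue"))
```

Keeping the raw character inside the element is intended to let a manual copy pick up the original character, although how browsers treat the generated label during selection would still need testing.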

dwaynebailey commented 8 years ago

@julen I managed to grab some time to test this. I made a PO file with some potential escapes. I have the following observations:

Newlines in source text. \n is a problem if it appears at the end of a line, since the user is unable to determine if this is a normal line ending or a newline.
- The same applies to the newline in the target. The user has no visual cue that they've added a newline at the end, or that one is present.
Similar thing with \t: the TAB marker shows how a similar marker can be helpful. But again, if it's at the end of a sentence you can't see it.
- In a string you are unsure if it's a space or a tab.
\r\n, which probably shouldn't exist but does: there is no way to specify that, as you simply see a newline. It's only one char in the UI anyway.
\ at the end of the line and in a sentence seems to work just fine.

I didn't test any input of strings.

@iafan thanks for the jsfiddle example. It seems the easy part is displaying the source string. I'm not 100% sure I like it being textual, I'd rather have icons or unicode chars to show these markers. The issues I see are these two:

You need to be able to copy the special characters from source to target
You can't see the special characters in the target. With the user manually entering \t and \n at least they could actually see things. So without visual clues you don't know what character you put there or even if it is a special character e.g. nbsp.

Trust that gives some enlightenment.

iafan commented 8 years ago

> I'm not 100% sure I like it being textual, I'd rather have icons or unicode chars to show these markers.

Unfortunately, there are not so many special unprintable characters that you can represent using special Unicode characters in a way that the user will understand (Enter, non-breakable space, tab). I want to have a mechanism that would work for any kinds of unprintable characters first; if at some point we decide to have a graphical representation for certain common characters, then we can always add that on top of the existing mechanism.

> You need to be able to copy the special characters from source to target
> You can't see the special characters in the target.

Yes, absolutely. These characters are copied now on manual copy-paste, but this case needs to be combined with the approach from #3350 to display such characters in the editor. Also, nothing prevents us from having all unprintable characters work as 'placeables' (clickable targets that quickly insert their values into the editor).

dwaynebailey commented 8 years ago

@iafan Re getting the basics right in text and adding icons later as/if needed :+1:

I couldn't replicate manual copy and paste; it didn't work for me in the jsfiddle example. But if that's known then it's just an issue in the demo. It seems this is on your radar, so I'm good with that.

iafan commented 8 years ago

Looks like copy-pasting special symbols doesn't work in Firefox; it does in Chrome.

FWIW, this is what the visual representation of certain characters could look like: http://jsfiddle.net/iafan/w4tpu6q6/

dwaynebailey commented 8 years ago

@julen some quick observations from your last push. Some might be things not implemented or that you are already aware of. I might be repeating myself as I can't remember all I wrote in the last comment and would rather give you a fresh view.

Context lines. These look good. The use of text instead of escapes works, even if they might be placeholders for something else later.
- \r\n -> LF - not sure if that was intentional but it works and makes things easier for the localiser
Source text.
- Not showing LF is problematic I think, as it's different from the context lines. I assume this is intentional as TAB is shown. Also, you don't know there is an LF if it's at the end of the line, or in fact why the lines are split, if you don't have the LF marker.
Target text
- You can't see the markers
- Copy source to target doesn't work.
TM
- Looks good and matches the rest
- Hard to see the strike out through the markers
- There is still special rendering of leading space with red squiggles.
- Where the source had LF between two words and the TM match has TAB between the same two words: LF is struck out, as expected. But we still retain the \n in the rendering (I assume because the source text is the base from which we show the difference). I'm really not sure which is correct. Just highlighting it as something to check and think about.
Spaces
- Leading and trailing space is still using red squiggles. It probably wants to be part of these changes
- Two trailing spaces leads to <>
- NBSP isn't highlighted.
- Copy src to tgt and NBSP becomes SP
Other
- Undo after a copy leaves unit fuzzy.

julen commented 8 years ago

Thank you for the input @dwaynebailey — note the PR was still rough around the edges, I only made a push of my local changes the other day so I could set it up on our staging server. Now I have pushed more changes, unifying all highlighting logic in a single place (fixes #4165 basically).

Because of this latest change, the diff highlighting has currently the same behavior in TM matches and user suggestions, therefore displaying the part that was removed and the part that was added. Unless I'm checking it wrong, user suggestions previously displayed only the part that was added. Based on the feedback, I'm happy to make that behavior adjust selectively.

LF is definitely shown in source texts. In fact, and as already mentioned, all the highlighting logic is now shared. It might be that you were looking at a cached editing unit and were therefore getting a stale rendering.

Regarding the target text, I'm afraid we won't be able to display any markers or anything fancy as long as we use regular textarea elements. This is a whole topic on its own, and I'd leave it aside from this issue.

I have replaced the red squiggles in favor of subtle open boxes, and multiple leading/intermediate/trailing spaces are displayed in such fashion. NBSP is highlighted as well. _(NBSP becoming SP is an optical illusion in Firefox (check bug 359303; I've been bitten by this too and spent a lot of time debugging), however NBSP ends up properly in the textarea.)_

Pending:

Figuring out how to do copying properly. We have the actual source texts in the client, so making the copy button work is not a big deal. I'm a little bit lost when it comes to regular text copying with the mouse though, as browsers exhibit different behavior when there's content generated via pseudo-elements. Considering this limitation, is the copy button good enough?
Inputting an actual tab using the keyboard's physical tab key. Currently it'll switch focus, following browsers' default behavior. Given that it's not very common to find actual \ts in source texts (at least in my experience), I'm not sure how important it is to address this as part of this issue though.

julen commented 8 years ago

In order to avoid this from getting stalled, I'd love to hear your thoughts on the current state of things @iafan @dwaynebailey.

iafan commented 8 years ago

> Figuring out how to do copying properly.

As I was already mentioning above, having #3350 (special font) in place would allow us to implement copy-pasting properly. Here's how it would work:

We have a font which has certain symbols defined in the Private Use Area
We get raw string from the DB and map known symbols (line feed, tab, nbsp, etc.) to the corresponding Unicode symbols our font supports. Then we display this string in both source text and in the editor, both having the support for this font. This allows us to see and copy-paste these special symbols as any other regular symbols.
Before saving the translation to the database, we do the reverse conversion of the private Unicode symbols to the real ones.
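
The forward and reverse conversions described in the steps above can be sketched as follows. The code points used here are placeholders in the Private Use Area chosen for illustration (the real ones depend on the font), and `to_display` / `to_db` are hypothetical names.

```python
# Placeholder Private Use Area code points; the real ones depend on the font.
TO_VISIBLE = {0x09: 0xE009, 0x0A: 0xE00A, 0xA0: 0xE0A0}
# The reverse table is derived so the two directions cannot drift apart.
TO_RAW = {pua: raw for raw, pua in TO_VISIBLE.items()}


def to_display(db_value: str) -> str:
    """Map known invisible characters to the visible symbols of the font."""
    return db_value.translate(TO_VISIBLE)


def to_db(displayed: str) -> str:
    """Undo the mapping before the translation is saved."""
    return displayed.translate(TO_RAW)


raw = "first\tsecond\nthird\u00a0word"
shown = to_display(raw)

# Saving what is shown gives back the original DB value.
assert to_db(shown) == raw
```

One caveat of this simple sketch: a DB value that already contains one of the chosen Private Use Area code points would be altered on save, so the mapping range would need to be reserved.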

> Inputting an actual tab using the keyboard's physical tab key.

I don't think this is necessary, as this will break navigation, and navigation is more important. Tab is rarely, if ever, used in translatable strings, and if it's there, one would be able to select, copy and paste it as a regular visible symbol if we have this custom font in place.

dwaynebailey commented 8 years ago

Quick test

Rendering in source seems fine for all of my test file, and also in browsed units.
Copy source to target does not work, all escapes are ignored
Copy TM results works correctly and copies escapes as expected
Target doesn't render escapes (we all know that just want to make sure we don't forget ;)
Clicking on the 'icon' in source copies the escapes to target correctly.
Entering tab doesn't work
Space highlighting is good, pre, post and double spaces are shown in source and in browse

dwaynebailey commented 8 years ago

@iafan

Special font approach. I think I like that and it should work for us. My only concern might be RTL and its impact there, as I'm not sure how rendering engines see the private use space area in terms of meta information like punctuation, directionality, etc. If we can match closely to the original character we're faking then it should work.

Typing tab....

I think this ignores the general issue that you can't use the keyboard to enter characters. LF likely works for us as we can press Enter. But TAB and perhaps others are not possible. Falling back to "hey, just click" ignores the advantage of speed that using a keyboard gives a translator.

I'm not sure how to solve this issue. I'm happy to take the direction of the user must click on the symbol for now but I'd like us to agree that this isn't ideal for speed.

If the editor had autocomplete/suggest we may be able to get around this by allowing \ to bring up a list of possible special escapes. But I don't want to force that issue into this one.

khaledhosny commented 8 years ago

Using PUA for BiDi-neutral characters like spaces is likely to break BiDi badly, as all PUA regions are, AFAIK, classified as strong LTR characters. But maybe the BiDi-neutral effect can be faked using BiDi isolates, though I’m not sure how widely supported they are in browsers, since they are a rather recent addition to the BiDi algorithm.

iafan commented 8 years ago

I'll throw together some live demo of the font approach for everybody to try. RTL is definitely the biggest potential challenge with this approach.

iafan commented 8 years ago

Also: there are some symbols in PUA that are considered BiDi-neutral, see e.g. http://www.kreativekorp.com/charset/unicode.php?char=F8FF

So we might just need to carefully select mappings for the font symbols in PUA range.

unho commented 8 years ago

Can't we consider these special characters to be placeables?

iafan commented 8 years ago

> Can't we consider these special characters to be placeables?

Sure, why not. This is an independent thing, though.

iafan commented 8 years ago

I uploaded test files here: https://www.dropbox.com/s/hcrjqsa5bge7bnu/raw_font_test.zip?dl=1

The archive contains the font, the test html page and README for Firefox users (they may require tweaking config settings to make TTF load locally; this won't be an issue for the web app).

Try playing around with editing text in these textareas, copy-pasting it, etc. There are some tests for RTL rendering, but from what I can tell rendering looks kind of good by default, and trying to make all special characters treated as RTL doesn't improve things much.

Would love to hear your feedback.

dwaynebailey commented 8 years ago

@iafan had a look at this, seems to work fine. I can't say much for the RTL stuff though, I think we need @khaledhosny's input for that.

Testing

Copying a char into the target works.
Entering the char doesn't work. So pressing Enter and copying a real tab don't work to produce the needed char. That I think could lead to some confusion if we are not doing this on the fly in the target.

Some observations

If you copy and paste from source to target, that works. But the risk here is someone copying from the UI into some other app, where it would just not render. I'm sure there are ways around this, like a copy button, so I'm not that worried. We should just be aware of that use case.
I'm trying to remember why we needed to do this in the PUA :)
With things like ZWJ and such we probably want to be able to show and hide these much like in a wordprocessor, so that it is possible to read the text without confusion.
Using colour in the PUA chars could help (but I realise that makes rendering harder). Just would help to deemphasise special chars.
Doing autosuggest in the target would allow us to allow keyboard input of the correct character so typing \ brings up a list of the special chars. \n selects [CR] icon, etc. That on top of clicking on the char in the source could fully address input.

iafan commented 8 years ago

@dwaynebailey I should have mentioned that this is just a static demo, with no JavaScript-based processing. It shows a) how the font is rendered (including RTL), and b) how you can copy-paste those symbols as a regular text.

On top of that, there needs to be some JS logic that will map invisible symbols to visible ones, and this logic should kick in on any change of the value. This way, when one presses Enter, or pastes some text into the textarea, it would properly adjust the display of such special symbols.

> Using colour in the PUA chars could help (but I realise that makes rendering harder). Just would help to deemphasise special chars.

We can do special char highlighting using syntax highlighters from CodeMirror. CodeMirror syntax highlighting can be used both in a textarea mode and to highlight static text (we can use this to highlight source).

iafan commented 8 years ago

> With things like ZWJ and such we probably want to be able to show and hide these much like in a wordprocessor, so that it is possible to read the text without confusion.

This is what the "Raw" mode is about. The idea is to display the majority of these symbols only in Raw mode. It will be only tab, cr and lf symbols that will be always visible.

iafan commented 8 years ago

> Because of this latest change, the diff highlighting has currently the same behavior in TM matches and user suggestions, therefore displaying the part that was removed and the part that was added. Unless I'm checking it wrong, user suggestions previously displayed only the part that was added. Based on the feedback, I'm happy to make that behavior adjust selectively.

I missed that one from @julen's initial comment. For user suggestions we also display the full diff (removed/added parts), so the diff rendering is consistent with similar translations.

iafan commented 8 years ago

@khaledhosny any feedback from you?

julen commented 8 years ago

We are resuming work on this, and if there's any RTL-related feedback to add on top of the previous points, we'd love to hear that @khaledhosny. Thank you!

julen commented 8 years ago

> But if for some reason it is desired to translate the raw (not unescaped) string, then the translation also needs to be treated as a raw one, and translators will be following the escaping rules exposed in the source string.

So if I didn't miss the point @iafan, this would somehow mean having two editor modes that behave differently:

smart editing (default), where the custom font is used and the input text is mapped to the special characters defined there for output.
raw editing, where a monospace font is used and no transformations are applied on the input/output of characters. Text is treated as-is.

iafan commented 8 years ago

@julen, this is how these modes are supposed to work:

regular mode: we map only characters which are not related to directionality.
"raw" mode: we map all characters, force LTR and use monospaced font.

julen commented 8 years ago

After reading @dwaynebailey's comment it seems there was some confusion, so I've put up a live demo that showcases how the font would work. @iafan this already includes the last clarifications you made.

Some caveats aside (like copying text to external apps — which can be solved), I think I like the experience so far.
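
For reference, the two modes as clarified above could be sketched like this. Which characters count as "related to directionality" is an assumption made for this example, as are the names `pua_for`, `display_map` and `to_display` and the code point scheme; forcing LTR and switching to a monospaced font in raw mode would be done in CSS and is left out.

```python
# Assumed split for illustration; the real lists may differ.
DIRECTIONAL = {0x200E, 0x200F, 0x202A, 0x202B, 0x202C, 0x202D, 0x202E}
OTHER = {0x09, 0x0A, 0x0D, 0xA0, 0x200B, 0x200C, 0x200D}


def pua_for(code: int) -> int:
    """Hypothetical scheme: low code points go to U+E0xx, U+20xx to U+F0xx."""
    if code < 0x100:
        return 0xE000 + code
    return 0xF000 + (code - 0x2000)


def display_map(raw_mode: bool) -> dict:
    """Regular mode skips directional characters; raw mode maps everything."""
    codes = (OTHER | DIRECTIONAL) if raw_mode else OTHER
    return {code: pua_for(code) for code in codes}


def to_display(text: str, raw_mode: bool = False) -> str:
    return text.translate(display_map(raw_mode))


sample = "a\tb\u200fc"
# In regular mode the tab becomes visible but the RLM is left untouched.
print([hex(ord(ch)) for ch in to_display(sample)])
# In raw mode the RLM is made visible as well.
print([hex(ord(ch)) for ch in to_display(sample, raw_mode=True)])
```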

iafan commented 8 years ago

The demo looks good!

One minor amendment from me: we only want to convert spaces to dots in the Raw mode.

iafan commented 8 years ago

Not sure if it's the right time to report any issues with the editing (this is a demo, after all), but:

Ctrl+Space not only inserts space, but moves the caret to the end of the text, which is undesirable.
Pasting any text also moves the caret to the end of the text.
Copy-pasting of the LF character alone doesn't work.

khaledhosny commented 8 years ago

I just tested @julen’s live demo above, and a simple two-word Arabic string gets its words reordered because of the inserted bullet (word one to the left of word two, not to the right of it).

Generally, I don’t think replacing Unicode characters with other ones will work well as far as the BiDi algorithm is concerned, unless the replacement character has the exact BiDi type of the one it is replacing.

julen commented 8 years ago

@khaledhosny the bullet should not be present in the regular editing mode, as @iafan mentioned above. Besides that, I didn't specify any dir in the demo (I was in a rush and wanted to publish it), so these points probably affect proper RTL rendering. I updated the demo page to allow switching from LTR to RTL; it would be nice if you could check it out again.

Re. BiDi, have you tested that to confirm your hunch?

Thanks again!

khaledhosny commented 8 years ago

The base direction (the dir here) should have no effect on the case above, the order of the Arabic words should be the same. Even in “raw” mode, I expect the BiDi algorithm to be applied, otherwise it will be giving misleading results.

As for BiDi, the algorithm is very sensitive to character properties. Even characters that would appear to work in an identical way (e.g. an Arabic and a Hebrew letter) can behave differently under certain circumstances (they have different BiDi types for a reason).

Here are some specific issues I found:

raw mode:
- The RTL words order above.
- ZWJ (U+200D) has no effect on the joining behaviour, e.g. ه‍ه.
- RLM (U+200F) has no effect on the text order, e.g. ١‏٢.
- BiDi control chars (LRO, RLO, etc.) has no effect at all, e.g. ‮abcd‬.
regular mode:
- Tab screws up the text, e.g. عربي١ عربي٢.
- Ditto for NBSP, e.g. عربي ١ عربي ٢.

I’m not sure what other characters have special treatment to test.

julen commented 8 years ago

@khaledhosny some characters having no effect at all is a mistake from my side in the demo (they were being replaced with the new symbols, instead of being kept along with the symbols). I reckon this should be fixed and the characters should have their effect now.

julen commented 8 years ago

> I’m not sure what other characters have special treatment to test.

The demo implements the following mapping extracted from @iafan's test files.

| Unicode meaning | Dec | Hex | BiDi type | Strength | Font symbol | PUA dec | PUA hex | PUA BiDi |
|---|---|---|---|---|---|---|---|---|
| Null Character | 0 | 0000 | BN | Weak | NULL | 57344 | E000 | ? |
| Tabulation | 9 | 0009 | S | Neutral | TAB | 57353 | E009 | ? |
| Line Feed | 10 | 000A | B | Neutral | LF | 57354 | E00A | ? |
| Carriage Return | 13 | 000D | B | Neutral | CR | 57357 | E00D | ? |
| Escape | 27 | 001B | BN | Weak | ESC | 57371 | E01B | ? |
| Space | 32 | 0020 | WS | Neutral | SPACE* | 57376 | E020 | ? |
| Non-Breaking Space | 160 | 00A0 | CS | Weak | NBSP | 57504 | E0A0 | ? |
| **Others** | | | | | | | | |
| Zero-Width Space | 8203 | 200B | BN | Weak | ZWS | 61451 | F00B | ? |
| Zero-Width Non-Joiner | 8204 | 200C | BN | Weak | ZWNJ | 61452 | F00C | ? |
| Zero-Width Joiner | 8205 | 200D | BN | Weak | ZWJ | 61453 | F00D | ? |
| Left-to-Right Mark | 8206 | 200E | L | Strong | LRM | 61454 | F00E | ? |
| Right-to-Left Mark | 8207 | 200F | R | Strong | RLM | 61455 | F00F | ? |
| Left-to-Right Embedding | 8234 | 202A | LRE | | LRE | 61482 | F02A | ? |
| Right-to-Left Embedding | 8235 | 202B | RLE | | RLE | 61483 | F02B | ? |
| Pop-Directional Formatting | 8236 | 202C | PDF | | PDF | 61484 | F02C | ? |
| Left-to-Right Override | 8237 | 202D | LRO | | LRO | 61485 | F02D | ? |
| Right-to-Left Override | 8238 | 202E | RLO | | RLO | 61486 | F02E | ? |
| Word Joiner | 8288 | 2060 | BN | Weak | WJ | 61536 | F060 | ? |

Note: @dwaynebailey formatted table and added directionality data.

Legend:

B - Paragraph Separator
BN - Boundary Neutral
CS - Common Number Separator
L - Left-to-Right
LRE - Left-to-Right Embedding
LRO - Left-to-Right Override
PDF - Pop Directional Format
R - Right-to-Left
RLE - Right-to-Left Embedding
RLO - Right-to-Left Override
S - Segment Separator
WS - Whitespace

Also useful Wikipedia template for directionality classes
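
One way the "?" cells could be filled in is by querying the Unicode Character Database. The sketch below uses Python's standard `unicodedata.bidirectional` to compare the BiDi class of an original character with that of its replacement; `PAIRS` and `bidi_mismatches` are hypothetical names, and the reported classes depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# (original, replacement) pairs taken from a few rows of the mapping table.
PAIRS = [
    ("\t", "\ue009"),
    ("\n", "\ue00a"),
    ("\u00a0", "\ue0a0"),
    ("\u200d", "\uf00d"),
    ("\u200f", "\uf00f"),
]


def bidi_mismatches(pairs):
    """Return (code point, original class, replacement class) where they differ."""
    result = []
    for original, replacement in pairs:
        original_class = unicodedata.bidirectional(original)
        replacement_class = unicodedata.bidirectional(replacement)
        if original_class != replacement_class:
            result.append((f"U+{ord(original):04X}", original_class, replacement_class))
    return result


for row in bidi_mismatches(PAIRS):
    print(row)
```

Any row printed by this check is a case where the replacement cannot be expected to behave like the original under the BiDi algorithm.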

iafan commented 8 years ago

@khaledhosny did you also try the read-only rendering demo I provided some time ago? It should have RTL enabled.

iafan commented 8 years ago

> One minor amendment from me: we only want to convert spaces to dots in the Raw mode.

@julen: and yet one more amendment: it would be absolutely fabulous if you could still render spaces as dots at the beginning or the end of the string (hanging spaces).
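
A small sketch of what "hanging spaces" detection could look like, assuming plain U+0020 spaces and a middle dot as the marker. The function name `mark_hanging_spaces` and the choice of glyph are assumptions for illustration, not what the demo does.

```python
import re

DOT = "\u00b7"  # middle dot; the actual glyph is an assumption


def mark_hanging_spaces(text: str) -> str:
    """Show leading and trailing spaces as dots; inner spaces are left alone."""
    # \A and \Z anchor to the whole string, so spaces around inner line
    # breaks are not treated as hanging.
    return re.sub(r"\A +| +\Z", lambda m: DOT * len(m.group(0)), text)


print(mark_hanging_spaces("  two leading, one trailing "))
```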

dwaynebailey commented 8 years ago

So it seems that the issue @khaledhosny has raised isn't being addressed in the PUA, in that the replacements don't have the exact same BiDi categories. If we had PUA matches that were in the same class as the ones we're substituting, then it could work. But it seems that they are all BiDi-neutral or something similar.

So some options:

We see this as an LTR solution and do something else for bidi.
We use a font that changes the character as is and don't use PUA. I think once again I need reminding why we needed to use the PUA?

translate / pootle

Escaping backslash #3941
