nickboucher / trojan-source

Trojan Source: Invisible Vulnerabilities
https://trojansource.codes
MIT License

Doubt regarding early-return.py example #22

Closed ju-sh closed 2 years ago

ju-sh commented 2 years ago

In the early-return.py example, I had thought

then <RLI> ''' ;return

would end up being rendered as

then nruter; '''

where every character from the end of the line back to the <RLI> is displayed one by one in reverse order (there's an implicit <PDI> at the end of the line, right? Or does the end of a line not count as the end of a paragraph?).

But I suppose that's not how it works since it gets displayed as

then return; '''

Could someone help me understand this?

ju-sh commented 2 years ago

Found an example here that says:

Memory:  he said "<RLI>I NEED WATER!<PDI>", and expired.
Display: he said "!RETAW DEEN I", and expired.

But does it work differently in the early-return example?

nickboucher commented 2 years ago

This is a great point -- the RLI character behaves strangely. The example that you reference comes from the de facto source of truth for a correct Unicode implementation, but that's not the behavior that I observe in practice.

Unicode Bidi implementations differ by application & OS, but rendering "I NEED WATER!" in Chromium-based browsers for me yields something that looks like "!I NEED WATER". For comparison, here's the encoded form for anyone reading to observe in their own browser:

⁧I NEED WATER!⁩

Experimentally, it appears to me that (at least in Chromium-based software) RLI affects neutral characters but not strong characters; or, more precisely, it at least affects neutral and strong characters differently.

It's not immediately clear to me whether this behavior is in the Unicode Bidi spec, or whether this is a bug/undefined behavior in specific Bidi implementations.
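One way to probe this is to look at the Bidi_Class of each character, since the Unicode Bidirectional Algorithm treats strong and neutral types differently. Below is a minimal sketch using Python's standard `unicodedata` module. The helper name `bidi_classes` is hypothetical, and the snippet only reports character classes; it does not run the reordering algorithm or reproduce any particular renderer.

```python
import unicodedata

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def bidi_classes(text):
    """Return (character, Bidi_Class) pairs for each code point in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# The encoded form from this thread: the sentence wrapped in RLI ... PDI.
sample = f"{RLI}I NEED WATER!{PDI}"

for ch, cls in bidi_classes(sample):
    # Print the code point rather than the raw character so that the
    # invisible control characters stay visible in the output.
    print(f"U+{ord(ch):04X} {cls}")
```

The Latin letters report `L` (strong left-to-right), while the space reports `WS` and `!` reports `ON` (both neutral types). That is consistent with the letters and the punctuation being handled differently inside the isolate.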

ju-sh commented 2 years ago

Thanks. I think in the example from the page for the bidi algorithm, they also use uppercase characters as a kind of notation to represent a script written from right to left (or something like that, right?).
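To see why that notation would matter, compare the Bidi_Class of a real uppercase Latin letter with that of a letter from a right-to-left script. This is a small sketch using Python's standard `unicodedata` module; the Hebrew letter alef is just an illustrative choice and does not come from the example itself.

```python
import unicodedata

latin_upper = "W"       # a real uppercase Latin letter
hebrew_alef = "\u05D0"  # HEBREW LETTER ALEF, an arbitrary right-to-left letter

# Bidi_Class "L" is strong left-to-right; "R" is strong right-to-left.
print(unicodedata.bidirectional(latin_upper))  # L
print(unicodedata.bidirectional(hebrew_alef))  # R
```

So if the uppercase letters in the documentation example stand in for right-to-left letters, typing actual uppercase Latin text would not be expected to reorder the same way.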

And do you know how bidi is pronounced? Is it 'bye-dye' or 'bee-dee'?

nickboucher commented 2 years ago

I've heard it pronounced many different ways; I'm not sure that there's a consensus. Although it's a different "Bidirectional", there's a related thread about it here.