kolesar-andras opened this issue 4 years ago (status: Open)
Zombie driver fails when a URL contains "high bytes", that is, non-ASCII characters. The following example contains a valid Hungarian word with accented characters:
https://hu.wikipedia.org/wiki/Műemlék
Desktop browsers and the Mink Goutte driver translate the high bytes correctly:
https://hu.wikipedia.org/wiki/M%C5%B1eml%C3%A9k
Zombie driver sends the string as-is to JavaScript, and then bytes above 0x7f go wrong somewhere in Zombie:
https://hu.wikipedia.org/wiki/Mqeml\xe9k
It's a bit strange how the characters are truncated:
é becomes \xe9, which is its character code in ISO-8859-1
ű becomes q, because this character does not exist in that code page
Characters that don't exist in the ISO-8859-1 encoding are replaced with regular letters, for example q, and the damage is irreversible.
The example shows that desktop browsers translate non-ASCII characters to percent-encoded bytes using their UTF-8 encoding:
é becomes %C3%A9
ű becomes %C5%B1
That's correct; web servers expect URLs in this form.
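As an illustration of the encoding that browsers apply, here is a minimal Python sketch that percent-encodes the path and query of a URL using UTF-8 bytes. The helper name `percent_encode_url` is hypothetical and is not part of Mink or Zombie; it only shows the expected transformation, not how either driver implements it.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query as UTF-8.

    Hypothetical helper for illustration only. The host name is left
    untouched (non-ASCII hosts would need IDNA handling instead).
    """
    parts = urlsplit(url)
    # "%" is kept safe so that already-encoded sequences are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


encoded = percent_encode_url("https://hu.wikipedia.org/wiki/Műemlék")
print(encoded)  # https://hu.wikipedia.org/wiki/M%C5%B1eml%C3%A9k
```

Applying a step like this before the URL is handed over would mean that only ASCII characters cross the boundary, so no later byte conversion could damage them.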
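The mangled URL reported in this issue can be reproduced by keeping only the low 8 bits of each character's code point: ű is U+0171, whose low byte is 0x71, the ASCII letter q, and é is U+00E9, whose low byte is 0xE9. The sketch below is an assumption about the mechanism, consistent with the observed output; the issue does not confirm where in Zombie the conversion actually happens, and the function name is hypothetical.

```python
def truncate_to_low_byte(text: str) -> bytes:
    """Keep only the low 8 bits of each code point (a lossy conversion).

    Assumed mechanism for illustration; not confirmed as Zombie's behavior.
    """
    return bytes(ord(ch) & 0xFF for ch in text)


word = "Műemlék"
print(truncate_to_low_byte(word))  # b'Mqeml\xe9k'
print(word.encode("utf-8"))        # b'M\xc5\xb1eml\xc3\xa9k'
```

The first result matches the broken URL above, and the second holds the bytes that browsers percent-encode. Once ű has collapsed to q, the original character cannot be recovered from the byte string.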