minkphp / MinkZombieDriver

Zombie.js driver for Mink framework

Handle non-ascii characters in url #193

Open kolesar-andras opened 4 years ago

kolesar-andras commented 4 years ago

The Zombie driver fails when a URL contains "high bytes", that is, non-ASCII characters. The following example contains a valid Hungarian word with accented characters.

https://hu.wikipedia.org/wiki/Műemlék

Desktop browsers and the Mink Goutte driver percent-encode the high bytes correctly:

https://hu.wikipedia.org/wiki/M%C5%B1eml%C3%A9k
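For illustration, the expected translation can be reproduced with a short Python sketch. This is not code from Mink or Zombie; the helper name `percent_encode_url` is hypothetical, and the sketch assumes the host name is plain ASCII (internationalized host names need separate handling).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in a URL as UTF-8 bytes.

    Hypothetical helper for illustration only. '%' is kept as a safe
    character so that already-encoded URLs are not encoded twice.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")            # keep path separators
    query = quote(parts.query, safe="=&%+")        # keep query delimiters
    fragment = quote(parts.fragment, safe="%")
    # Assumption: scheme and host are ASCII and are passed through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(percent_encode_url("https://hu.wikipedia.org/wiki/Műemlék"))
# https://hu.wikipedia.org/wiki/M%C5%B1eml%C3%A9k
```

Here ű (U+0171) becomes the two UTF-8 bytes C5 B1 and é (U+00E9) becomes C3 A9, which matches the URL above.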

The Zombie driver sends the string as-is to JavaScript, and bytes above 0x7f are then mangled somewhere in Zombie:

https://hu.wikipedia.org/wiki/Mqeml\xe9k

It's a bit strange how the characters are truncated:

Characters that don't exist in ISO-8859-1 encoding are replaced with ordinary ASCII letters (here ű becomes q), so the damage is irreversible.
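One explanation consistent with the observed output is that each character is cut down to the low 8 bits of its code point. That is an assumption about what happens inside Zombie, not something confirmed here. The Python sketch below (the function name `truncate_to_low_byte` is hypothetical) only shows that such a truncation reproduces the broken URL:

```python
def truncate_to_low_byte(text: str) -> bytes:
    """Keep only the low 8 bits of every code point.

    Assumed model of the failure, for illustration only: code points
    above 0xFF lose their high bits and turn into unrelated bytes.
    """
    return bytes(ord(ch) & 0xFF for ch in text)


print(truncate_to_low_byte("Műemlék"))
# b'Mqeml\xe9k'
```

ű is U+0171, and 0x171 & 0xFF is 0x71, the ASCII code of q. é is U+00E9, which fits in one byte and survives as \xe9. Once ű has become q, the original character cannot be recovered from the result.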

The example shows that desktop browsers translate non-ASCII characters to percent-encoded bytes using their UTF-8 byte values.

That's correct: web servers expect URLs in this form.