BlackthornYugen opened 5 months ago
```shell
for charset in US-ASCII ISO-8859-1 Windows-1252 UTF-8 UTF-16 UTF-32 ; do
curl --write-out '%{stderr}\n%{url_effective}\n' --silent "https://httpbin.jskw.dev/encoding/${charset}/$(echo 'Hello World' | iconv -s -f 'utf-8' -t ${charset} 2> /dev/null | base64 | tr '/+' '_-')" | xxd
done
```
```
https://httpbin.jskw.dev/encoding/US-ASCII/SGVsbG8gV29ybGQK
00000000: 4865 6c6c 6f20 576f 726c 640a Hello World.
https://httpbin.jskw.dev/encoding/ISO-8859-1/SGVsbG8gV29ybGQK
00000000: 4865 6c6c 6f20 576f 726c 640a Hello World.
https://httpbin.jskw.dev/encoding/Windows-1252/SGVsbG8gV29ybGQK
00000000: 4865 6c6c 6f20 576f 726c 640a Hello World.
https://httpbin.jskw.dev/encoding/UTF-8/SGVsbG8gV29ybGQK
00000000: 4865 6c6c 6f20 576f 726c 640a Hello World.
https://httpbin.jskw.dev/encoding/UTF-16/_v8ASABlAGwAbABvACAAVwBvAHIAbABkAAo=
00000000: feff 0048 0065 006c 006c 006f 0020 0057 ...H.e.l.l.o. .W
00000010: 006f 0072 006c 0064 000a .o.r.l.d..
https://httpbin.jskw.dev/encoding/UTF-32/AAD-_wAAAEgAAABlAAAAbAAAAGwAAABvAAAAIAAAAFcAAABvAAAAcgAAAGwAAABkAAAACg==
00000000: 0000 feff 0000 0048 0000 0065 0000 006c .......H...e...l
00000010: 0000 006c 0000 006f 0000 0020 0000 0057 ...l...o... ...W
00000020: 0000 006f 0000 0072 0000 006c 0000 0064 ...o...r...l...d
00000030: 0000 000a ....
```
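One caveat with the loop above: GNU coreutils `base64` wraps its output every 76 characters, so a longer body would end up with newlines inside the URL, and the `2> /dev/null` hides iconv failures, which would silently produce an empty or truncated body. A minimal sketch of a more defensive URL builder, using a hypothetical `encoding_url` helper and the demo host from this issue as the base URL:

```shell
#!/usr/bin/env bash
# Demo host used in this issue; adjust for another deployment.
BASE_URL='https://httpbin.jskw.dev/encoding'

# Hypothetical helper: print the /encoding/<charset>/<base64body> URL for TEXT
# (given as UTF-8) re-encoded into CHARSET. Returns 1 if iconv fails.
encoding_url() {
  local charset=$1 text=$2 body
  # pipefail makes the substitution fail when iconv fails, not only when tr does.
  # tr -d '\n' removes the line wrapping GNU base64 adds to long output, then
  # '/' and '+' are swapped for the URL-safe '_' and '-'.
  body=$(set -o pipefail
         printf '%s' "$text" | iconv -f UTF-8 -t "$charset" | base64 | tr -d '\n' | tr '/+' '_-') || return 1
  printf '%s/%s/%s\n' "$BASE_URL" "$charset" "$body"
}

# Same input as the loop above ($'...\n' mirrors the newline echo adds).
for charset in US-ASCII ISO-8859-1 Windows-1252 UTF-8 UTF-16 UTF-32; do
  encoding_url "$charset" $'Hello World\n'
done
```

Each printed URL can then be passed to curl as in the original loop. Note that iconv prepends a byte-order mark for `UTF-16` and `UTF-32` (the `feff` and `0000feff` at the start of the dumps above); naming `UTF-16BE` or `UTF-32BE` omits it, assuming the server accepts those charset names.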
New endpoints:

- `/encoding/<charset>`
- `/encoding/<charset>/<base64body>`
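The `<base64body>` segment is ordinary base64 with `+` and `/` replaced by `-` and `_` (the substitution performed by `tr '/+' '_-'` in the loop above), with the `=` padding kept. Because the encoded body never contains a literal `/`, it stays a single path segment. A minimal sketch for inspecting what a given URL asks the server to return, assuming a hypothetical `decode_body` helper and a `base64` that accepts `-d` (as GNU coreutils does):

```shell
#!/usr/bin/env bash
# Hypothetical helper: undo the URL-safe substitution ('_' -> '/', '-' -> '+')
# and base64-decode the <base64body> path segment back to raw bytes.
decode_body() {
  printf '%s' "$1" | tr '_-' '/+' | base64 -d
}

url='https://httpbin.jskw.dev/encoding/UTF-16/_v8ASABlAGwAbABvACAAVwBvAHIAbABkAAo='
# ${url##*/} keeps only the text after the last '/', i.e. the body segment.
decode_body "${url##*/}" | od -An -tx1
```

The dump begins with `fe ff`, the byte-order mark iconv prepended, followed by the big-endian code units of `Hello World`, matching the xxd output for the UTF-16 request.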
You could generate some URL-safe encodings like this:
That script generates these URLs:
The browser (or any client) should be able to render the same characters from each of these URLs, except for '🎉', which only the Unicode encodings can represent.
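The emoji is the exception because US-ASCII, ISO-8859-1 and Windows-1252 have no code for U+1F389, so iconv refuses to convert it; the `2> /dev/null` in the loop above would hide that failure. A quick representability check is a minimal sketch, with `can_encode` as a hypothetical helper that relies on iconv exiting non-zero when the target charset lacks a character:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if TEXT (given as UTF-8) can be represented in
# CHARSET. iconv exits non-zero when the target charset lacks a character.
can_encode() {
  printf '%s' "$2" | iconv -f UTF-8 -t "$1" > /dev/null 2>&1
}

for charset in US-ASCII ISO-8859-1 Windows-1252 UTF-8 UTF-16 UTF-32; do
  for char in 'é' '🎉'; do
    if can_encode "$charset" "$char"; then
      echo "$charset: $char ok"
    else
      echo "$charset: $char not representable"
    fi
  done
done
```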
With this change, the existing endpoint https://httpbin.dev/encoding/utf8 works the same as before (see https://httpbin.jskw.dev/encoding/utf8), and I also have it re-encode the demo into UTF-16 and UTF-32. Interestingly, Firefox does a better job at rendering UTF-16, but neither Chrome nor Firefox renders UTF-32 at all.
To be fair, UTF-16 and UTF-32 are of little use in the browser. They mainly make sense for specialized clients that need to seek through Unicode data by code point. Strictly, only UTF-32 avoids variable-length encoding: UTF-16 still needs a surrogate pair (4 bytes) for code points above U+FFFF, such as '🎉' (U+1F389).
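The width trade-off can be checked from the shell. In this sketch `byte_len` is a hypothetical helper, and the big-endian variants `UTF-16BE`/`UTF-32BE` are used so iconv does not prepend a byte-order mark. It shows that UTF-8 and UTF-16 vary in width per code point while UTF-32 is always 4 bytes, which is what lets a client jump to the Nth code point (counting from 0) at byte offset 4 × N:

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of bytes TEXT (given as UTF-8) occupies in CHARSET.
# tr strips the padding some wc implementations put before the count.
byte_len() {
  printf '%s' "$2" | iconv -f UTF-8 -t "$1" | wc -c | tr -d ' '
}

for char in 'A' 'é' '€' '🎉'; do
  printf '%s  UTF-8=%s  UTF-16=%s  UTF-32=%s\n' "$char" \
    "$(byte_len UTF-8 "$char")" "$(byte_len UTF-16BE "$char")" "$(byte_len UTF-32BE "$char")"
done
```

For 'A', 'é', '€' and '🎉' this gives 1, 2, 3 and 4 bytes in UTF-8 and 2, 2, 2 and 4 bytes in UTF-16, but 4 bytes each in UTF-32.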
I've found that Firefox handles both UTF-8 and UTF-16 pretty well, but not UTF-32. Chrome makes a bit of a mess of the UTF-8 demo re-encoded to UTF-16.