sifive / wake

The SiFive wake build tool
Other
86 stars 28 forks source link

rsc: Cleanup TODOs #1577

Closed V-FEXrt closed 3 months ago

V-FEXrt commented 3 months ago

Cleans up TODOS, updates the strToBytes function to use byteToInteger which shouldn't break serialization on non-ASCII strings

ag-eitilt commented 3 months ago

I'm a bit concerned about byteToInteger -- or maybe the problem's in the deserialization?

$ wake -x 'explode "[aɪ̯ pʰiː eɪ̯]" | map byteToInteger | map integerToByte'
"[", "a", "�", "�", " ", "p", "�", "i", "�", " ", "e", "�", "�", "]", Nil
$ wake -x 'explode "[aɪ̯ pʰiː eɪ̯]" | map unicodeToInteger | map integerToUnicode'
"[", "a", "ɪ", "̯", " ", "p", "ʰ", "i", "ː", " ", "e", "ɪ", "̯", "]", Nil

If that's a false negative, then it's all good, but I think we might need those single-quote raw strings that got simplified away long ago for explode to give you the expected bytes.

V-FEXrt commented 3 months ago

You are right to be concerned, I was/am too. I tested this specific case using various emoji but I guess I didn't do a deep enough test.

There isn't really a good solution here. I have to send a stream of bytes to the server and unicodeToInteger will very happily return values larger than a byte.

$ wake -x 'explode "[aɪ̯ pʰiː eɪ̯]" | map unicodeToInteger'
91, 97, 618, 815, 32, 112, 688, 105, 720, 32, 101, 618, 815, 93, Nil
            ^^^^^^^^ -- these don't fit in a byte

which will break the RSC. We might be able to use it if we could find the largest return value from unicodeToInteger and update the RSC to use that type

Maybe if we can figure out the largest possible value that unicodeToInteger will return but thats

V-FEXrt commented 3 months ago

Okay, I found the max size and what I need to change everything too. I'd like to do that is a separate PR after this one lands. This PR is a little weird to land then immediately change to something else but the PR brings the world into a more consistent state (that is equally broken as the previous state)

V-FEXrt commented 3 months ago

Ah actually the larger integer type doesn't work since we are just streaming bytes to files. We have to make this work using just a byte stream.

From an offline chat with Sam, turns out the issue is actually that explode works on code points (which makes sense) so that is the core problem.

Wake is perfectly happy to reconstruct a UTF8 string from the cat of a bunch of invalid UTF8 strings.

$ wake -x '((integerToByte 0xc3), (integerToByte 0xa7), Nil) | cat'
"ç"