mtth / avsc

Avro for JavaScript :zap:
MIT License
1.29k stars 148 forks source link

Replace Buffer with Uint8Array #452

Closed valadaptive closed 2 months ago

valadaptive commented 9 months ago

I still need to do some cleanups and replace uses of Buffer.compare, but I'm putting this PR here so you can benchmark it. Before I can complete this, #428 (minus the version bump) should be merged in, as the services code uses Buffer everywhere and expects fixed/bytes types to decode to Buffers.

I've made some tweaks to the benchmarking setup, so comparing this directly to the master branch won't work:

I've cherry-picked those benchmarking changes into the bench-tweaks branch, which you can use to compare benchmarks.

mtth commented 8 months ago

Thanks @valadaptive! I'll try to find time to merge #428.

mtth commented 8 months ago

FYI @valadaptive - #428 is in.

valadaptive commented 8 months ago

Working on removing Buffer usage from types.js now. I noticed the isJsonBuffer function, which seems to check if a given object is the JSON representation of a Buffer. Under what circumstances are Avro types directly serialized to JSON and/or parsed back directly? I can't easily polyfill Uint8Array to stringify to a regular array, so I'll probably need to insert a fixup step when stringifying/parsing.

valadaptive commented 2 months ago

@mtth Do you intend to support the ability to roundtrip various Avro types to/from JSON? With the current representation that uses Buffers, this works mostly fine, but with Uint8Array, the JSON representation is a lot more bloated:

> JSON.stringify(Buffer.from([1, 2, 3, 4, 5]))
'{"type":"Buffer","data":[1,2,3,4,5]}'
> JSON.stringify(new Uint8Array([1, 2, 3, 4, 5]))
'{"0":1,"1":2,"2":3,"3":4,"4":5}'

I lean towards removing the coerceBuffers option entirely to discourage people from serializing Uint8Arrays to JSON.

mtth commented 2 months ago

Do you intend to support the ability to roundtrip various Avro types to/from JSON?

It's nice to have, but I'm OK dropping coerceBuffers if it adds significant complexity.

valadaptive commented 2 months ago

It's nice to have, but I'm OK dropping coerceBuffers if it adds significant complexity.

I believe it works right now in terms of recognizing Buffers. However, since we now use Uint8Arrays instead of Buffers in types, they'll serialize to much larger objects in JSON (e.g. Uint8Array([1, 2, 3, 4, 5] is serialized as '{"0":1,"1":2,"2":3,"3":4,"4":5}').

I can either try to implement code to recognize JSON'd Uint8Arrays, or remove coerceBuffers entirely. Leaving it as-is would be a huge footgun, because you can no longer round-trip Avro types through JSON (it'll serialize Uint8Arrays to a JSON representation it cannot itself recognize as a Uint8Array).

joscha commented 2 months ago

Short reference to https://gist.github.com/joscha/d8603a1f0af5b0b055546c792b2a8ff6, which can be updated once this pull request lands.

joscha commented 1 month ago

I've noticed that this change breaks types in https://github.com/mtth/avsc/blob/5db11655adf06b24b70d304840480962996f1b3a/types/index.d.ts#L77 for example. Before I update the types, is it safe to assume that all occurrences of Buffer should become Uint8Array?

refs #128 and #479

mtth commented 1 month ago

is it safe to assume that all occurrences of Buffer should become Uint8Array

Yes.

joscha commented 1 month ago

is it safe to assume that all occurrences of Buffer should become Uint8Array

Yes.

Great, see https://github.com/mtth/avsc/pull/488