Open Uzlopak opened 1 year ago
All good ideas/pointers.
It would be possible to design a base64 library specifically for the needs of Node.js. We could throw in base16 (hex) and so forth. Handling spaces efficiently is possible.
The base64 decoder is robust with respect to spaces but it seems to ignore any non-base64 character, actually...
See what the specification says...
Whitespace characters such as spaces, tabs, and new lines contained within the base64-encoded string are ignored.
But look...
> Buffer.from(' \(\(AA\(\\AA','base64')
<Buffer 00 00 00>
The hex decoder seems to stop on the first non-hex character:
> Buffer.from(' \(\(AA\(\\AA','hex')
<Buffer >
> Buffer.from('AAAA','hex')
<Buffer aa aa>
This seems documented:
Data truncation may occur when decoding strings that do not exclusively consist of an even number of hexadecimal characters
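To make the difference between the two decoders concrete, here is a minimal sketch of the behavior described above, plus one possible way to reject bad hex input instead of silently truncating it. The helper name `decodeHexStrict` is hypothetical and not part of Node.js; only `Buffer.from` is a real API here.

```javascript
// Lenient decoders, as observed in this thread.
// The base64 decoder skips whitespace; the hex decoder stops at the first
// pair that is not valid hex and returns what it decoded so far.
const lenientBase64 = Buffer.from(' AA\nAA', 'base64');
const truncatedHex = Buffer.from('aaZZaa', 'hex');

// Hypothetical strict variant (an assumption, not a Node.js API):
// throw on odd length or any non-hex character rather than truncating.
function decodeHexStrict(input) {
  if (input.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(input)) {
    throw new TypeError('invalid hex string');
  }
  return Buffer.from(input, 'hex');
}

console.log(lenientBase64, truncatedHex, decodeHexStrict('AAAA'));
```

Whether a stricter mode like this belongs in core is a separate question; the sketch only shows that the check can be layered on top of the existing decoder.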
See what the specification says...
What specification? AFAIK Buffer is a Node.js API, the only "specification" would be Node.js docs.
Yes. I quoted the documentation.
Base64 support is coming soon in simdutf: https://github.com/simdutf/simdutf/pull/375
atob performance has been greatly improved by @anonrig https://github.com/nodejs/node/pull/52381
So this handles part of the issue.
@anonrig is handling part of the rest of the issue in https://github.com/nodejs/node/pull/52428
Over the last few days I have been investigating the performance of hex, and especially base64 and base64url, encoding and decoding.
I added benchmarks in https://github.com/nodejs/node/pull/50348
base64 encoding is using the functionality from the base64 dependency.
base64 decoding is not using the functionality from the base64 dependency. We have a custom implementation that decodes leniently, so whitespace does not result in an error but is ignored.
base64url encoding is a custom implementation. So it is slower than it could be.
base64url decoding is a custom implementation. So it is slower than it could be.
hex encoding is a custom implementation. So it is slower than it could be.
hex decoding is a custom implementation. So it is slower than it could be.
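For anyone who wants a quick local feel for the relative cost of the six paths listed above, a rough timing sketch could look like the following. It is not the benchmark from the linked PR; the helper `timeIt`, the 64 KiB payload size, and the iteration count are arbitrary assumptions, and the numbers will vary by machine and Node.js version.

```javascript
// Rough timing sketch; not a substitute for the benchmarks in the Node.js repository.
const payload = Buffer.alloc(64 * 1024, 0x61); // 64 KiB of 'a' bytes (arbitrary size)

// Hypothetical helper: average wall-clock milliseconds per call.
function timeIt(fn, iterations = 200) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  return Number(process.hrtime.bigint() - start) / 1e6 / iterations;
}

const results = {};
for (const encoding of ['base64', 'base64url', 'hex']) {
  const encoded = payload.toString(encoding);
  results[encoding] = {
    encodeMs: timeIt(() => payload.toString(encoding)),
    decodeMs: timeIt(() => Buffer.from(encoded, encoding)),
  };
}
console.table(results);
```

A loop like this does not account for JIT warm-up or GC noise, so it is only useful for spotting large differences between the encodings.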
Maybe this is something to be implemented in simdutf?
@lemire @anonrig