nodejs / performance


performance of encodings (hex, base64, base64url) #128

Open Uzlopak opened 1 year ago

Uzlopak commented 1 year ago

Over the last few days I have been investigating the performance of hex and, especially, base64 and base64url.

Added benchmarks https://github.com/nodejs/node/pull/50348

base64 encoding is using the functionality from the base64 dependency.

base64 decoding is not using the functionality from the base64 dependency. We have a custom implementation that decodes leniently: whitespace in the input does not cause an error; it is ignored.
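To make the lenient behaviour concrete, here is a minimal sketch using only the public `Buffer` API. The sample string and the variable names are illustrative assumptions; this is not taken from the Node.js internals or from the benchmark PR.

```javascript
// Illustrative only: "hello world" encoded as base64, once without and
// once with whitespace scattered through it.
const clean = Buffer.from('aGVsbG8gd29ybGQ=', 'base64');
const spaced = Buffer.from('aGVs bG8g\nd29y\tbGQ=', 'base64');

console.log(clean.toString('utf8')); // prints "hello world"
console.log(spaced.equals(clean));   // prints true: the whitespace was skipped
```

Both calls yield the same bytes, which is the graceful handling described above.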

base64url encoding is a custom implementation. So it is slower than it could be.

base64url decoding is a custom implementation. So it is slower than it could be.
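For readers less familiar with base64url, the following sketch shows how it differs from base64 at the `Buffer` API level. The two-byte input is an assumption chosen so that the output uses the characters that differ between the two alphabets.

```javascript
// Illustrative input: 0xfb 0xff encodes to characters that differ
// between the standard and the URL-safe alphabet.
const bytes = Buffer.from([0xfb, 0xff]);

const std = bytes.toString('base64');    // "+/8=" (padded, uses + and /)
const url = bytes.toString('base64url'); // "-_8"  (unpadded, uses - and _)
console.log(std, url);

// Decoding the URL-safe form gives back the original bytes.
const roundTrip = Buffer.from(url, 'base64url');
console.log(roundTrip.equals(bytes)); // prints true
```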

hex encoding is a custom implementation. So it is slower than it could be.

hex decoding is a custom implementation. So it is slower than it could be.
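A rough way to compare the three encodings locally is sketched below. It is not the benchmark added in the PR above; `timeEncoding`, the buffer size, and the iteration count are hypothetical choices, and a naive timing loop like this is only indicative.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical helper: encodes and decodes a fixed buffer repeatedly
// and reports the elapsed wall-clock time for the whole loop.
function timeEncoding(encoding, iterations = 1000, size = 16 * 1024) {
  const input = Buffer.alloc(size, 0xab);
  let bytesDecoded = 0; // accumulated so the work cannot be optimized away
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    const encoded = input.toString(encoding);
    bytesDecoded += Buffer.from(encoded, encoding).length;
  }
  const elapsedMs = performance.now() - start;
  return { encoding, elapsedMs, bytesDecoded };
}

for (const enc of ['hex', 'base64', 'base64url']) {
  const { elapsedMs } = timeEncoding(enc);
  console.log(`${enc}: ${elapsedMs.toFixed(2)} ms`);
}
```

The absolute numbers depend on the machine and the Node.js version, so only the relative ordering between encodings is meaningful.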

Maybe this is something to be implemented in simdutf?

@lemire @anonrig

lemire commented 1 year ago

All good ideas/pointers.

It would be possible to design a base64 library specifically for the needs of Node.js. We could throw in base16 (hex) and so forth. Handling spaces efficiently is possible.

lemire commented 11 months ago

The base64 decoder is robust with respect to spaces but it seems to ignore any non-base64 character, actually...

See what the specification says...

"Whitespace characters such as spaces, tabs, and new lines contained within the base64-encoded string are ignored."

But look...

> Buffer.from(' \(\(AA\(\\AA','base64')
<Buffer 00 00 00>
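If a caller wants the documented behaviour (whitespace ignored, everything else rejected), one option is to validate before decoding. The sketch below is an assumption about how that could look in user code; `decodeBase64Strict` is a hypothetical helper, not a Node.js API, and it only checks the alphabet, not the padding length.

```javascript
// Hypothetical helper: strips whitespace, then rejects any character
// outside the standard base64 alphabet before handing off to Buffer.from.
function decodeBase64Strict(input) {
  const stripped = input.replace(/[ \t\r\n\f]/g, '');
  if (!/^[A-Za-z0-9+/]*={0,2}$/.test(stripped)) {
    throw new TypeError('input contains non-base64 characters');
  }
  return Buffer.from(stripped, 'base64');
}

console.log(decodeBase64Strict('AA AA')); // whitespace is accepted

try {
  decodeBase64Strict(' ((AA(\\AA');
} catch (err) {
  console.log(err.message); // the stray characters are rejected
}
```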

The hex decoder seems to stop on the first non-hex character:

> Buffer.from(' \(\(AA\(\\AA','hex')
<Buffer >
> Buffer.from('AAAA','hex')
<Buffer aa aa>

This seems documented:

"Data truncation may occur when decoding strings that do not exclusively consist of an even number of hexadecimal characters."
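The truncation rule can be seen with a few small inputs. The strings below are illustrative; the variable names are assumptions made for this sketch.

```javascript
// Decoding stops at the first character that is not a hex digit ('g').
const stopsAtInvalid = Buffer.from('1ag123', 'hex');
console.log(stopsAtInvalid); // <Buffer 1a>

// An odd trailing digit cannot form a full byte and is dropped.
const oddLength = Buffer.from('1a7', 'hex');
console.log(oddLength); // <Buffer 1a>

// A well-formed string with an even number of hex digits decodes fully.
const wellFormed = Buffer.from('1634', 'hex');
console.log(wellFormed); // <Buffer 16 34>
```

Unlike the base64 path, invalid input here is not skipped over: everything from the first bad character onward is lost.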

aduh95 commented 11 months ago

"See what the specification says..."

What specification? AFAIK Buffer is a Node.js API, the only "specification" would be Node.js docs.

lemire commented 11 months ago

"What specification? AFAIK Buffer is a Node.js API, the only "specification" would be Node.js docs."

Yes. I quoted the documentation.

lemire commented 8 months ago

Base64 support is coming soon in simdutf: https://github.com/simdutf/simdutf/pull/375

lemire commented 7 months ago

atob performance has been greatly improved by @anonrig https://github.com/nodejs/node/pull/52381

So this handles part of the issue.

lemire commented 7 months ago

@anonrig is handling part of the rest of the issue in https://github.com/nodejs/node/pull/52428