Decode buffer size reports incorrectly

synesissoftware / b64

Base-64 Conversion Library

Other

6 stars 5 forks source link

Decode buffer size reports incorrectly #5

Open gerry-hornbill opened 1 week ago

gerry-hornbill commented 1 week ago

Given the following string, base64 encoded...

"The string to encode" -> "VGhlIHN0cmluZyB0byBlbmNvZGU="

When decoded this will decode to exactly 20 characters if you call b64::decode2() but, if you call b64::decode2() without providing a buffer, it will report it requires 21 bytes of buffer space. This is a problem when working with buffers and binary data where you have no option to reset the buffer length after allocating the memory, so you need to allocate the exact amount of memory required, before decoding, and have the decoder decode directly into your target buffer. In the above example, I would make a determination of expected exact length, then create the buffer, and then call b64::decode2 to decode directly into the buffer. My workaround is to tell the decode2() function that the target buffer is 1 byte bigger than it really is, but that feels like a terrible hack. Its not really clear why the library does this, it seems to be a defect from what I can see.

gerry-hornbill commented 1 week ago

I have submitted a proposed fix which I hope will be of some use

synesissoftware commented 1 week ago

Thanks for reporting. We'll check it out and get back to you.

gerry-hornbill commented 1 week ago

There is one other thing I found in relation to my proposed change. The b64::_decode function allow for the input base64 string to be nullptr, in this case, my proposed change would crash, I further resolved that with a simple check for this condition, and in the case where this param is null, I just reverted to the old behaviour, I will make that change and make a second pull reuest

synesissoftware commented 1 week ago

Your observation is spot-on, and your contribution gratefully merged into v1.5.3.

One thing: I've not included the padding overruns for now, mainly for performance reasons. I was thinking we might add another parsing flag to make that available but opt-in. What do you think of that idea?

gerry-hornbill commented 1 week ago

To be honest, its such an edge case, you could skip it completely. However, if you ingest the base64 string '====' or 'A===' that is malformed anyway, so its just there to deal with that edge case. I run software in production that does millions of API calls an hour around the clock, you learn to always expect the unexpected. I found that under run/overrun problem just doing some basic testing, I liked the idea that even with junk input you would get a workable result from the decoder.

I think adding a parsing flag would work, but to be honest with you, the overhead of checking to see if the flag is set will be almost the same as performing the check, the data and the code will very likely be in the L1 cache in most cases on most systems where performance will matter, so I personally would not think it that big a problem from a performance standpoint.

Thanks for accepting the pull request and including the fix Gerry