hypermail-project / hypermail

Hypermail is a free (GPL) program to convert email from Unix mbox format to html.
http://www.hypermail-project.org/
GNU General Public License v2.0

base64 decode assumes input lines are in multiples of 4 #96

Closed vandys closed 1 year ago

vandys commented 1 year ago

base64 gives 3 octets for 4 characters of input. There is no requirement that each line of mail input be a multiple of four characters long, and in fact I have inputs where the first two characters of a four-character group end one line and the final two start the next. hypermail invokes base64Decode per line, and this results in a corrupted decode.

I'm going to look at a struct to hold the decode state, so the decoder can be called with this as an argument and thus have stateful continuity from line to line.
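A minimal sketch of what such a stateful decoder could look like, for illustration only. The names `b64_state`, `b64_init`, `b64_value` and `b64_decode_line` are hypothetical and are not hypermail's actual `base64Decode` interface or the code in the eventual fix. The idea is to keep the leftover bits in a struct so that a four-character group split across two lines still decodes correctly.

```c
#include <stddef.h>

/* Hypothetical decode state: carries leftover bits from one line to the next. */
struct b64_state {
    unsigned int acc;   /* bit accumulator */
    int nbits;          /* number of valid low-order bits in acc */
    int done;           /* set once '=' padding has been seen */
};

static void b64_init(struct b64_state *st)
{
    st->acc = 0;
    st->nbits = 0;
    st->done = 0;
}

/* Map one base64 character to its 6-bit value, or -1 if it is not in the
 * alphabet. Assumes an ASCII-compatible character set. */
static int b64_value(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/*
 * Decode one input line of len characters into out and return the number
 * of octets written. out must have room for 3 * len / 4 + 1 octets.
 * Bits that do not yet form a whole octet stay in *st for the next call.
 */
static size_t b64_decode_line(struct b64_state *st, const char *line,
                              size_t len, unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len && !st->done; i++) {
        int v;

        if (line[i] == '=') {       /* padding marks the end of the data */
            st->done = 1;
            break;
        }
        v = b64_value((unsigned char)line[i]);
        if (v < 0)
            continue;               /* skip CR, LF and other non-alphabet characters */

        st->acc = (st->acc << 6) | (unsigned int)v;
        st->nbits += 6;
        if (st->nbits >= 8) {
            st->nbits -= 8;
            out[n++] = (unsigned char)((st->acc >> st->nbits) & 0xFF);
            st->acc &= (1u << st->nbits) - 1;   /* keep only the leftover bits */
        }
    }
    return n;
}
```

With this sketch, feeding the two lines `SGVsbG8sIH` and `dvcmxkIQ==` through the same state yields 7 octets and then 6 more, which together spell `Hello, world!`, even though the first line is 10 characters long and so ends in the middle of a four-character group. Because the state here is a plain struct that the caller can keep on the stack, nothing needs to be freed at the end of the decode; a heap-allocated state would need an explicit cleanup step.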

jkbzh commented 1 year ago

Are you sure about this? It may be you're looking at a badly-formed message.

Check RFC 2045. For MIME base64 encoding, lines must be at most 76 characters long. It's up to the mail client to do that split.

I'd advise holding off on this work, reading the RFC, and evaluating the UA that created the message you're talking about.

In particular this section:

(Soft Line Breaks) The Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long. If longer lines are to be encoded with the Quoted-Printable encoding, "soft" line breaks must be used. An equal sign as the last character on a encoded line indicates such a non-significant ("soft") line break in the encoded text.

As well as this note in base64.c

And finally, try this base64 decoder to check that your base64 is valid:

https://www.base64encode.org/

What I would be interested in is a sample mail that shows this problem, so I can check out the issue you're describing.

vandys commented 1 year ago

Pretty sure... "no more than 76" makes 74 sound OK. Yes, the decoding wraps to the next line, but it really doesn't look like that's illegal. They didn't say "length modulo four must be zero". This is from a moderately big email service (Tutanota), so I feel like it'd be good to support it.

I have it mostly coded up. Just nailing down making sure the state gets freed at the end of the base64 decode.

jkbzh commented 1 year ago

Thanks for your report and PR.

Fixed in https://github.com/hypermail-project/hypermail/commit/2bd8ed57721849554b054e8948d11722add3c7a0