jordan-wright / email

Robust and flexible email library for Go
MIT License
2.61k stars 324 forks source link

Adjusting trimReader to only trim whitespace on the first read. #122

Closed jordan-wright closed 3 years ago

jordan-wright commented 3 years ago

To make email parsing via NewEmailFromReader more resilient, we wrap the provided reader in a trimReader, which is a little custom reader that tries to trim leading whitespace. However, there was a bug where it tried to delete whitespace on every read, not just the first one.

In very rare cases, this would cause email parsing to fail since the next set of bytes to be read would include the \n part of a \r\n. This PR fixes this by ensuring that leading whitespace is only removed during the first read.

Fixes #106

jordan-wright commented 3 years ago

@glennzw, @leucos can y'all take this for a spin and let me know if it resolves the issue for you? Local testing appears promising.

leucos commented 3 years ago

As a side effect, your PR also fixes other errors I had (although less frequent) in my test corpus:

$ for i in corpus/*; do ./bin/eml-read $i 2>&1 | cut -f1,5- -d' '; done
      1 ERROR mime: invalid media parameter
      4 ERROR malformed MIME header: missing colon: %!q(MISSING)
     14 ERROR quotedprintable: invalid hex byte 0x0d

This looks good ! Thanks !

jordan-wright commented 3 years ago

@leucos awesome, thanks for validating! Out of curiosity, is that corpus output showing errors that were fixed, or errors that are still occurring? Only asking, since if you are still seeing the quotedprintable bug I'd love an email sample 😄

In the meantime I'll go ahead and get this merged.

leucos commented 3 years ago

@jordan-wright no no everything is fixed now. I was not very worried by those and I did not mention them in the first place since there was not a lot of them .

With your changes I have zero errors in the corpus (which is not that big; ~ 350 emails).

If you're still curious I can send the emails ofc.

Thanks a lot for you work on fixing this :+1:

glennzw commented 3 years ago

@glennzw, @leucos can y'all take this for a spin and let me know if it resolves the issue for you? Local testing appears promising.

LGTM! It fixes my test case:

Before update:

go run test_crlf.go bad_fa25fc7fd80e89fe6531c535f4eb68dd.eml 
[+] Reading 'bad_fa25fc7fd80e89fe6531c535f4eb68dd.eml'
[+] Email details:

[!] Unable to read email with Jordan's library: 
2020/08/25 11:22:25 quotedprintable: invalid hex byte 0x0d
exit status 1

After update:

go run test_crlf.go bad_fa25fc7fd80e89fe6531c535f4eb68dd.eml
[+] Reading 'bad_fa25fc7fd80e89fe6531c535f4eb68dd.eml'
[+] Email details:

    From: Google <noreply-utos@google.com>
    To: [safety.check.9001@gmail.com]
    Subject: Find out more about our updated Terms of Service
    HTML size: 20400
    Text size: 3936