Closed jordan-wright closed 3 years ago
@glennzw, @leucos can y'all take this for a spin and let me know if it resolves the issue for you? Local testing appears promising.
github.com/jordan-wright/email v0.0.0-20200602115436-fd8a7622303e
=> ERROR (as expected)github.com/jordan-wright/email v4.0.1-0.20200824024835-1c5046cdad27+incompatible
=> ERRORgithub.com/jordan-wright/email v0.0.0-20200824022917-3f4aaffe682c
(PR) => OKAs a side effect, your PR also fixes other errors I had (although less frequent) in my test corpus:
$ for i in corpus/*; do ./bin/eml-read $i 2>&1 | cut -f1,5- -d' '; done
1 ERROR mime: invalid media parameter
4 ERROR malformed MIME header: missing colon: %!q(MISSING)
14 ERROR quotedprintable: invalid hex byte 0x0d
This looks good ! Thanks !
@leucos awesome, thanks for validating! Out of curiosity, is that corpus output showing errors that were fixed, or errors that are still occurring? Only asking, since if you are still seeing the quotedprintable bug I'd love an email sample 😄
In the meantime I'll go ahead and get this merged.
@jordan-wright no no everything is fixed now. I was not very worried by those and I did not mention them in the first place since there was not a lot of them .
With your changes I have zero errors in the corpus (which is not that big; ~ 350 emails).
If you're still curious I can send the emails ofc.
Thanks a lot for you work on fixing this :+1:
@glennzw, @leucos can y'all take this for a spin and let me know if it resolves the issue for you? Local testing appears promising.
LGTM! It fixes my test case:
Before update:
go run test_crlf.go bad_fa25fc7fd80e89fe6531c535f4eb68dd.eml
[+] Reading 'bad_fa25fc7fd80e89fe6531c535f4eb68dd.eml'
[+] Email details:
[!] Unable to read email with Jordan's library:
2020/08/25 11:22:25 quotedprintable: invalid hex byte 0x0d
exit status 1
After update:
go run test_crlf.go bad_fa25fc7fd80e89fe6531c535f4eb68dd.eml
[+] Reading 'bad_fa25fc7fd80e89fe6531c535f4eb68dd.eml'
[+] Email details:
From: Google <noreply-utos@google.com>
To: [safety.check.9001@gmail.com]
Subject: Find out more about our updated Terms of Service
HTML size: 20400
Text size: 3936
To make email parsing via
NewEmailFromReader
more resilient, we wrap the provided reader in atrimReader
, which is a little custom reader that tries to trim leading whitespace. However, there was a bug where it tried to delete whitespace on every read, not just the first one.In very rare cases, this would cause email parsing to fail since the next set of bytes to be read would include the
\n
part of a\r\n
. This PR fixes this by ensuring that leading whitespace is only removed during the first read.Fixes #106