purebred-mua / purebred-email

A fast email parsing library implemented in Haskell
https://hackage.haskell.org/package/purebred-email
GNU Affero General Public License v3.0

`headerFrom` does partial character decoding #73

Closed divipp closed 2 years ago

divipp commented 2 years ago

When the From header field contains multiple encoded-words, headerFrom decodes only the first one.

For example, given the following header fragment

From: =?utf-8?Q?V=C3=A1ros_=2D_Parkol=C3=A1si_=C3=BCgyint=C3=A9z?=
 =?utf-8?Q?=C3=A9s?= <x@y.z>

headerFrom returns

[Single (Mailbox (Just "V\225ros - Parkol\225si \252gyint\233z =?utf-8?Q?=C3=A9s?=") (AddrSpec "x" (DomainDotAtom ("y" :| ["z"]))))]

instead of

[Single (Mailbox (Just "V\225ros - Parkol\225si \252gyint\233z\233s") (AddrSpec "x" (DomainDotAtom ("y" :| ["z"]))))]
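
The expected result has no space between "ügyintéz" and "és" because RFC 2047 section 6.2 says that linear whitespace separating two adjacent encoded-words is ignored on display. The following is a minimal sketch of that joining rule only, assuming the header value has already been split into tokens and the encoded-words already decoded. The names `Token`, `render` and `text` are hypothetical and are not part of purebred-email.

```haskell
-- A header value split into tokens: already-decoded encoded-words and
-- ordinary plain-text words. (Hypothetical type, for illustration only.)
data Token = Encoded String | Plain String
  deriving (Eq, Show)

-- Join tokens for display. Whitespace between two adjacent encoded-words
-- is dropped (RFC 2047 section 6.2); every other boundary keeps one space.
render :: [Token] -> String
render [] = ""
render [t] = text t
render (a : b : rest) = text a ++ sep a b ++ render (b : rest)
  where
    sep (Encoded _) (Encoded _) = ""
    sep _ _ = " "

-- Extract the text carried by a token.
text :: Token -> String
text (Encoded s) = s
text (Plain s) = s
```

With this rule, `render [Encoded "Város - Parkolási ügyintéz", Encoded "és"]` yields the single word ending "ügyintézés", while a plain word followed by an encoded-word keeps its separating space.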


frasertweedale commented 2 years ago

@divipp thank you for your detailed bug report. Would you be able to test the following branch and provide feedback? https://github.com/purebred-mua/purebred-email/tree/fix/73-decode-folded-encoded-word

divipp commented 2 years ago

Hi, Thanks for the fast fix!

The example in the bug report is OK now: the space is gone and the second part is decoded too. However, the fix does not handle the following header fragment:

From: "Aegon =?UTF-8?Q?Biztos=C3=ADt=C3=B3?=" <x@y.z>

I think the result should be

[Single (Mailbox (Just "Aegon Biztosító") (AddrSpec "x" (DomainDotAtom ("y" :| ["z"]))))]

but it is

[Single (Mailbox (Just "Aegon =?UTF-8?Q?Biztos=C3=ADt=C3=B3?=") (AddrSpec "x" (DomainDotAtom ("y" :| ["z"]))))]

so nothing after the ASCII "Aegon " prefix is decoded.

frasertweedale commented 2 years ago

@divipp thanks for the feedback. I believe the current result for:

From: "Aegon =?UTF-8?Q?Biztos=C3=ADt=C3=B3?=" <x@y.z>

is correct. Per https://datatracker.ietf.org/doc/html/rfc2047#section-5:

   + An 'encoded-word' MUST NOT appear within a 'quoted-string'.

With the quotes, the encoded-word is left as-is; without the quotes, it decodes properly:

λ> parse (addressList defaultCharsets) ("\"Aegon =?UTF-8?Q?Biztos=C3=ADt=C3=B3?=\" <x@y.z>" :: B.ByteString)
Right [Single (Mailbox (Just "Aegon =?UTF-8?Q?Biztos=C3=ADt=C3=B3?=") (AddrSpec "x" (DomainDotAtom ("y" :| ["z"]))))]
λ> parse (addressList defaultCharsets) ("Aegon =?UTF-8?Q?Biztos=C3=ADt=C3=B3?= <x@y.z>" :: B.ByteString)
Right [Single (Mailbox (Just "Aegon Biztos\237t\243") (AddrSpec "x" (DomainDotAtom ("y" :| ["z"]))))]

I will implement a test for the feature and include the fix in the next release.

divipp commented 2 years ago

OK, thanks! Maybe I'll use a lenient headerFrom in my application, because in Hungary there are lots of encoded words in quoted addresses. :(
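
One way such a lenient variant could work is to post-process the display name that the strict parser returns, decoding anything that still looks like an encoded-word. The sketch below is only an illustration under several assumptions: it handles UTF-8 with Q-encoding and nothing else (B-encoding and other charsets are passed through untouched), it splits on whitespace with `words`, and the names `lenientName`, `decodeWord`, `qDecode` and `splitOn` are hypothetical helpers, not purebred-email API. It needs the `bytestring` and `text` packages.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)
import Data.Char (digitToInt, isHexDigit, toLower)

-- Split a string on every occurrence of a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- Decode a Q-encoded payload to raw bytes: '_' is a space and "=XX" is a
-- hex-escaped byte. Input is assumed to be ASCII.
qDecode :: String -> B.ByteString
qDecode = B.pack . go
  where
    go ('_' : rest) = 0x20 : go rest
    go ('=' : a : b : rest)
      | isHexDigit a && isHexDigit b =
          fromIntegral (16 * digitToInt a + digitToInt b) : go rest
    go (c : rest) = fromIntegral (fromEnum c) : go rest
    go [] = []

-- Decode one token of the form =?utf-8?Q?payload?= ; anything else
-- (other charsets, B-encoding, plain words) yields Nothing.
decodeWord :: String -> Maybe T.Text
decodeWord tok = case splitOn '?' tok of
  ["=", cs, enc, payload, "="]
    | map toLower cs == "utf-8", map toLower enc == "q" ->
        Just (TE.decodeUtf8With lenientDecode (qDecode payload))
  _ -> Nothing

-- Decode every recognisable encoded-word in a display name. Whitespace
-- between two adjacent encoded-words is dropped; other words are joined
-- with a single space.
lenientName :: String -> T.Text
lenientName = T.concat . glue . map classify . words
  where
    classify tok = maybe (False, T.pack tok) (\t -> (True, t)) (decodeWord tok)
    glue ((e1, t1) : next@((e2, _) : _))
      | e1 && e2  = t1 : glue next
      | otherwise = t1 : T.pack " " : glue next
    glue [(_, t)] = [t]
    glue []       = []
```

Applied to the name from the quoted example, `lenientName "Aegon =?UTF-8?Q?Biztos=C3=ADt=C3=B3?="` gives the text "Aegon Biztosító". A `Text` display name from a parsed mailbox would first need converting with `T.unpack`. Because this deliberately goes against the RFC 2047 rule on quoted-strings, it seems safer to apply it only to display names and only in the application, not in the parser.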

frasertweedale commented 2 years ago

@divipp That's unfortunate. Consider reporting as bug(s) against whatever program(s) are producing those messages.

frasertweedale commented 2 years ago

Fix released in v0.5.1: https://hackage.haskell.org/package/purebred-email-0.5.1