Open GoogleCodeExporter opened 9 years ago
Here is another example of a header that does not decode properly:
Subject: =?UTF-8?Q?Sape.ru:
=D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=BD=D0=B0=D1=8F
=D1=80=D0=B0=D1=81=D1=81=D1=8B=D0=BB=D0=BA=D0=B0 =E2=84=9611?=.
Original comment by gambit47
on 17 Nov 2011 at 10:14
Both samples fail because the encoded data is faulty. The first sample contains
a UTF-8 encoded character that is split between two MIME encoded-words, which
violates RFC 2047. The second sample contains unencoded whitespace inside the
encoded-word, which RFC 2047 forbids.
The above samples were originally tested with Delphi 7. In D2009+, the first
sample produces a completely blank string instead. This is because
Embarcadero's SysUtils.TUTF8Encoding class uses the MB_ERR_INVALID_CHARS flag
when calling MultiByteToWideChar(), which fails because of the split character
octets. In Delphi 7, Indy uses its own TIdUTF8Encoding class, which does not
use the MB_ERR_INVALID_CHARS flag. Indy's parser needs to be updated to use
TIdUTF8Encoding in all Delphi versions.
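
For illustration only, here is a rough Python sketch of the difference between a
strict and a lenient UTF-8 decode when a character's octets are split across two
chunks. This is an analogy, not Indy or Delphi code: the payload bytes and the
decode_strict()/decode_lenient() helpers are hypothetical, and Python's "replace"
error handler is only assumed to be comparable to calling MultiByteToWideChar()
without MB_ERR_INVALID_CHARS.

```python
# Hypothetical payloads: the two octets of Cyrillic "Н" (D0 9D) end up in
# different encoded-words, so each chunk is invalid UTF-8 on its own.
first_word = b"Sape.ru: \xd0"
second_word = b"\x9d\xd0\xbe"


def decode_strict(data: bytes) -> str:
    """Reject invalid sequences outright, similar in spirit to MB_ERR_INVALID_CHARS."""
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return ""  # the whole chunk is lost, giving a blank string


def decode_lenient(data: bytes) -> str:
    """Keep the valid characters and substitute U+FFFD for the broken octets."""
    return data.decode("utf-8", errors="replace")


strict_result = decode_strict(first_word) + decode_strict(second_word)
lenient_result = decode_lenient(first_word) + decode_lenient(second_word)

# Joining the raw octets before decoding recovers the split character.
joined_result = (first_word + second_word).decode("utf-8")
```

In this sketch the lenient path keeps the readable text but cannot restore the
split character, while decoding the concatenated octets can.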
Original comment by gambit47
on 24 Nov 2011 at 1:35
Rev 4900 updates the IdHeaderCoderIndy.pas unit to use the CharsetToEncoding()
function, which uses TIdUTF8Encoding for UTF-8. With this change the first
example no longer returns a blank string on failure, though the output will not
be 100% correct because of the split code units.
The second example still fails, because DecodeHeader() validates whitespace
while extracting the MIME encoding, so it does not detect that the data is
encoded and skips decoding it.
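
As a rough sketch of why whitespace validation causes the encoded-word to go
undetected, the Python below compares a strict RFC 2047 style pattern with a
hypothetical lenient one. It is not Indy's DecodeHeader() logic: the STRICT and
LENIENT patterns and the q_decode() helper are assumptions made for this
example, and the sample is the header above with its folded lines joined by
single spaces and the trailing "." left out.

```python
import re

# Strict: encoded-text may not contain "?" or whitespace (RFC 2047, section 2).
STRICT = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")
# Lenient (hypothetical): tolerate literal whitespace inside encoded-text.
LENIENT = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?]*)\?=")


def q_decode(text: str, charset: str) -> str:
    """Decode Q-encoded text: "_" means space, "=XX" is a hex-encoded octet."""
    raw = text.replace("_", " ").encode("ascii")
    raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                 lambda m: bytes([int(m.group(1), 16)]), raw)
    return raw.decode(charset, errors="replace")


sample = ("=?UTF-8?Q?Sape.ru: "
          "=D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=BD=D0=B0=D1=8F "
          "=D1=80=D0=B0=D1=81=D1=81=D1=8B=D0=BB=D0=BA=D0=B0 =E2=84=9611?=")

strict_hit = STRICT.search(sample)    # no match: the spaces break the pattern
lenient_hit = LENIENT.search(sample)  # matches the whole encoded-word

decoded = None
if lenient_hit:
    decoded = q_decode(lenient_hit.group(3), lenient_hit.group(1))
```

Whether a parser should accept such malformed input at all is a design choice;
the lenient pattern here only shows that the payload itself is still decodable.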
Original comment by gambit47
on 29 Dec 2012 at 7:57
Original issue reported on code.google.com by
gambit47
on 11 Nov 2011 at 5:33