jumaris / indyproject

Automatically exported from code.google.com/p/indyproject

DecodeHeader bug #205

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
DecodeHeader('=?utf-8?B?0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ?= =?utf-8?B?t9C10L3RgtCw0YbQuNC4?=')

Expected:
"АПЕКСОФТ: Заказ презентации"

Actual:
"АПЕКСОФТ: Заказ пре??ентации"

Original issue reported on code.google.com by gambit47 on 11 Nov 2011 at 5:33

GoogleCodeExporter commented 9 years ago
Here is another example of a header that does not decode properly:

Subject: =?UTF-8?Q?Sape.ru: =D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=BD=D0=B0=D1=8F =D1=80=D0=B0=D1=81=D1=81=D1=8B=D0=BB=D0=BA=D0=B0 =E2=84=9611?=.

Original comment by gambit47 on 17 Nov 2011 at 10:14

GoogleCodeExporter commented 9 years ago
Both samples fail because of incorrectly encoded data. The first sample
contains a UTF-8 encoded character (the "з" in "презентации") whose octets are
split between the two MIME encoded-words, which violates RFC 2047. The second
sample contains unencoded whitespace inside an encoded-word, which RFC 2047
also forbids.
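A small Python sketch (an illustration only, not Indy code) can confirm the first violation. Base64-decoding each encoded-text separately shows that the first word ends with the lead octet 0xD0 and the second begins with the continuation octet 0xB7, the two halves of "з" (U+0437), so neither word is valid UTF-8 on its own. The names WORD1, WORD2 and is_valid_utf8() are hypothetical.

```python
import base64

# Encoded-text of the two encoded-words in the first sample (between "?B?" and "?=").
WORD1 = "0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ"
WORD2 = "t9C10L3RgtCw0YbQuNC4"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the octets are complete, well-formed UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


raw1 = base64.b64decode(WORD1)
raw2 = base64.b64decode(WORD2)

print(raw1[-1:].hex(), raw2[:1].hex())           # d0 b7: the two halves of "з"
print(is_valid_utf8(raw1), is_valid_utf8(raw2))  # False False
```

Only the concatenation raw1 + raw2 is valid UTF-8, which is why RFC 2047 requires each encoded-word to carry whole characters.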

The above samples were originally tested with Delphi 7. In D2009+, the first
sample produces a completely blank string instead. This is because
Embarcadero's SysUtils.TUTF8Encoding class passes the MB_ERR_INVALID_CHARS flag
to MultiByteToWideChar(), so the call fails on the split character octets. In
Delphi 7, Indy uses its own TIdUTF8Encoding class, which does not use the
MB_ERR_INVALID_CHARS flag. Indy's parser needs to be updated to use
TIdUTF8Encoding in all Delphi versions.
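The difference between the two code paths can be mimicked in Python, assuming that strict decoding (errors="strict") stands in for MB_ERR_INVALID_CHARS and lenient decoding (errors="replace") stands in for a converter without that flag. This is an analogy, not the Windows or Indy API, and the helper names are hypothetical. Decoding each encoded-word separately, the strict path ends up blank while the lenient path keeps the text and substitutes replacement characters for the split "з".

```python
import base64

# Encoded-text of the two encoded-words in the first sample.
WORD1 = "0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ"
WORD2 = "t9C10L3RgtCw0YbQuNC4"


def decode_word_strict(encoded_text: str) -> str:
    """Analogy for MB_ERR_INVALID_CHARS: one invalid octet fails the whole
    conversion, and this hypothetical helper then returns ''."""
    try:
        return base64.b64decode(encoded_text).decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return ""


def decode_word_lenient(encoded_text: str) -> str:
    """Analogy for converting without MB_ERR_INVALID_CHARS: invalid octets
    become U+FFFD and the rest of the text survives."""
    return base64.b64decode(encoded_text).decode("utf-8", errors="replace")


strict = decode_word_strict(WORD1) + decode_word_strict(WORD2)
lenient = decode_word_lenient(WORD1) + decode_word_lenient(WORD2)
print(repr(strict))  # '' (blank, like the D2009+ result)
print(lenient)       # "АПЕКСОФТ: Заказ пре" + two U+FFFD + "ентации"
```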

Original comment by gambit47 on 24 Nov 2011 at 1:35

GoogleCodeExporter commented 9 years ago
Rev 4900 updates the IdHeaderCoderIndy.pas unit to use the CharsetToEncoding()
function, which uses TIdUTF8Encoding for UTF-8. As a result, the first example
no longer returns a blank string on failure, although the output is still not
100% correct because of the split code units.
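Split octets only decode correctly if they are reassembled before charset conversion. One technique a more tolerant decoder could use is sketched below in Python. It is hypothetical and not taken from Indy: it concatenates the raw octets of adjacent encoded-words that share a charset, ignoring the whitespace between them as RFC 2047 specifies, and only then converts to text. The pattern ENCODED_WORD and the function decode_joined() are illustrative names.

```python
import base64
import binascii
import re

# Hypothetical pattern for a well-formed RFC 2047 encoded-word (B or Q encoding).
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]+)\?=")


def decode_joined(header: str) -> str:
    """Decode encoded-words, joining the raw octets of adjacent words that share
    a charset before converting them, so a character split across words is
    reassembled. Hypothetical helper, not Indy's DecodeHeader()."""
    pieces = []  # (charset, bytes) for encoded runs, (None, str) for plain text
    pos = 0
    for m in ENCODED_WORD.finditer(header):
        gap = header[pos:m.start()]
        follows_word = bool(pieces) and pieces[-1][0] is not None
        # RFC 2047: whitespace between two adjacent encoded-words is ignored.
        if gap and not (gap.isspace() and follows_word):
            pieces.append((None, gap))
        charset, enc, text = m.group(1).lower(), m.group(2).upper(), m.group(3)
        if enc == "B":
            raw = base64.b64decode(text)
        else:
            # header=True makes "_" decode to a space, as in RFC 2047 Q encoding.
            raw = binascii.a2b_qp(text.encode("ascii"), header=True)
        if pieces and pieces[-1][0] == charset:
            pieces[-1] = (charset, pieces[-1][1] + raw)  # extend the previous run
        else:
            pieces.append((charset, raw))
        pos = m.end()
    pieces.append((None, header[pos:]))
    # Any octets still invalid after joining become U+FFFD instead of failing.
    return "".join(p if cs is None else p.decode(cs, errors="replace")
                   for cs, p in pieces)


sample = ("=?utf-8?B?0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ?= "
          "=?utf-8?B?t9C10L3RgtCw0YbQuNC4?=")
print(decode_joined(sample))  # АПЕКСОФТ: Заказ презентации
```

The trade-off is that the decoder must buffer octets across words instead of converting each word as soon as it is parsed.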

The second example still fails because DecodeHeader() validates whitespace
while extracting each MIME encoded-word, so it does not detect that the data
is encoded and leaves it undecoded.
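To illustrate why a strict parser skips the second example, the Python sketch below (hypothetical names, not Indy's parser) compares an encoded-word pattern that follows RFC 2047's rule that encoded-text may not contain SPACE or "?" with a lenient pattern that tolerates whitespace. Only the lenient pattern finds the word, which then decodes to "Sape.ru: Новостная рассылка №11". Both patterns only accept Q encoding, since that is all this sample needs.

```python
import binascii
import re

# Second sample, joined onto one line (without the "Subject: " field name).
HEADER = ("=?UTF-8?Q?Sape.ru: =D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=BD=D0=B0=D1=8F "
          "=D1=80=D0=B0=D1=81=D1=81=D1=8B=D0=BB=D0=BA=D0=B0 =E2=84=9611?=")

# RFC 2047: encoded-text is 1* printable ASCII other than "?" or SPACE.
STRICT = re.compile(r"=\?([^?\s]+)\?([Qq])\?([^?\s]+)\?=")
# Tolerant variant: accepts whitespace inside the encoded-text.
LENIENT = re.compile(r"=\?([^?\s]+)\?([Qq])\?([^?]+)\?=")


def decode_q_word(match: re.Match) -> str:
    """Decode a Q-encoded word: "=XX" is a hex octet, "_" is a space."""
    charset, _, text = match.groups()
    raw = binascii.a2b_qp(text.encode("ascii"), header=True)
    return raw.decode(charset)


print(STRICT.search(HEADER))                  # None: the word is not recognized
print(decode_q_word(LENIENT.search(HEADER)))  # Sape.ru: Новостная рассылка №11
```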

Original comment by gambit47 on 29 Dec 2012 at 7:57