HeaderParseError on some messages

GoogleCodeExporter commented 9 years ago

I get a HeaderParseError when listing the INBOX folder (/mail/FOLDER_SU5CT1g=/) or 
viewing a message (/mail/FOLDER_SU5CT1g=/3/) on the yandex.ru IMAP server.
(login:password -- webpymail:qweasdzxc) 

settings:
[yandex.ru]
name = Yandex
host = imap.yandex.ru
port = 993
ssl  = true

The error happens while the Python library decodes the `subject` field, but maybe it 
can be worked around.
It happened when I sent a message from the Yandex web interface to the same mailbox.

Original issue reported on code.google.com by akimov.alex on 20 Sep 2010 at 1:50

GoogleCodeExporter commented 9 years ago

We're having problems with the subject:

Subject: =?koi8-r?B?88/Pwt3FzsnFINMgz97FztggxMzJzs7ZzSDawcfPzM/Xy8/NLi4gySDLz9LP1MvJzSDUxczPzSDTz8/C?=
        =?koi8-r?B?3cXOydEu?=

The IMAP server at imap.yandex.ru returns this on a single line when we fetch the 
ENVELOPE:

header = '=?koi8-r?B?88/Pwt3FzsnFINMgz97FztggxMzJzs7ZzSDawcfPzM/Xy8/NLi4gySDLz9LP1MvJzSDUxczPzSDTz8/C?==?koi8-r?B?3cXOydEu?='

If we try to decode the header we get:

>>> from email.header import decode_header
>>> decode_header(header)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/email/header.py", line 101, in decode_header
    raise HeaderParseError
email.errors.HeaderParseError

However, if we decode it line by line, it works fine. The first line decodes to:

[('\xf3\xcf\xcf\xc2\xdd\xc5\xce\xc9\xc5 \xd3 \xcf\xde\xc5\xce\xd8 \xc4\xcc\xc9\xce\xce\xd9\xcd \xda\xc1\xc7\xcf\xcc\xcf\xd7\xcb\xcf\xcd.. \xc9 \xcb\xcf\xd2\xcf\xd4\xcb\xc9\xcd \xd4\xc5\xcc\xcf\xcd \xd3\xcf\xcf\xc2', 'koi8-r')]

which, read as KOI8-R, is:

Сообщение с очень длинным заголовком.. и коротким телом сооб

("Message with a very long header.. and a short body of the mess...", cut off where 
the first encoded word ends.)

Original comment by hguerreiro@gmail.com on 20 Sep 2010 at 10:46

GoogleCodeExporter commented 9 years ago

We are missing a space or tab between '...SDTz8/C?=' and '=?koi8-...'; if we insert 
a space, it works. The subject then decodes to: "Сообщение с 
очень длинным заголовком.. и коротким телом 
сообщения." ("Message with a very long header.. and a short message body.")
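A minimal workaround sketch, written for Python 3. The traceback above comes from Python 2.5, whose `email.header` requires whitespace after each encoded word. Newer Python 3 releases appear to be more lenient, but the fix is harmless either way, because RFC 2047 says whitespace between adjacent encoded words is ignored. The sketch inserts a space wherever one encoded word ends (`?=`) and the next begins (`=?`), then decodes with `email.header.decode_header`. The helper names `split_glued_words` and `decode_subject` are hypothetical, not webpymail's.

```python
import re
from email.header import decode_header

# "?=" closing one encoded word immediately followed by "=?" opening the next.
# Encoded text may not contain "?", so in well-formed encoded words this
# sequence should only occur at such a boundary.
GLUED = re.compile(r'\?==\?')


def split_glued_words(value):
    """Hypothetical helper: put a space between adjacent encoded words."""
    return GLUED.sub('?= =?', value)


def decode_subject(value):
    """Hypothetical helper: decode an RFC 2047 header value to str."""
    pieces = []
    for chunk, charset in decode_header(split_glued_words(value)):
        # decode_header yields bytes for decoded parts, str if nothing was encoded
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or 'ascii', errors='replace')
        pieces.append(chunk)
    return ''.join(pieces)


# The single-line subject returned by imap.yandex.ru in the ENVELOPE.
header = ('=?koi8-r?B?88/Pwt3FzsnFINMgz97FztggxMzJzs7ZzSDawcfPzM/Xy8/NLi4gySDLz9LP'
          '1MvJzSDUxczPzSDTz8/C?==?koi8-r?B?3cXOydEu?=')
print(decode_subject(header))
```

A regex substitution is a single linear scan, so it adds little cost for headers that are already well formed.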

After a bit of research, I think this is an IMAP server bug. Unfolding should be 
done according to RFC 5322 section 2.2.3 
(http://tools.ietf.org/html/rfc5322#section-2.2.3), which states that unfolding is 
accomplished "by simply removing any CRLF that is immediately followed by WSP", 
where WSP means the white space characters space (ASCII 32) and horizontal tab 
(ASCII 9).
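A minimal sketch of the unfolding rule quoted above, applied to the folded Subject value of this message. The correct rule removes only the CRLF and keeps the following WSP. `broken_unfold` is a hypothetical model of what the server appears to do: it drops the tab as well, which reproduces the glued `?==?` sequence.

```python
import re

# RFC 5322 section 2.2.3: "removing any CRLF that is immediately followed by WSP".
# The lookahead keeps the WSP (space or tab) in the unfolded value.
FOLD = re.compile(r'\r\n(?=[ \t])')


def unfold(value):
    return FOLD.sub('', value)


def broken_unfold(value):
    """Hypothetical model of the server bug: the WSP is dropped along with the CRLF."""
    return re.sub(r'\r\n[ \t]', '', value)


folded = ('=?koi8-r?B?88/Pwt3FzsnFINMgz97FztggxMzJzs7ZzSDawcfPzM/Xy8/NLi4gySDLz9LP'
          '1MvJzSDUxczPzSDTz8/C?=\r\n\t=?koi8-r?B?3cXOydEu?=')

print('?=\t=?' in unfold(folded))        # True: the words stay separated
print('?==?' in broken_unfold(folded))   # True: the words are glued together
```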

If we fetch the message headers, they are properly formed:

Subject: =?koi8-r?B?88/Pwt3FzsnFINMgz97FztggxMzJzs7ZzSDawcfPzM/Xy8/NLi4gySDLz9LP1MvJzSDUxczPzSDTz8/C?=\r\n\t=?koi8-r?B?3cXOydEu?=\r\n

With a horizontal tab after the CRLF: '...?=\r\n\t=?...'

However, the IMAP server also removes the tab when it unfolds the header. That leaves 
the two encoded words glued together, which causes the error above.

I suggest that you file a bug report with the IMAP server maker. 

I'm going to try to find a workaround for this problem that doesn't impact 
well-behaved servers or the performance of the ENVELOPE parser. Performance is 
critical because some servers, for instance Google's IMAP server, don't support the 
SORT extension, so we are forced to fetch all the envelopes and then sort them on 
the client side. Reading the envelopes must be fast.
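Without SORT, the client has to order messages itself. The following is a minimal sketch under a few assumptions. Each envelope is treated as a tuple whose first field is the date string, as in the IMAP ENVELOPE structure of RFC 3501, where the date precedes the subject. The sample messages are made up. Dates go through `email.utils.parsedate_tz` and `mktime_tz`, so every sort key is a UTC timestamp. NIL or unparseable dates sort as the oldest.

```python
from email.utils import mktime_tz, parsedate_tz


def envelope_timestamp(envelope):
    """UTC timestamp of an envelope's date field; 0 if missing or unparseable."""
    parsed = parsedate_tz(envelope[0]) if envelope[0] else None
    return mktime_tz(parsed) if parsed else 0


def sort_envelopes(envelopes, newest_first=True):
    # sorted() calls the key function once per item, so each date is parsed once.
    return sorted(envelopes, key=envelope_timestamp, reverse=newest_first)


# Hypothetical (date, subject) pairs; real envelopes carry more fields.
inbox = [
    ('Mon, 20 Sep 2010 10:46:00 +0000', 'first'),
    ('Fri, 24 Sep 2010 08:34:42 +0000', 'newest'),
    (None, 'no date'),
    ('Tue, 21 Sep 2010 08:37:00 +0000', 'second'),
]
for date, subject in sort_envelopes(inbox):
    print(subject)
```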

Original comment by hguerreiro@gmail.com on 20 Sep 2010 at 11:36

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

For now I can tell you that Thunderbird stores this message's subject in the 
following way:

Subject: =?koi8-r?B?88/Pwt3FzsnFINMgz97FztggxMzJzs7ZzSDawcfPzM/Xy8/NLi4gySDLz9LP1MvJzSDUxczPzSDTz8/C?=
    =?koi8-r?B?3cXOydEu?=

So either Thunderbird gets a different response from the server, or it decodes the 
subject some other way out of the box. I'll try to contact Yandex support to find 
out more.

Also, I think that's why some subjects look truncated (with the Yandex IMAP server 
and long subjects containing Russian letters).

Original comment by akimov.alex on 21 Sep 2010 at 8:37

GoogleCodeExporter commented 9 years ago

This issue was closed by revision r87.

Original comment by hguerreiro@gmail.com on 21 Sep 2010 at 11:22

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

While trying to understand this problem, I noticed that the Yandex server does not 
return the full text of part 1 of the message (note that the closing strong tag is 
truncated):

LDDO006 UID FETCH 6 BODY[1]
* 6 FETCH (UID 6 BODY[1] {85}
<strong>Test1</strong>
<br/>
<strong>=D0=A2=D0=B5=D1=81=D1=822</stron
)
LDDO006 OK FETCH completed

This is the envelope returned by Yandex:

LDDO006 UID FETCH 6 ENVELOPE
* 6 FETCH (UID 6 ENVELOPE ("Fri, 24 Sep 2010 08:34:42 +0000" "=?utf-8?q?Cyrilic_Subj_+_HTML_special_chars_+_tags_in_PLAIN=2E_?==?utf-8?b?0JfQsNCz0L7Qu9C+0LLQvtC6INC90LAg0JrQuNGA0LjQu9C40YbQtSAh?==?utf-8?b?QCMkJV4mKg==?=" (("" NIL "webpymail" "yandex.ru")) (("" NIL "webpymail" "yandex.ru")) (("" NIL "webpymail" "yandex.ru")) ((NIL NIL "webpymail" "gmail.com")) NIL NIL NIL NIL))
LDDO006 OK FETCH completed

***This is a Yandex bug*** 

I'm going to think about this. The right thing is for the server to answer 
according to the RFC; we can't possibly account for all the buggy servers in the 
world.
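The subject in this envelope has the same defect: three encoded words (one Q-encoded, two B-encoded) with no whitespace between them. An alternative to patching the string before handing it to the standard library is a hand-rolled RFC 2047 decoder. The minimal sketch below finds each encoded word with a regular expression, so it does not care whether whitespace separates them. It is a simplified illustration, not webpymail's parser: it has no RFC 2231 language tags, and a malformed word raises an exception. `decode_rfc2047` is a hypothetical name.

```python
import base64
import binascii
import re

# =?charset?encoding?encoded-text?=  (encoded-text never contains "?")
ENCODED_WORD = re.compile(r'=\?([^?]+)\?([bBqQ])\?([^?]*)\?=')


def decode_rfc2047(value):
    out = []
    pos = 0
    after_word = False
    for match in ENCODED_WORD.finditer(value):
        gap = value[pos:match.start()]
        # RFC 2047: whitespace between two adjacent encoded words is ignored.
        if gap and not (after_word and gap.isspace()):
            out.append(gap)
        charset, encoding, text = match.groups()
        if encoding in 'bB':
            raw = base64.b64decode(text)
        else:
            # Q encoding: "_" means space, "=XX" is a hex-escaped byte.
            raw = binascii.a2b_qp(text, header=True)
        out.append(raw.decode(charset, errors='replace'))
        pos = match.end()
        after_word = True
    out.append(value[pos:])
    return ''.join(out)


subject = ('=?utf-8?q?Cyrilic_Subj_+_HTML_special_chars_+_tags_in_PLAIN=2E_?='
           '=?utf-8?b?0JfQsNCz0L7Qu9C+0LLQvtC6INC90LAg0JrQuNGA0LjQu9C40YbQtSAh?='
           '=?utf-8?b?QCMkJV4mKg==?=')
print(decode_rfc2047(subject))
```

The Cyrillic part decodes to "Заголовок на Кирилице" (Russian for "Subject in Cyrillic", misspelled in the original just like "Cyrilic").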

Original comment by hguerreiro@gmail.com on 25 Sep 2010 at 11:44

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Yes, you're right about buggy servers. I don't think making everything work on 
Yandex should be your main goal. But testing on Yandex could help find bugs that are 
in webpymail rather than in the server. It can also show you where 500 errors 
happen, so they aren't shown to the user. But if you don't think so, I can stop 
testing on Yandex.
I'll send information about this error to Yandex.

Original comment by akimov.alex on 26 Sep 2010 at 5:52

GoogleCodeExporter commented 9 years ago

No! By all means continue testing on Yandex. I agree with you: this is a good way 
to find bugs in webpymail and to make its structure more flexible. It's also a good 
way to annoy the Yandex maintainers :-)

I think the best way to deal with this problem is to implement a "quirks mode", 
just like the one in browsers, where we can account for this "quirkiness" (and 
others). 
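Here is a rough sketch of how such a quirks mode might look. It assumes quirks are keyed by IMAP host name and applied only to servers known to need them, so well-behaved servers skip the extra work. Every name here (`Quirks`, `glued_encoded_words`, `QUIRKS_BY_HOST`, `quirks_for`, `normalize_header`) is hypothetical, and webpymail's eventual design may differ.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Quirks:
    """Hypothetical per-server workaround flags (all off by default)."""
    # Server drops the WSP when unfolding, gluing encoded words together.
    glued_encoded_words: bool = False


# Hypothetical registry; only servers with known bugs get an entry.
QUIRKS_BY_HOST = {
    'imap.yandex.ru': Quirks(glued_encoded_words=True),
}

GLUED = re.compile(r'\?==\?')


def quirks_for(host):
    return QUIRKS_BY_HOST.get(host.lower(), Quirks())


def normalize_header(value, quirks):
    """Apply only the workarounds enabled for this server."""
    if quirks.glued_encoded_words:
        value = GLUED.sub('?= =?', value)
    return value


glued = '=?utf-8?b?QQ==?==?utf-8?b?Qg==?='
print(normalize_header(glued, quirks_for('imap.yandex.ru')))  # =?utf-8?b?QQ==?= =?utf-8?b?Qg==?=
print(normalize_header(glued, quirks_for('imap.gmail.com')))  # unchanged
```

Keeping the flags in one immutable object per server means new quirks can be added as fields without touching the parsing code for other servers.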

However, this kind of thing should be done only once the library structure is more 
or less stable. 

I'm marking this as a wish to be done later.

Original comment by hguerreiro@gmail.com on 27 Sep 2010 at 8:10

Added labels: Quirks-Mode, Wish
Removed labels: Priority-Medium, Type-Defect

GoogleCodeExporter commented 9 years ago

Original comment by hguerreiro@gmail.com on 27 Sep 2010 at 8:11

Added labels: Type-Other

GoogleCodeExporter commented 9 years ago

Original comment by hguerreiro@gmail.com on 27 Sep 2010 at 8:11

Added labels: Type-Enhancement
Removed labels: Type-Other

GoogleCodeExporter commented 9 years ago

Original comment by hguerreiro@gmail.com on 27 Sep 2010 at 8:30

Added labels: Priority-Medium

n37r06u3 / webpymail

HeaderParseError on some messages #22