python / cpython

The Python programming language
https://www.python.org

poplib maxline behaviour may be wrong #68094

Open 53fa0fcb-8887-48ff-be74-6ce0ba2b2879 opened 9 years ago

53fa0fcb-8887-48ff-be74-6ce0ba2b2879 commented 9 years ago
BPO 23906
Nosy @doko42, @tiran, @bitdancer, @berkerpeksag

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug', 'library']
title = 'poplib maxline behaviour may be wrong'
updated_at =
user = 'https://bugs.python.org/gnarvaja'
```

bugs.python.org fields:
```python
activity =
actor = 'r.david.murray'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'gnarvaja'
dependencies = []
files = []
hgrepos = []
issue_num = 23906
keywords = []
message_count = 11.0
messages = ['240425', '245900', '245909', '246726', '246730', '247282', '247284', '247289', '248455', '248459', '248460']
nosy_count = 9.0
nosy_names = ['doko', 'rblank', 'christian.heimes', 'r.david.murray', 'berker.peksag', 'introom', 'gnarvaja', 'Ingo Ruhnke', 'Chris Smowton']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23906'
versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']
```

53fa0fcb-8887-48ff-be74-6ce0ba2b2879 commented 9 years ago

After bpo-16041 was fixed, Python started to validate that lines coming from the POP server should be under 2048 bytes.

This breaks mail retrieval from at least dovecot servers, as this mail server does not break responses into 512- or 2048-byte lines.

On dovecot's side, they said there is a misunderstanding of the RFC on the Python side, and that RFC 1939 "is talking about POP3 responses themselves - not about the actual email message body". You can see the related mail thread here:

http://dovecot.org/pipermail/dovecot/2015-April/100475.html

I'm not sure who is right, but I think it's a problem (at least it was for me).

34b5bed8-5a21-4553-9b3a-83cd0e64005a commented 9 years ago

This also breaks mail retrieval from both gmx.de and gmail.com (two rather large and popular mail providers). After setting _MAXLINE in /usr/lib/python2.7/poplib.py to some arbitrary higher number, mail retrieval from both services worked fine again.

This 2048 limit definitely looks badly broken.

bitdancer commented 9 years ago

The RFC is in fact not clear on this point. It is entirely possible to read it as saying that each line of a multiline response is limited to 512 octets. I agree, however, that that is not the most reasonable interpretation. Instead, the line length of RETR message lines should be governed by RFC 5322, which specifies a maximum line length of 998 octets.

That, however, means that technically dovecot is still broken, since 2048 is quite a bit larger than 998. In reality, it means that the *internet* is broken, in that I presume the root of the problem is that there are mail originators out there that are not obeying RFC 5322 (and its predecessors...this limit goes back to 821/822).

We use 8192 in smtplib, and that hasn't caused any problems...but then again smtplib is originating email, not receiving it. The IMAP protocol has its own problems, quite aside from the length of message body lines, so we ended up with a very large MAXLINE there. It may be that we have no choice except to do something similar in poplib.

An interesting question in this context is what smtp servers do, since if anyone was going to reject messages with overlong lines, it would be the smtp server's job to do it.

f6e4f9f8-e879-4996-998c-4ec2aaa9c2cf commented 9 years ago

I found the same problem retrieving mail from my ISP's (unknown) POP3 server. I was sent an HTML email as one long 50KB line, which naturally broke everything.

Instead of limiting line length, I suggest you should limit total message body size, since that's what you're actually trying to defend against here. You could also use the +OK XXX octets line to set a more conservative limit (and fail fast if it announces intent to send more than your limit).
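To illustrate the fail-fast idea, here is a rough sketch (not what poplib does today). It pulls the octet count out of a RETR status line and rejects the message if the count exceeds a cap. The names `announced_size`, `check_announced_size` and `MAX_MESSAGE_OCTETS` are invented for this example, and servers do not always announce a size in that line, so the parser returns None when it finds none.

```python
import re

# Hypothetical cap chosen for illustration; this is not a poplib setting.
MAX_MESSAGE_OCTETS = 50 * 1024 * 1024

# Matches status lines of the form "+OK 120 octets".
_OCTETS_RE = re.compile(rb"^\+OK\s+(\d+)\s+octets\b")


def announced_size(status_line):
    """Return the octet count announced in a status line, or None."""
    match = _OCTETS_RE.match(status_line)
    return int(match.group(1)) if match else None


def check_announced_size(status_line, limit=MAX_MESSAGE_OCTETS):
    """Raise ValueError if the server announces more than `limit` octets."""
    size = announced_size(status_line)
    if size is not None and size > limit:
        raise ValueError(
            "server announced %d octets, limit is %d" % (size, limit)
        )
    return size
```

Because the announced size is only a hint, a check like this could complement, but not replace, counting the octets actually received.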

As above, the workaround was to insert `import poplib; poplib._MAXLINE = 1000000` at the top of the 'getmail' script.
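Spelled out, the workaround looks like this. It is a monkey-patch of a private module-level constant, so it is a stopgap that could stop working if poplib's internals change:

```python
import poplib

# _MAXLINE is a private constant in poplib; overriding it raises the
# per-line limit for every POP3 connection in this process.
poplib._MAXLINE = 1000000
```

The override has to run before any messages are retrieved.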

A side-note: one message that is broken this way causes all future messages to fail because poplib does not flush the connection when bailing due to a 'line too long' error. If it isn't prepared to read the rest of the incoming data, it *must* hang up the connection and re-login to fetch the next message.
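Until that is addressed in the library, an application can do the hang-up-and-reconnect itself. The sketch below is one possible shape, under the assumption that the caller supplies `connect`, a hypothetical callable that returns a freshly logged-in `poplib.POP3` (or compatible) object. `fetch_messages` is likewise a name made up for this example.

```python
import poplib


def fetch_messages(connect, numbers):
    """Fetch messages by number, reconnecting after any protocol error.

    `connect` is a hypothetical caller-supplied callable returning a
    logged-in POP3-like object with retr() and close() methods.
    Returns a dict mapping message number to either the retr() result
    or the error_proto exception raised for that message.
    """
    results = {}
    conn = connect()
    try:
        for number in numbers:
            try:
                results[number] = conn.retr(number)
            except poplib.error_proto as exc:
                results[number] = exc
                # The stream may still hold unread data from the failed
                # response, so drop the connection and start a new session.
                conn.close()
                conn = connect()
    finally:
        # close() hangs up without sending QUIT, so pending deletions
        # are not committed; a real client would likely call quit()
        # on the normal path.
        conn.close()
    return results
```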

bitdancer commented 9 years ago

Could you open a separate bug for the recovery problem, please?

Using a maximum message size would not solve this problem, but it would give the library user control of when it failed, so it is a good feature request.

f6e4f9f8-e879-4996-998c-4ec2aaa9c2cf commented 9 years ago

Why wouldn't that fix the problem? The issue is poplib not tolerating server behaviour seen in the wild, and if you limit by message size not line length you shouldn't see this problem?

(Side note, I'm surprised not to have been emailed when you replied, any idea what I'm missing?)

f6e4f9f8-e879-4996-998c-4ec2aaa9c2cf commented 9 years ago

Created bpo-24706 to describe the unflushed connection problem.

bitdancer commented 9 years ago

Sorry, I was unclear. In order to implement maximum message size we have to do a bit more to the logic than just use the max message size as the readline limit. But it does seem like the right approach to me.

bitdancer commented 9 years ago

Note that the max message size solution can be applied to the maintenance releases as a fix for this issue by choosing a suitably large default message size. The 'feature' part is just the part exposing the size limit in the library API...that part is a feature for 3.6.

e6f26cf2-7085-432b-b59c-1713340321ef commented 9 years ago

Instead of setting a MAXSIZE for the email body, raising the MAXLINE might be more meaningful.

Consider the case of MAXSIZE: it's essentially the same as MAXLINE. If MAXSIZE is relatively small, some messages won't pass through. If MAXSIZE is relatively large, then what's the meaning of setting it?

Thus, it might be more practical to increase the value of MAXLINE so that 99% of messages can pass through.

bitdancer commented 9 years ago

If maxline is too small, messages won't get through. If maxline is too large, *huge* messages will get through...and the DDOS danger of exhausting the server's resources will occur. So, we really ought to provide a way to limit the maximum message size *anyway*...at which point a separate maxline value doesn't make any sense, since the RFC specifies no maximum line size.

I'm much more comfortable setting a large maximum message size than setting a large enough maximum line size to permit that size of message consisting of mostly a single line. Since we aren't going to back out the DDOS fix, we have to put the limit *somewhere*. At least in 3.6 we can make it easy for the application to set it. (Programs using earlier versions will just have to monkey-patch, unfortunately...which they have to do right now anyway.)
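For concreteness, here is a minimal sketch of reading a multiline response under a total size cap instead of a per-line cap. It is an illustration only, not the poplib implementation: `read_multiline`, `MessageTooLarge` and `max_octets` are hypothetical names, and the cap is assumed to cover every octet read, including the terminating line.

```python
import io


class MessageTooLarge(Exception):
    """Raised when a response exceeds the total octet budget."""


def read_multiline(fileobj, max_octets):
    """Read a dot-terminated multiline response with a total size cap.

    `fileobj` is any binary file-like object with readline(size).
    Returns the lines with CRLF stripped and leading dots unstuffed.
    """
    lines = []
    total = 0
    while True:
        # Ask for at most one octet more than the remaining budget, so
        # a single enormous line cannot be buffered past the cap.
        line = fileobj.readline(max_octets - total + 1)
        if not line:
            raise EOFError("connection closed before terminator")
        total += len(line)
        if total > max_octets:
            raise MessageTooLarge("response exceeds %d octets" % max_octets)
        if line in (b".\r\n", b".\n"):
            return lines
        if line.startswith(b".."):
            line = line[1:]  # undo dot-stuffing
        lines.append(line.rstrip(b"\r\n"))
```

With this shape, a message consisting of one very long line is accepted as long as the whole response fits in the budget, and memory use stays bounded by `max_octets` either way.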

dbmikko commented 2 months ago

Any news about this? This affects, for example, our ERPNext installation (open source software developed in Python).