requests / toolbelt

A toolbelt of useful classes and functions to be used with python-requests
https://toolbelt.readthedocs.org

requests_toolbelt.multipart.decoder.ImproperBodyPartContentException: content does not contain CR-LF-CR-LF #352

Open enigmathix opened 1 year ago

enigmathix commented 1 year ago

This error is raised even though RFC 2046 does not require a body part to contain, or end with, two consecutive CR-LF sequences. From https://www.rfc-editor.org/rfc/rfc2046.html#section-5.1.1:

Overall, the body of a "multipart" entity may be specified as
   follows:

     dash-boundary := "--" boundary
                      ; boundary taken from the value of
                      ; boundary parameter of the
                      ; Content-Type field.

     multipart-body := [preamble CRLF]
                       dash-boundary transport-padding CRLF
                       body-part *encapsulation
                       close-delimiter transport-padding
                       [CRLF epilogue]

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

     encapsulation := delimiter transport-padding
                      CRLF body-part

     delimiter := CRLF dash-boundary

     close-delimiter := delimiter "--"

     preamble := discard-text

     epilogue := discard-text

     discard-text := *(*text CRLF) *text
                     ; May be ignored or discarded.

     body-part := MIME-part-headers [CRLF *OCTET]
                  ; Lines in a body-part must not start
                  ; with the specified dash-boundary and
                  ; the delimiter must not appear anywhere
                  ; in the body part.  Note that the
                  ; semantics of a body-part differ from
                  ; the semantics of a message, as
                  ; described in the text.

     OCTET := <any 0-255 octet value>

As per this spec, the simplest multipart would look like this:

--boundary CRLF
MIME-part-headers
[CRLF MIME-part-headers]
[*]
CRLF --boundary--

A body part therefore only needs to end with one CRLF, not two, because the CRLF immediately before the next boundary belongs to the delimiter. In fact, Google App Engine internally posts data in which an empty form field ends with a single CRLF (the example below uses the data it generates).

Steps to reproduce:

from requests_toolbelt.multipart import decoder
data = b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=empty\r\n\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=text\r\n\r\nSome Text\r\n--foo--'

decoder.MultipartDecoder(data, 'multipart/form-data; boundary="foo"')

Output:

Traceback (most recent call last):
  File "/Users/christophe/toolbelt.py", line 4, in <module>
    decoder.MultipartDecoder(data, 'multipart/form-data; boundary="foo"')
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 111, in __init__
    self._parse_body(content)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 150, in _parse_body
    self.parts = tuple(body_part(x) for x in parts if test_part(x))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 150, in <genexpr>
    self.parts = tuple(body_part(x) for x in parts if test_part(x))
                       ^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 141, in body_part
    return BodyPart(fixed, self.encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 63, in __init__
    raise ImproperBodyPartContentException(
requests_toolbelt.multipart.decoder.ImproperBodyPartContentException: content does not contain CR-LF-CR-LF
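
To make the failure concrete, here is a minimal sketch that uses only bytes operations. It assumes the decoder splits the payload on CRLF + dash-boundary and then looks for CR-LF-CR-LF inside each part; that is inferred from the traceback and the error message, not taken from the library source. `split_headers_body` is a hypothetical helper showing one lenient way to treat a part without a blank line as having an empty body.

```python
# Same payload as in the reproduction above.
data = (
    b'--foo\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=empty\r\n'
    b'\r\n--foo\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=text\r\n'
    b'\r\nSome Text'
    b'\r\n--foo--'
)

BOUNDARY = b'--foo'
DELIMITER = b'\r\n' + BOUNDARY  # RFC 2046: delimiter := CRLF dash-boundary

# Assumed splitting strategy: cut the payload at every delimiter.
parts = data.split(DELIMITER)

# The last element is the "--" left over from the close-delimiter.
has_blank_line = [b'\r\n\r\n' in part for part in parts[:-1]]


def split_headers_body(part):
    """Hypothetical lenient split: no blank line means an empty body."""
    if part.startswith(BOUNDARY):
        # The first part still carries the opening dash-boundary.
        part = part[len(BOUNDARY):]
    head, _sep, body = part.partition(b'\r\n\r\n')
    return head.strip(), body


for part in parts[:-1]:
    headers, body = split_headers_body(part)
    print(headers.splitlines()[-1], body)
```

The first part ends in a single CRLF after its headers, so a strict search for CR-LF-CR-LF fails there, while the second part passes.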

For comparison, here is the same data processed with cgi:

from io import BytesIO
import cgi

data = b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=empty\r\n\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=text\r\n\r\nSome Text\r\n--foo--'
environ = {
    'CONTENT_LENGTH': str(len(data)),
    'CONTENT_TYPE': 'multipart/form-data; boundary="foo"',
    'REQUEST_METHOD': 'POST',
    'boundary': b'foo',
}

stream = BytesIO(data)
print(cgi.parse_multipart(stream, environ))

Output:

{'empty': [''], 'text': ['Some Text']}
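
Since cgi is deprecated as of Python 3.11 and removed in 3.13, the same comparison can also be sketched with the standard library's email package. This assumes it is acceptable to prepend a Content-Type header so the payload can be parsed as a MIME message; the `raw`, `message`, and `fields` names are illustrative only.

```python
from email import message_from_bytes

data = b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=empty\r\n\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=text\r\n\r\nSome Text\r\n--foo--'
content_type = 'multipart/form-data; boundary="foo"'

# The email parser expects message headers, so prepend the Content-Type
# that would normally arrive as an HTTP header.
raw = b'Content-Type: ' + content_type.encode('ascii') + b'\r\n\r\n' + data
message = message_from_bytes(raw)

# Map each form field name to its decoded payload bytes.
fields = {
    part.get_param('name', header='content-disposition'): part.get_payload(decode=True)
    for part in message.get_payload()
}
print(fields)
```

Like cgi, this parser accepts the part whose headers are followed directly by the delimiter and reports its body as empty.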