tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0
21.76k stars 5.51k forks source link

form data with &s in it breaks get_body_argument() #3418

Closed jonathon-love closed 2 months ago

jonathon-love commented 2 months ago

tornado 6.4.1

hi,

if a tornado webhandler receives the following form

-----------------------------231452068721326341443324831620
Content-Disposition: form-data; name="options"

{"path":"https://drive.google.com/uc?id=1x2IdSNcHGLmot9i1h90gwMJr5lULC2QV&export=download","authToken":null,"accessKey":null}
-----------------------------231452068721326341443324831620--

and it calls:

self.get_body_argument('options')

it returns:

{"path":"https://drive.google.com/uc?id=1x2IdSNcHGLmot9i1h90gwMJr5lULC2QV

and there's another body argument called export (presumably containing =download","authToken":null,"accessKey":null})

this doesn't feel like correct behaviour, but correct me if i'm wrong.

with thanks

bdarnell commented 2 months ago

You are right that this would be incorrect behavior, but it's not what I'm seeing. Here's your example translated into a unit test, which passes for me.

import unittest
from tornado.web import Application, RequestHandler
from tornado.httputil import HTTPServerRequest
from tornado.testing import AsyncHTTPTestCase

class BuggyHandler(RequestHandler):
    def post(self):
        options = self.get_body_argument("options")
        self.write(options)

class MultipartFormDataBugTest(AsyncHTTPTestCase):
    def get_app(self):
        return Application([("/", BuggyHandler)])

    def test_ampersand_in_form_data(self):
        body = (
            b"-----------------------------231452068721326341443324831620\r\n"
            b'Content-Disposition: form-data; name="options"\r\n\r\n'
            b'{"path":"https://drive.google.com/uc?id=1x2IdSNcHGLmot9i1h90gwMJr5lULC2QV&export=download","authToken":null,"accessKey":null}\r\n'
            b"-----------------------------231452068721326341443324831620--\r\n"
        )

        headers = {
            "Content-Type": "multipart/form-data; boundary=---------------------------231452068721326341443324831620"
        }

        response = self.fetch("/", method="POST", body=body, headers=headers)

        expected_result = '{"path":"https://drive.google.com/uc?id=1x2IdSNcHGLmot9i1h90gwMJr5lULC2QV&export=download","authToken":null,"accessKey":null}'
        self.assertEqual(response.body.decode(), expected_result)

if __name__ == "__main__":
    unittest.main()

What else could be different in your environment that explains the different behavior? You didn't mention your Content-Type line; is it set correctly? This looks almost like what you'd get if you had Content-Type application/x-www-form-urlencoded (specifically the splitting on ampersands and treating export as an argument name), but it's not exactly the same (urlencoded format would not use "options" as a name).

jonathon-love commented 2 months ago

thanks for your time here ben. i'll take another look at this, and come back to you soon.

with thanks

jonathon-love commented 2 months ago

it turns out i have a reverse proxy that is mangling things. my bad.

thanks again for your help.