pallets / flask

The Python micro framework for building web applications.
https://flask.palletsprojects.com
BSD 3-Clause "New" or "Revised" License
67.82k stars 16.2k forks source link

Calling post method in loop yields unexpected results #1025

Closed SimplicityGuy closed 10 years ago

SimplicityGuy commented 10 years ago

server (very basic!):

from flask import (Flask, request, jsonify)
from werkzeug import secure_filename

app = Flask(__name__)
app.config.from_object(__name__)
app.config['MAX_CONTENT_LENGTH'] = 1024 * 1024 * 1024

@app.route('/files', methods=['POST'])
def post_files():
    r = dict()
    r['directory'] = request.form['directory']
    # yes, there is more code here, but this is the simplest repro!
    return jsonify(r)

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0')

client (with debugging output)

from itertools import izip_longest
from requests_toolbelt import MultipartEncoder
import requests
import os
import sys
import argparse

URL = "http://localhost:5000/files"

def grouper(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

def import(source_folder, url):
    for (path, dirs, files) in os.walk(source_folder):
        if '.DS_Store' in files:
            files.remove('.DS_Store')

        if '.git' in dirs:
            dirs.remove('.git')

        # in order to limit the number of files being uploaded at once, keep it to 3 at a time
        for f1, f2, f3 in grouper(3, files):
            fields = dict()
            fields['directory'] = os.path.basename(path)
            if f1 is not None:
                fields['f1'] = (f1, open(os.path.join(path, f1), 'rb'))
            if f2 is not None:
                fields['f2'] = (f2, open(os.path.join(path, f2), 'rb'))
            if f3 is not None:
                fields['f3'] = (f3, open(os.path.join(path, f3), 'rb'))
            print fields
            print
            m = MultipartEncoder(fields=fields)
            r = requests.post(url,
                              data=m,
                              headers={'Content-Type': m.content_type})
            print r.json()
            print r
            print
            print

def main():
    parser = argparse.ArgumentParser(description='importer')
    parser.add_argument('source',
                        help='source folder to import from')
    parser.add_argument('--server',
                        help='server to upload folder to',
                        dest='url',
                        required=False,
                        default=URL)

    args = parser.parse_args()
    import(args.source, args.url)

if __name__ == '__main__':
    status = main()
    sys.exit(status)

When I run it with an input directory that has 9 files, the first time through, I get a RESPONSE 200, second time through, I get a RESPONSE 400: 127.0.0.1 - - [14/Apr/2014 22:02:30] "POST /files HTTP/1.1" 200 - 127.0.0.1 - - [14/Apr/2014 22:02:30] "POST /files HTTP/1.1" 400 -

Any idea why this is happening?

DasIch commented 10 years ago

Can you paste the request that produces the 400?

SimplicityGuy commented 10 years ago

The loop that's there iterating over the files first produces a 400.

Sent from my iPhone

On Apr 25, 2014, at 5:18 PM, Daniel Neuhäuser notifications@github.com wrote:

Can you paste the request that produces the 400?

— Reply to this email directly or view it on GitHub.

koddsson commented 10 years ago

I can't reproduce this on my machine. This is the code I ran. And this is the output I got:

client.py

λ bubblegum test → bin/python client.py --server http://localhost:5000 derp
{'directory': 'derp', 'f1': ('2', <open file 'derp/2', mode 'rb' at 0x7f3f83d4e1e0>), 'f2': ('1', <open file 'derp/1', mode 'rb' at 0x7f3f83d4e270>), 'f3': ('3', <open file 'derp/3', mode 'rb' at 0x7f3f83d4e300>)}

Traceback (most recent call last):
  File "client.py", line 62, in <module>
    status = main()
  File "client.py", line 58, in main
    _import(args.source, args.url)
  File "client.py", line 41, in _import
    print r.json()
  File "/home/koddsson/test/local/lib/python2.7/site-packages/requests/models.py", line 741, in json
    return json.loads(self.text, **kwargs)
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

app.py

λ bubblegum test → bin/python app.py
 * Running on http://0.0.0.0:5000/
 * Restarting with reloader
127.0.0.1 - - [29/Apr/2014 20:41:36] "POST / HTTP/1.1" 404 -

The derp directory has three files if that matters.

SimplicityGuy commented 10 years ago

Unfortunately I'm still seeing this. Try providing an input with more files in it. I tried giving it an input of a git repo (for instance with the flask repo):

{'directory': u'flask', 'f1': u'.gitignore', 'f2': u'.gitmodules', 'f3': u'.travis-devel-requirements.txt'}
127.0.0.1 - - [29/Apr/2014 20:57:34] "POST /files HTTP/1.1" 200 -
{'directory': u'flask', 'f1': u'.travis-lowest-requirements.txt', 'f2': u'.travis-release-requirements.txt', 'f3': u'.travis.yml'}
127.0.0.1 - - [29/Apr/2014 20:57:35] "POST /files HTTP/1.1" 200 -
{'directory': u'flask', 'f1': u'AUTHORS', 'f2': u'CHANGES', 'f3': u'LICENSE'}
127.0.0.1 - - [29/Apr/2014 20:57:35] "POST /files HTTP/1.1" 200 -
127.0.0.1 - - [29/Apr/2014 20:57:35] "POST /files HTTP/1.1" 400 -
{'directory': u'flask', 'f1': u'run-tests.py', 'f2': u'setup.cfg', 'f3': u'setup.py'}
127.0.0.1 - - [29/Apr/2014 20:57:35] "POST /files HTTP/1.1" 200 -
{'directory': u'flask', 'f1': u'tox.ini'}
...

What's worse too is if you just let the client run with an input folder of the flask repo, at some point you just get a hang where the server no longer sends data back.

I also have a pcap of this issue if it's needed.

FWIW, I'm trying this on a Mac with Mavericks, using Python 2.7.5 (the default version). I've now tried this on three different Macs and all hit the same problems... 400s returned and the hang.

Here is my modified s.py:

from flask import (Flask, request, jsonify)
from werkzeug import secure_filename

app = Flask(__name__)
app.config.from_object(__name__)
app.config['MAX_CONTENT_LENGTH'] = 1024 * 1024 * 1024

@app.route('/files', methods=['POST'])
def post_files():
    r = dict()
    r['directory'] = request.form['directory']

    for k, v in request.files.items():
        file = request.files[k]
        filename = secure_filename(file.filename)
        r[k] = file.filename

    print r
    # yes, there is more code here, but this is the simplest repro!
    return jsonify(r)

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0')

and modified c.py:

from itertools import izip_longest
from requests_toolbelt import MultipartEncoder
import requests
import os
import sys
import argparse

URL = "http://localhost:5000/files"

def grouper(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

def ximport(source_folder, url):
    for (path, dirs, files) in os.walk(source_folder):
        if '.DS_Store' in files:
            files.remove('.DS_Store')

        if '.git' in dirs:
            dirs.remove('.git')

        print
        print
        print files
        print

        # in order to limit the number of files being uploaded at once, keep it to 3 at a time
        for f1, f2, f3 in grouper(3, files):
            print f1, f2, f3
            print
            fields = dict()
            fields['directory'] = os.path.basename(path)
            if f1 is not None:
                fields['f1'] = (f1, open(os.path.join(path, f1), 'rb'))
            if f2 is not None:
                fields['f2'] = (f2, open(os.path.join(path, f2), 'rb'))
            if f3 is not None:
                fields['f3'] = (f3, open(os.path.join(path, f3), 'rb'))
            print fields
            print
            m = MultipartEncoder(fields=fields)
            r = requests.post(url,
                              data=m,
                              headers={'Content-Type': m.content_type})
            print r.text
            print

def main():
    parser = argparse.ArgumentParser(description='importer')
    parser.add_argument('source',
                        help='source folder to import from')
    parser.add_argument('--server',
                        help='server to upload folder to',
                        dest='url',
                        required=False,
                        default=URL)

    args = parser.parse_args()
    ximport(args.source, args.url)

if __name__ == '__main__':
    status = main()
    sys.exit(status)
danielchatfield commented 10 years ago

I'm debugging this, I'm 90% sure the bug is with requests_toolbelt or requests though, I'll let you know what I find out.

danielchatfield commented 10 years ago

The problem is with requests_toolbelt, just writing up a bug report now - feel free to close this issue.

The short of it is:

The multipart encoder has a bug where if it the length is greater than the blocksize used to read it (and this it must be read in two chunks) the length returned by len(m) does not match the sum of the chunks which makes the request invalid and Flask rightly rejects this.

I'm not sure why you aren't using the files argument for requests directly - it does the seem thing but without any bugs.

danielchatfield commented 10 years ago

Scratch that, MultipartEncoder.read() is completely broken - read(8192) (default blocksize) returns less than 8192 bytes even though there is more left.

SimplicityGuy commented 10 years ago

Thanks Daniel for the investigation! Closing this issue since it's not a flask issue. Linked requests-toolbelt issue.

SimplicityGuy commented 10 years ago

I'm not sure why you aren't using the files argument for requests directly - it does the seem thing but without any bugs.

The reason I don't want to use requests.post(url, files=..., ...) is because this reads the file upfront rather than streaming the file like MultipartEncoder does. From http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file, see:

In the event you are posting a very large file as a multipart/form-data request, you may want to stream the request. By default, requests does not support this, but there is a separate package which does - requests-toolbelt. You should read the toolbelt’s documentation for more details about how to use it.

My usage will likely result in very large files, not like using the flask git repo.