Closed SimplicityGuy closed 10 years ago
Can you paste the request that produces the 400?
The loop that's there iterating over the files first produces a 400.
Sent from my iPhone
On Apr 25, 2014, at 5:18 PM, Daniel Neuhäuser notifications@github.com wrote:
Can you paste the request that produces the 400?
— Reply to this email directly or view it on GitHub.
I can't reproduce this on my machine. This is the code I ran. And this is the output I got:
client.py
λ bubblegum test → bin/python client.py --server http://localhost:5000 derp
{'directory': 'derp', 'f1': ('2', <open file 'derp/2', mode 'rb' at 0x7f3f83d4e1e0>), 'f2': ('1', <open file 'derp/1', mode 'rb' at 0x7f3f83d4e270>), 'f3': ('3', <open file 'derp/3', mode 'rb' at 0x7f3f83d4e300>)}
Traceback (most recent call last):
File "client.py", line 62, in <module>
status = main()
File "client.py", line 58, in main
_import(args.source, args.url)
File "client.py", line 41, in _import
print r.json()
File "/home/koddsson/test/local/lib/python2.7/site-packages/requests/models.py", line 741, in json
return json.loads(self.text, **kwargs)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
app.py
λ bubblegum test → bin/python app.py
* Running on http://0.0.0.0:5000/
* Restarting with reloader
127.0.0.1 - - [29/Apr/2014 20:41:36] "POST / HTTP/1.1" 404 -
The derp
directory has three files if that matters.
Unfortunately I'm still seeing this. Try providing an input with more files in it. I tried giving it an input of a git repo (for instance with the flask repo):
{'directory': u'flask', 'f1': u'.gitignore', 'f2': u'.gitmodules', 'f3': u'.travis-devel-requirements.txt'}
127.0.0.1 - - [29/Apr/2014 20:57:34] "POST /files HTTP/1.1" 200 -
{'directory': u'flask', 'f1': u'.travis-lowest-requirements.txt', 'f2': u'.travis-release-requirements.txt', 'f3': u'.travis.yml'}
127.0.0.1 - - [29/Apr/2014 20:57:35] "POST /files HTTP/1.1" 200 -
{'directory': u'flask', 'f1': u'AUTHORS', 'f2': u'CHANGES', 'f3': u'LICENSE'}
127.0.0.1 - - [29/Apr/2014 20:57:35] "POST /files HTTP/1.1" 200 -
127.0.0.1 - - [29/Apr/2014 20:57:35] "POST /files HTTP/1.1" 400 -
{'directory': u'flask', 'f1': u'run-tests.py', 'f2': u'setup.cfg', 'f3': u'setup.py'}
127.0.0.1 - - [29/Apr/2014 20:57:35] "POST /files HTTP/1.1" 200 -
{'directory': u'flask', 'f1': u'tox.ini'}
...
What's worse too is if you just let the client run with an input folder of the flask repo, at some point you just get a hang where the server no longer sends data back.
I also have a pcap of this issue if it's needed.
FWIW, I'm trying this on a Mac with Mavericks, using Python 2.7.5 (the default version). I've now tried this on three different Macs and all hit the same problems... 400s returned and the hang.
Here is my modified s.py:
from flask import (Flask, request, jsonify)
from werkzeug import secure_filename
app = Flask(__name__)
app.config.from_object(__name__)
app.config['MAX_CONTENT_LENGTH'] = 1024 * 1024 * 1024
@app.route('/files', methods=['POST'])
def post_files():
r = dict()
r['directory'] = request.form['directory']
for k, v in request.files.items():
file = request.files[k]
filename = secure_filename(file.filename)
r[k] = file.filename
print r
# yes, there is more code here, but this is the simplest repro!
return jsonify(r)
if __name__ == '__main__':
app.run(debug=True, host='0.0.0.0')
and modified c.py:
from itertools import izip_longest
from requests_toolbelt import MultipartEncoder
import requests
import os
import sys
import argparse
URL = "http://localhost:5000/files"
def grouper(n, iterable, fillvalue=None):
args = [iter(iterable)] * n
return izip_longest(fillvalue=fillvalue, *args)
def ximport(source_folder, url):
for (path, dirs, files) in os.walk(source_folder):
if '.DS_Store' in files:
files.remove('.DS_Store')
if '.git' in dirs:
dirs.remove('.git')
print
print
print files
print
# in order to limit the number of files being uploaded at once, keep it to 3 at a time
for f1, f2, f3 in grouper(3, files):
print f1, f2, f3
print
fields = dict()
fields['directory'] = os.path.basename(path)
if f1 is not None:
fields['f1'] = (f1, open(os.path.join(path, f1), 'rb'))
if f2 is not None:
fields['f2'] = (f2, open(os.path.join(path, f2), 'rb'))
if f3 is not None:
fields['f3'] = (f3, open(os.path.join(path, f3), 'rb'))
print fields
print
m = MultipartEncoder(fields=fields)
r = requests.post(url,
data=m,
headers={'Content-Type': m.content_type})
print r.text
print
def main():
parser = argparse.ArgumentParser(description='importer')
parser.add_argument('source',
help='source folder to import from')
parser.add_argument('--server',
help='server to upload folder to',
dest='url',
required=False,
default=URL)
args = parser.parse_args()
ximport(args.source, args.url)
if __name__ == '__main__':
status = main()
sys.exit(status)
I'm debugging this, I'm 90% sure the bug is with requests_toolbelt
or requests
though, I'll let you know what I find out.
The problem is with requests_toolbelt
, just writing up a bug report now - feel free to close this issue.
The short of it is:
The multipart encoder has a bug where if it the length is greater than the blocksize used to read it (and this it must be read in two chunks) the length returned by len(m) does not match the sum of the chunks which makes the request invalid and Flask rightly rejects this.
I'm not sure why you aren't using the files
argument for requests directly - it does the seem thing but without any bugs.
Scratch that, MultipartEncoder.read()
is completely broken - read(8192)
(default blocksize) returns less than 8192 bytes even though there is more left.
Thanks Daniel for the investigation! Closing this issue since it's not a flask issue. Linked requests-toolbelt issue.
I'm not sure why you aren't using the files argument for requests directly - it does the seem thing but without any bugs.
The reason I don't want to use requests.post(url, files=..., ...) is because this reads the file upfront rather than streaming the file like MultipartEncoder does. From http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file, see:
In the event you are posting a very large file as a multipart/form-data request, you may want to stream the request. By default, requests does not support this, but there is a separate package which does - requests-toolbelt. You should read the toolbelt’s documentation for more details about how to use it.
My usage will likely result in very large files, not like using the flask git repo.
server (very basic!):
client (with debugging output)
When I run it with an input directory that has 9 files, the first time through, I get a RESPONSE 200, second time through, I get a RESPONSE 400: 127.0.0.1 - - [14/Apr/2014 22:02:30] "POST /files HTTP/1.1" 200 - 127.0.0.1 - - [14/Apr/2014 22:02:30] "POST /files HTTP/1.1" 400 -
Any idea why this is happening?