sbg / sevenbridges-python

SevenBridges Python Api bindings
Apache License 2.0

Erratic upload behaviour #57

Closed asyavuz closed 7 years ago

asyavuz commented 7 years ago

I constantly receive a "failed to complete: _submit_part" error while trying to upload a small FASTQ file (~81 MB).

I reproduced this issue with several other files, so I decided to test different file sizes to see whether the problem is related to file size. I ran the attached script about 10 times, and in almost every run the upload failed for any file larger than 2 MB. Sometimes the upload of a 2 MB file failed as well, and on one occasion a 5 MB file uploaded successfully.

api_upload_test.txt

Also, a new exception was thrown while handling the original exception. The full error text was as follows:

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/sevenbridges/transfer/upload.py", line 509, in run
    for _ in parted_file:
  File "/usr/local/lib/python3.5/site-packages/sevenbridges/transfer/upload.py", line 182, in __iter__
    yield future.result()
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/concurrent/futures/_base.py", line 405, in result
    return self.__get_result()
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.5/site-packages/sevenbridges/transfer/upload.py", line 101, in _upload_part
    session, part_url, part, timeout
  File "/usr/local/lib/python3.5/site-packages/sevenbridges/decorators.py", line 52, in wrapper
    threading.current_thread().getName(), f.__name__)
sevenbridges.errors.SbgError: Thread-4: failed to complete: _submit_part

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/site-packages/sevenbridges/transfer/upload.py", line 528, in run
    raise SbgError(six.text_type(e))
sevenbridges.errors.SbgError: Thread-4: failed to complete: _submit_part

I'd appreciate any pointers.

macOS Sierra 10.12.2, Python v3.5.2, sevenbridges-python v0.6.1

SenadI commented 7 years ago

@asyavuz Thank you for reporting this. This error usually occurs when the internet connection is not of high quality. You can upload files smaller than 5 MB, but 5 MB is the smallest chunk size, so when you submit a 10 MB file for upload you are submitting two parallel chunks of 5 MB each. The thread pool has 16 threads, so when you upload an 80 MB file you have 16 threads submitting 5 MB chunks in parallel, which can put real pressure on your I/O. You can lower the thread count, but if your connection cannot push a 5 MB chunk, the upload will still fail.
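The arithmetic in the comment above can be sketched as follows. This is only an illustration built on the figures quoted there (a 5 MB chunk size and a pool of 16 threads); the function `upload_plan` and its parameters are hypothetical and not part of sevenbridges-python, and the ceiling-division part count is an assumption about how a file is split.

```python
import math

MB = 1024 * 1024
PART_SIZE = 5 * MB   # smallest chunk size quoted in the comment above
POOL_SIZE = 16       # thread pool size quoted in the comment above


def upload_plan(file_size, part_size=PART_SIZE, pool_size=POOL_SIZE):
    """Estimate how a file would be split and how much data is in flight.

    Hypothetical helper: assumes the file is cut into ceil(size / part_size)
    parts and that at most `pool_size` parts are sent at the same time.
    Returns (parts, concurrent_parts, bytes_in_flight).
    """
    parts = max(1, math.ceil(file_size / part_size))
    concurrent = min(parts, pool_size)
    # Upper bound on bytes being pushed simultaneously.
    in_flight = min(file_size, concurrent * part_size)
    return parts, concurrent, in_flight


if __name__ == "__main__":
    for size_mb in (2, 10, 80, 100):
        parts, concurrent, in_flight = upload_plan(size_mb * MB)
        print("%3d MB -> %2d parts, %2d at once, %3d MB in flight"
              % (size_mb, parts, concurrent, in_flight // MB))
```

Under these assumptions, anything above 5 MB already means several simultaneous 5 MB requests, which is consistent with small files succeeding and larger ones failing on a slow link.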

I tested it from home and managed to reproduce the same issue. From machines that have better I/O towards S3, everything went well:

In [5]: my_project = [p for p in api.projects.query(limit=100).all() if p.name == project_name][0]
   ...:
   ...: mb = 1048576
   ...: test_sizes = [int(0.5*mb), 1*mb, 2*mb, 5*mb, 10*mb, 15*mb, 20*mb, 25*mb, 50*mb, 75*mb, 100*mb]
   ...:
   ...: for size in test_sizes:
   ...:     print("Uploading size %d..." % size)
   ...:     test_filename = "File_%d.dat" % size
   ...:     with open(test_filename, "wb") as out:
   ...:         out.seek(size)
   ...:         out.write(b'0')
   ...:
   ...:     api.files.upload(project=my_project, path=test_filename)
   ...:
Uploading size 524288...
Uploading size 1048576...
Uploading size 2097152...
Uploading size 5242880...
Uploading size 10485760...
Uploading size 15728640...
Uploading size 20971520...
Uploading size 26214400...
Uploading size 52428800...
Uploading size 78643200...
Uploading size 104857600...
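One detail for anyone repeating the test above: `out.seek(size)` followed by a one-byte write produces a file of `size + 1` bytes, so the nominal 5 MB file is one byte over the chunk size. A minimal sketch of a helper that creates a file of exactly the requested size is shown below; the name `make_test_file` is hypothetical and it is not part of sevenbridges-python.

```python
import os


def make_test_file(path, size):
    """Create a zero-filled file of exactly `size` bytes and return its size.

    truncate() extends the empty file to the requested length; on most
    file systems the result is sparse, so it is cheap to create.
    """
    with open(path, "wb") as out:
        out.truncate(size)
    return os.path.getsize(path)


if __name__ == "__main__":
    mb = 1048576
    print(make_test_file("File_%d.dat" % (5 * mb), 5 * mb))
```

Whether the extra byte changes how the library splits the file is not confirmed here; the helper simply removes that variable from the test.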

My advice here would be to use the Seven Bridges uploader, either the GUI or the command line version. Assuming you have a Seven Bridges account, you can find them by clicking Add files within your project dashboard and then choosing Desktop Uploader or Command Line Uploader/API.

Those uploaders are designed mostly for use on users' personal computers and can handle low I/O quite nicely. If you really need support for this in the library, I can't make any promises, but I can revisit the problem in one of the next versions.

Hope this was helpful.

asyavuz commented 7 years ago

Thanks for the detailed explanation @SenadI. I was just trying out the python library, to be honest, and felt compelled to report when I observed this issue. It'd have been nice, but I can't say I really need the support.

Also, the GUI and command line uploaders worked flawlessly for me, thanks for the pointers!

SenadI commented 7 years ago

@asyavuz All bug reports, questions and pull requests are most welcome. I will leave this one open, and if we get a few more requests I will revisit the problem.

As for the pointers, it's always my pleasure, and I am glad the GUI and CLI worked for you.

SenadI commented 7 years ago

Closing this issue for now.