mk-fg / python-onedrive

Obsolete python/cli module for MS SkyDrive/OneDrive's old API, do not use for new projects
Do What The F*ck You Want To Public License

BITS upload doesn't work for large files #39

Closed - pk910 closed this issue 9 years ago

pk910 commented 9 years ago

Hello,

I've noticed that uploading large files using the BITS protocol doesn't work correctly:

DEBUG:onedrive.api_v5:Uploading BITS fragment 105 / 107 (max-size: 48.83 MiB)
INFO:requests.packages.urllib3.connectionpool:Resetting dropped connection: cid-17***censored***fd.users.storage.live.com
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: cid-17***censored***fd.users.storage.live.com
DEBUG:onedrive.api_v5:Uploading BITS fragment 106 / 107 (max-size: 48.83 MiB)
INFO:requests.packages.urllib3.connectionpool:Resetting dropped connection: cid-17***censored***fd.users.storage.live.com
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: cid-17***censored***fd.users.storage.live.com
DEBUG:onedrive.api_v5:Uploading BITS fragment 107 / 107 (max-size: 48.83 MiB)
INFO:requests.packages.urllib3.connectionpool:Resetting dropped connection: cid-17***censored***fd.users.storage.live.com
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: cid-17***censored***fd.users.storage.live.com
INFO:requests.packages.urllib3.connectionpool:Resetting dropped connection: cid-17***censored***fd.users.storage.live.com
DEBUG:requests.packages.urllib3.connectionpool:"POST /users/0x17***censored***fd/LiveFolders/a_very_big_file.tar.gz.enc HTTP/1.1" 401 0
DEBUG:onedrive.api_v5:Refreshing access_token
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): login.live.com
DEBUG:requests.packages.urllib3.connectionpool:"POST /oauth20_token.srf HTTP/1.1" 200 1798
DEBUG:onedrive.conf:Updating configuration file ('/root/.lcrc')
DEBUG:requests.packages.urllib3.connectionpool:"POST /users/0x17***censored***fd/LiveFolders/a_very_big_file.tar.gz.enc HTTP/1.1" 500 0
Traceback (most recent call last):
  File "/usr/local/bin/onedrive-cli", line 9, in <module>
    load_entry_point('python-onedrive==14.11.22', 'console_scripts', 'onedrive-cli')()
  File "/usr/local/lib/python2.7/dist-packages/python_onedrive-14.11.22-py2.7.egg/onedrive/cli_tool.py", line 329, in main
    else: dst, xres = None, api.info(api.put_bits(optz.file, folder_path=dst))
  File "/usr/local/lib/python2.7/dist-packages/python_onedrive-14.11.22-py2.7.egg/onedrive/api_v5.py", line 490, in put_bits
    'BITS-Session-Id': bits_sid })
  File "/usr/local/lib/python2.7/dist-packages/python_onedrive-14.11.22-py2.7.egg/onedrive/api_v5.py", line 325, in __call__
    return self.request(api_url(), **request_kwz)
  File "/usr/local/lib/python2.7/dist-packages/python_onedrive-14.11.22-py2.7.egg/onedrive/api_v5.py", line 149, in request
    raise raise_for.get(code, ProtocolError)(code, message)
onedrive.api_v5.ProtocolError: (500, '500 Server Error: Internal Server Error')

I think it's because of an authentication timeout?

Greets pk910

mk-fg commented 9 years ago

Don't think an auth timeout is the likely culprit there - it gives an http-401 error that is handled properly on every request (i.e. the access token gets refreshed and the request repeated, as with the first POST request above). http-500 looks more like some genuine server issue to me (and in fact that's what "500 Internal Server Error" means). Unless the whole BITS session is expected to finish before a single access token times out, but that would probably be a bug (negating most benefits of such an API), and not much can be done about it on the client.
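
To illustrate what I mean by "handled properly on every request" - not the module's actual code, just the general shape of it, with all the names below being placeholders:

def request_with_refresh(make_request, get_token, refresh_token):
    # Placeholders: make_request(token) issues the http request with a given
    # access_token and returns a requests.Response object, get_token() returns
    # the current token, refresh_token() obtains a fresh one.
    resp = make_request(get_token())
    if resp.status_code == 401:  # stale token - refresh and repeat the request once
        refresh_token()
        resp = make_request(get_token())
    resp.raise_for_status()  # anything else, like the http-500 above, still raises
    return resp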

Also, most OneDrive errors (though not necessarily for the BITS api) return a json blob with error info, which doesn't seem to be the case here (it should've been dumped in the message, and the response size is 0), again pointing in the "something's wrong on the other side" direction.

Guess you can try raising an issue on the OneDrive support forum/system/something, especially if you have a premium (non-free) service, though I'm not sure if the BITS api is officially supported yet (I've implemented it based on a github gist linked from #34).

I think someone mentioned this api randomly giving transient errors in #34, maybe it's the same thing here, if other attempts (for the same file) succeed... or does it only fail for you on slow uploads of such size?

On an unrelated note, wonder what's up with all the warnings from the requests module in the paste above, probably me misusing its session somehow, should probably look into that sometime.

Thanks for the info.

pk910 commented 9 years ago

Thanks for the fast reply :)

Could it be that the session just times out because no other requests are made? Maybe it would be fine to just do some dummy requests to keep the session alive? :D

mk-fg commented 9 years ago

If you mean the auth session (i.e. access token timeout), it's just a fixed timeout from when the token was requested - iirc 1 hour. The BITS session is supposed to be valid for 24 hours, I think.

Hence it really doesn't make much sense if the BITS upload session is tied to a single access token, but it should be kinda easy to test (if only to report it more precisely to OneDrive support) by adding code to refresh the token in the middle of the session (and seeing if it'd reliably fail in the same way, while succeeding without that change).

Also note that my last comment about the "requests session" is completely unrelated to the two "session" concepts above - that one is just a pool of http/https connections that the module keeps (it's called Session there), and any failures on that level (not that there are any) should never produce http errors.

mk-fg commented 9 years ago

As discussed on irc immediately after the last msg, it does indeed look like when the OAuth2 access_token gets invalidated during a file upload, the last request with "BITS-Packet-Type: Close-Session" might fail with an http-500 error. If an upload (of an even larger file) gets started with a fresh access_token and finishes before it gets stale, no such error gets signaled.

It does look like an issue with the BITS API, which should probably be expected from such an experimental thing. I've linked this report in the "Known Issues" section of the README, to maybe discourage others from relying too much on that API working as expected, at least for now.

Also, as one possible workaround, I've added an "auth_refresh" command to the cli, which takes no arguments or options, and all it does is forcefully get a new (guaranteed fresh!) access_token. I think it can be run right before starting each large-file upload via BITS, in hopes that the upload might finish before this token goes stale and the http-500 error gets signaled.
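
E.g. something like this quick wrapper for batch uploads (the file glob and destination folder below are made-up examples, and check onedrive-cli --help for the exact "put" arguments):

#!/usr/bin/env python
# Rough sketch of the workaround: force a fresh access_token right before each
# large-file upload, so the BITS session is less likely to outlive the token.
import glob, subprocess

dst_folder = 'backups'  # made-up destination folder

for path in sorted(glob.glob('/srv/backups/*.tar.gz.enc')):
    subprocess.check_call(['onedrive-cli', 'auth_refresh'])
    subprocess.check_call(['onedrive-cli', 'put', path, dst_folder])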

And, of course, this error might already be gone or have a totally different cause - the above is just speculation based on a few experiments @pk910 did with uploads a few days ago (thanks again btw, I was too lazy to ever test the BITS code with actual large files myself).

ddurdle commented 9 years ago

It would be nice if this program handled large files better. It seems the failure happens when running batch uploads. In a script of 10 uploads, depending on the size of the files, I expect at least 50% of the uploads to fail with error 500 at some point. I essentially have to rerun this batch script over and over. Is there something that can be run between uploads to refresh the auth?

I also get a lot of this: WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection:

It seems to upload anyway, unless it later runs into an error 500. I tried modifying it to use a pool and max size of 1, and it still produces this warning.

This program is basically unusable for me as I need to sync a lot of 100MB-9GB files :(

Seems to take me forever to upload a directory of files to OneDrive. A directory with 25GB has taken me a full day, and it's still going, due to these random 500 errors.

On another note, I use a lot of Google Drive services, and their chunking never runs into this issue. I run uploads 24/7 and only run into a 500-like error with them about once a week.

I uploaded that same 25GB to Google Drive in approx 30 min.

ddurdle commented 9 years ago

I'm going to try to use the auth_refresh. You should highlight this directly in the README, or perhaps make it a standard call during the put process.

I have a friend uploading 6TB of raw photos using a Synology NAS. I wonder what they are doing differently, as he never reports any issues or failed uploads.

ddurdle commented 9 years ago

It took 25 mins to transfer 3.28GB to OneDrive. This is with an auth_refresh prior. What is the observed timeout with OneDrive?

mk-fg commented 9 years ago

Hey, thanks for all the feedback.

Yeah, guess I can add a note about auth_refresh to the README, though it seems to be a weird hack, dunno why that even works.

I suspect that putting it in the BITS upload sequence might only make things worse though, as you might try some uploads, see them working (randomly, more likely) due to that, then spend a few hours scripting a backup process, and only then have it blow up occasionally for months or years.

I'd think that maybe dropping the unreliable thing entirely might be a better idea, but given how many people seem to be interested in it, probably not so much.

Guess a middle ground could be adding some --do-auth-hack flag for the cli tool - leave it to still blow up by default, but have code to do the auth hack and retries (there was another good ticket about that) and all that, though that just needs to be written sometime.


I also get a lot of this: WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection:

Yeah, I get these here as well, so guess it's not just a weird requests version on my machine - maybe I'm using its connection pool machinery wrong, will need to look into that.
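
If anyone wants to shut these up in the meantime, the relevant knob seems to be the per-host pool size on the requests adapter - a minimal sketch (the sizes are arbitrary, and this is not what the module currently does):

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# "Connection pool is full, discarding connection" is urllib3 noting that more
# connections were returned to a per-host pool than pool_maxsize allows - the
# warning itself should be harmless, raising the limit just makes it go away.
adapter = HTTPAdapter(pool_connections=4, pool_maxsize=4)
session.mount('https://', adapter)
session.mount('http://', adapter)

# Alternatively, just silence that particular logger:
# import logging
# logging.getLogger('requests.packages.urllib3.connectionpool').setLevel(logging.ERROR)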


I have a friend uploading 6TB of raw photos using a synology NAS. I wonder what they are doing any different, as he never reports on any issues or failed uploads.

I bet the windows client also works fine, though given how relatively new this api for large files seems to be, it might be something proprietary that existed for a while... hell, there might be some other github gist with the spec floating around, you never know ;)

ddurdle commented 9 years ago

I love the work done on this application. It saved me from having to write something. There was no equivalent to this for Google Drive on *nix platforms, so I had to write something up in Perl. When I started using OneDrive, I was dreading the thought of having to do the same for this cloud service.

Microsoft still has a long way to go to rival Google Drive. Guess the unlimited space (or 10TB) associated with an Office 365 subscription might make some users (like myself) have a look.

I'd like to report that running a script on two servers uploading to a OneDrive account, with an "auth_refresh" in between files, has made a drastic difference. I now only encounter the odd 500/501/503 error. I did in a few hours the equivalent of what previously took me an entire day.

Is the code establishing multiple connections when it uploads the chunks? I find the upload very slow, but it's probably 99% down to OneDrive.

I'm uploading from two separate servers connected to different tier 2 network connections, within North America. When I run my Perl scripts to sync with Google Drive using a single connection to send the chunks through, the uplink speed averages a steady 7MB/s. When I upload to OneDrive, it seems to constantly bounce around 0-4MB/s.

I haven't done any tests from my home, but they probably wouldn't reveal much as I'm capped at a wired upload of 1Mbps and an HSPA+ upload of 5Mbps.

mk-fg commented 9 years ago

There was no equivalent to this for Google Drive on *nix platforms, so I had to write something up in Perl.

Hmmph, fairly sure that when I looked at python modules for Google Drive (to use in mk-fg/tahoe-lafs-public-clouds), there was a totally legit, properly supported and working official one, though I guess you might mean the lack of a CLI tool, which that module might indeed not include.

On a cue from #34, I did check the gist where this API is specified, and a person there seems to report that doing an auth token refresh right before the "session commit" request fixes everything.

So there's now a --bits-do-auth-refresh-before-commit-hack option for the "put" command that does this trick. Maybe try it out and see if it actually works as advertised there? Would be great if it were a 100%-success solution, at least until something's fixed in OneDrive itself.


As for multiple connections - it's definitely not implemented now, and would probably need something like threads, some temporary event-loop within the requests module (if there are hooks for such a thing), or just a more complicated "resumable after restart" uploads implementation, where some "session state" thing is tracked.

Definitely possible, with just pushing chunk uploads into some thread pool probably being the simplest (but kinda half-assed imo, as it won't allow for app restarts) solution, just not implemented in any way.
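
For illustration, the thread-pool version would be something along these lines (nothing like it exists in the module yet; upload_fragment below is a placeholder for whatever sends one BITS fragment, and whether the server even accepts fragments out of order is untested):

# Needs the "futures" backport package on python2.
from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_fragments_concurrently(fragments, upload_fragment, workers=4):
    # fragments: iterable of chunk payloads; upload_fragment(n, frag) is a
    # placeholder callable that uploads a single BITS fragment.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futs = [pool.submit(upload_fragment, n, frag)
            for n, frag in enumerate(fragments)]
        for fut in as_completed(futs):
            fut.result()  # re-raises any per-fragment upload error here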

As to whether it'd help boost the upload speeds - don't know, but you can probably start several uploads in parallel and see if they proceed at the same 0-4 MB/s speed each or get throttled to about the same total speed as a single non-concurrent upload.

Also, if you have a contract of sorts with Microsoft for the service, guess you can just ask them about all such things. I just have a free account there (and don't even use that one much), so it's not really my place to complain or ask about quality of service.

ddurdle commented 9 years ago

I'll try the option and let you know. On a related topic, I notice that about 1 in 10-15 uploads fails randomly due to a 503 error (probably a network hiccup or other issue). It doesn't seem like the client "retries" despite any max_retry value that may be set - it immediately exits reporting error 503. Is there a way to instruct the client to retry? Network hiccups and bad responses are a normal occurrence; if the packet is resent, chances are it'll process fine.

mk-fg commented 9 years ago

Ah, you probably mean the "max_retries" that's being passed to the requests module. It's not the first time people have tried to use it for such a thing, but it only applies to underlying connection failures, e.g. a DNS error, TCP socket connection timeout or such:

http://docs.python-requests.org/en/latest/api/#requests.adapters.HTTPAdapter

Any HTTP response is a perfectly valid result acquired from the server, and this option has no effect on it, of course. I should probably add some kind of comment about that too.

And there are no other kinds of retries in the module, even though there are certainly cases where having this at the module level might help, if only to avoid having to re-implement it for all users. #35 is kinda-related to that (basically "retry on http 420 error").
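
If/when that gets implemented, it'd probably be some loop along these lines (a rough sketch, not existing module code - attempt counts and delays below are arbitrary):

import time
import requests

def request_with_retries(session, method, url, attempts=4, delay=10, **kw):
    # HTTPAdapter(max_retries=...) only covers connection-level failures;
    # an http-500/503 is a perfectly valid response, so retrying on these
    # has to be done explicitly, as below.
    for n in range(attempts):
        try:
            resp = session.request(method, url, **kw)
        except requests.ConnectionError:
            if n == attempts - 1: raise
        else:
            if resp.status_code < 500: return resp
            if n == attempts - 1: resp.raise_for_status()
        time.sleep(delay * (n + 1))  # crude linear backoff between attempts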

mk-fg commented 9 years ago

I should probably add some kind of comment about that too.

Removed that "max_retries" parameter from the code comment to avoid potential misuse, and added a warning against passing random parameters to the requests adapter without reading the docs on them.

Also added a "no idea how to fix these" note about the warnings from the requests connection pool, with rather obvious suggestions on how to go about resolving or hiding them.

It doesn't address the "5XX errors from OneDrive servers" issue, but rather the other minor things mentioned here. Errors like 503 can be "network hiccups", but if so, probably on a larger scale - as in a "DDoS on MS servers", an "internal MS cloud network split/overload" or maybe some rate-limiting. And as mentioned in the previous comment, I think it'd make sense to handle these in the module (optionally or not), it just needs the code to do it.