Closed bobobo1618 closed 9 years ago
To clarify:
BITS is a simple extension to HTTP that enables chunked file uploads to OneDrive.
Actually means not "Transfer-Encoding: chunked", which is already supported and works with OneDrive, but rather uploading large files over several TCP/HTTP connections with each request only sending some byte-range from the original file.
Wish they didn't call it "chunked" in HTTP context like that, as it already means a different thing there.
Does that mean we can upload files >100MB finally?
The normal API doesn't accept bigger files after a few seconds at 8MB/s. :(
onedrive.api_v5.ProtocolError: (None, "('Connection aborted.', error(104, 'Connection reset by peer'))")
That seems to be the idea of BITS, yes.
It would be so awesome if it would be implemented in python-onedrive. :-)
Yes, the new API appears to work to upload files up to the 10GB file size limit. I've been experimenting with it. So far, the largest file that I've uploaded through that API has been 2GB. I'm going to continue to experiment with it. My main issue has been getting errors that require me to start uploading again from the beginning. I assume it wouldn't be too difficult for someone to add this feature to python-onedrive.
My main issue has been getting errors that require me to start uploading again from the beginning.
The whole file, you mean, not just one of the chunks?
Because if not, I imagine you can easily split a multi-GB file into 50 KiB chunks with no significant overhead, and re-uploading these shouldn't be a problem. Though I didn't read the doc closely enough to figure out whether there are limits on chunk size/count.
Does that mean we can upload files >100MB finally?
I assume it wouldn't be too difficult for someone to add this feature to python-onedrive.
Yeah, a simple implementation can probably be one method with parameters like "chunk_size", "retries" and "timeout" that'd read/upload these chunks from a source file sequentially.
There can also be an implementation that'd store upload state in a persistent config file, allowing for e.g. upload resuming after app restart, plus exposing that "half-uploaded" state in the Python API somehow.
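A rough sketch of what such a sequential method could look like, not the actual python-onedrive code. The names "iter_fragments", "upload_sequentially" and the "send_fragment" callable are made up for illustration; "send_fragment" stands in for whatever does the actual HTTP request (a "timeout" would be passed to that call) and is assumed to raise IOError on failure.

```python
import io
import time


def iter_fragments(src, chunk_size):
    """Yield (offset, data) pairs read sequentially from a binary file object."""
    offset = 0
    while True:
        data = src.read(chunk_size)
        if not data:
            break
        yield offset, data
        offset += len(data)


def upload_sequentially(src, send_fragment, chunk_size=512 * 1024,
                        retries=3, retry_delay=0.0):
    """Send src fragment by fragment, retrying each fragment on failure.

    send_fragment(offset, data) is a hypothetical callable that performs
    one upload request and raises IOError if that request fails.
    Returns the total number of bytes sent.
    """
    sent = 0
    for offset, data in iter_fragments(src, chunk_size):
        for attempt in range(retries + 1):
            try:
                send_fragment(offset, data)
                break
            except IOError:
                if attempt == retries:
                    raise  # out of retries, give up on the whole upload
                time.sleep(retry_delay)
        sent += len(data)
    return sent
```

Only the failed fragment is re-sent here, which assumes the server keeps accepting fragments for the same session after a failed request.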
It sometimes thinks that a fragment was uploaded out of order or that there has been some overlap with a previously uploaded fragment even though previous fragments were uploaded successfully. Some fragment errors require that the entire upload be restarted. I think I also get some other weird errors sometimes. I'm not sure if my account is subject to upload limits that are causing these errors. I'm not too experienced in writing this sort of program, though I'll probably tinker with my experimental upload code (unrelated to python-onedrive) some more over the next few days.
I'm unsure what the optimal chunk size should be. The stated max is 60 MB, and I've successfully tried anywhere from a few KB to 30 MB or so. I suppose it may make sense to dynamically adjust based on network performance so that each chunk takes approximately the same amount of time to send.
I suppose it may make sense to dynamically adjust based on network performance so that each chunk takes approximately the same amount of time to send.
I imagine one can just grab some TCP window scaling algorithm verbatim and apply here, treating any failure as a "lost packet" ;)
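To make that idea a bit more concrete, here is a minimal sketch using additive-increase/multiplicative-decrease (the scheme behind TCP congestion control, which is probably the closest match to what's meant here). The class name and all default sizes are assumptions for illustration; the 60 MiB cap is just my reading of the "60mb" limit mentioned above.

```python
class FragmentSizer:
    """Additive-increase / multiplicative-decrease fragment sizing."""

    def __init__(self, initial=256 * 1024, minimum=32 * 1024,
                 maximum=60 * 1024 * 1024, step=128 * 1024):
        self.size = initial
        self.minimum = minimum
        self.maximum = maximum  # assumed cap, taken from the limit quoted above
        self.step = step

    def on_success(self):
        """Grow the fragment size linearly after a successful upload."""
        self.size = min(self.maximum, self.size + self.step)
        return self.size

    def on_failure(self):
        """Halve the fragment size after a failure (the "lost packet" case)."""
        self.size = max(self.minimum, self.size // 2)
        return self.size
```

The caller would ask for "sizer.size" before reading each fragment and report the outcome afterwards, so a flaky link quickly falls back to small fragments while a stable one ramps up.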
It'd be great to see support for this implemented in the python library. @mk-fg I've updated the documentation to stop using the term "chunked" or "chunk" to avoid confusion. Thanks for the feedback.
Added initial (simple) support for the thing in 7943435, but didn't get it to work so far:
For folder-id upload urls, the API seems to flat-out give http-404:
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): cid-a3a6XXX.users.storage.live.com
DEBUG:requests.packages.urllib3.connectionpool:"POST /users/0xa3a6XXX/items/folder.a3a689XXX!112/README.md HTTP/1.1" 404 0
For "Transfer-Encoding: chunked" uploads to folder-path urls, getting http-400.
Most likely due to missing Content-Length, as it's explicitly mentioned in the gist. Will probably fix it eventually, as soon as I figure out why all fixed-length stream-body uploads seem to hang with OneDrive APIs (related: #30), probably something in the "OneDriveHTTPClient.request" wrapper func...
Unlike other OneDrive APIs, error doesn't seem to have a body with json-encoded clarification of what exactly went wrong.
Here's what I did to get it to sort of work:
For Create Session packet:
import requests

headers = {'X-Http-Method-Override': 'BITS_POST',
           'Authorization': 'Bearer ' + od_access_token,
           'BITS-Packet-Type': 'Create-Session',
           'BITS-Supported-Protocols': '{7df0354d-249b-430f-820d-3d2a9bef4931}'}
# hardcoded CID and test upload folder
r = requests.post('https://cid-XXXXXXXXXXX.users.storage.live.com/users/0xXXXXXXXXXXXX/LiveFolders/Test_Upload/' + filename,
                  headers=headers)
To get session ID from Create Session response:
session_id = r.headers['bits-session-id']
For fragment:
# x is the zero-based index of the fragment being sent, data holds its bytes
headers = {'X-Http-Method-Override': 'BITS_POST',
           'Authorization': 'Bearer ' + od_access_token,
           'BITS-Packet-Type': 'Fragment',
           'BITS-Session-Id': session_id,
           'Content-Length': str(chunkSize),  # recent "requests" versions reject non-string header values
           'Content-Range': 'bytes ' + str(chunkSize*x) + '-' + str(chunkSize*(x+1)-1) + '/' + str(totalSize),
           }
r = requests.post('https://cid-XXXXXXXXXXX.users.storage.live.com/users/0xXXXXXXXXXXXX/LiveFolders/Test_Upload/' + filename,
                  headers=headers, data=data)
For close session:
headers = {'X-Http-Method-Override': 'BITS_POST',
           'Authorization': 'Bearer ' + od_access_token,
           'BITS-Packet-Type': 'Close-Session',
           'BITS-Session-Id': session_id,
           'Content-Length': '0'}
r = requests.post('https://cid-XXXXXXXXXXX.users.storage.live.com/users/0xXXXXXXXXXXXX/LiveFolders/Test_Upload/' + filename,
                  headers=headers)
I occasionally receive some 416 (FragmentOutOfOrder or FragmentOverlap) or 503 (ServiceNotAvailable) errors, and I am unable to resume the file. I haven't had a chance to look into those any closer.
Thanks.
I've been able to spot at least an off-by-one error in the Content-Range of my implementation (and the source gist, it seems). Also, I think you don't need to pass Content-Length headers explicitly like that, as requests will calculate and add them automatically from the passed data.
I wonder, have you tried using folder-id URLs (as the rest of the API does) instead of LiveFolders?
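For anyone hitting the same off-by-one: the end position in Content-Range is inclusive, and the last fragment is usually shorter than the rest. A small sketch of how the header values can be computed (both function names are made up for illustration, not part of any library):

```python
def content_range(offset, length, total):
    """Return a Content-Range value for `length` bytes starting at `offset`."""
    if length <= 0 or offset < 0 or offset + length > total:
        raise ValueError("fragment does not fit inside the file")
    # The end position is inclusive, hence the "- 1".
    return "bytes {}-{}/{}".format(offset, offset + length - 1, total)


def fragment_ranges(total, chunk_size):
    """List Content-Range values covering a file of `total` bytes."""
    return [content_range(start, min(chunk_size, total - start), total)
            for start in range(0, total, chunk_size)]
```

E.g. a 10-byte file sent in 4-byte fragments gives "bytes 0-3/10", "bytes 4-7/10" and "bytes 8-9/10"; note that a zero-length file produces no fragments at all.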
The mentioned http-400 error for folder-path uploads was due to that off-by-one error (which seems to also be present in the documentation example), thanks to @Lyrrad for helping me spot that.
Uploads via API seem to be working now, in general:
% ls -lah image.jpg
-rw-r--r-- 1 fraggod fraggod 5.3M Nov 23 06:08 image.jpg
% ./onedrive-cli --debug put -b --bits-frag-bytes 512000 image.jpg Pics
DEBUG:onedrive.api_v5:Using "requests" module version: '2.3.0'
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): apis.live.net
DEBUG:requests.packages.urllib3.connectionpool:"GET /v5.0/me?access_token=EwCAAq1D... HTTP/1.1" 200 100
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): cid-a3aXXX.users.storage.live.com
DEBUG:requests.packages.urllib3.connectionpool:"POST /users/0xa3aXXX/LiveFolders/Pics/image.jpg HTTP/1.1" 201 0
DEBUG:onedrive.api_v5:Uploading BITS fragment 1 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 2 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 3 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 4 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 5 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 6 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 7 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 8 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 9 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 10 / 11 (max-size: 0.49 MiB)
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: cid-a3aXXX.users.storage.live.com
DEBUG:onedrive.api_v5:Uploading BITS fragment 11 / 11 (max-size: 0.49 MiB)
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: cid-a3aXXX.users.storage.live.com
DEBUG:requests.packages.urllib3.connectionpool:"POST /users/0xa3aXXX/LiveFolders/Pics/image.jpg HTTP/1.1" 200 0
DEBUG:root:Call result:
--------------------
...(lots of metadata)...
--------------------
(will be silent without --debug)
API-wise, "put" method now has "bits_api_fallback" option:
put(path_or_tuple, folder_id='me/skydrive', overwrite=None, downsize=None, bits_api_fallback=True)
...which can be True/False or max (non-BITS) file size (default - 95 MiB). And there's also "put_bits" method.
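Roughly, the three kinds of "bits_api_fallback" values can be read like this. This is a sketch of the described behavior, not the actual code from the module; "should_use_bits" is a made-up name, and whether the real comparison is strict or not is an assumption.

```python
DEFAULT_BITS_THRESHOLD = 95 * 2**20  # 95 MiB, the default mentioned above


def should_use_bits(file_size, bits_api_fallback=True):
    """Decide whether an upload of file_size bytes should go through BITS."""
    # bool is a subclass of int, so True/False must be checked before numbers.
    if bits_api_fallback is False:
        return False
    if bits_api_fallback is True:
        threshold = DEFAULT_BITS_THRESHOLD
    else:
        threshold = int(bits_api_fallback)
    # Assumption: only files strictly larger than the threshold use BITS.
    return file_size > threshold
```

With any non-negative threshold, zero-length files never end up on the BITS path.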
Limitations/hacks:
overwrite/downsize flags, supported by the regular POST/PUT API requests, are not documented for BITS, so "put_bits" does not have these.
Files uploaded via BITS seem to overwrite same-name ones, so "put" will raise an exception when falling back to BITS with overwrite set to False. Passed "downsize" option will issue a warning on such fallback. Tried passing "overwrite=false" in query of a BITS session creation request, didn't work.
Couldn't get BITS uploads to folder-ids to work, suspecting that these might not be implemented yet.
Simple workaround in place "resolves" folder_id (if passed instead of path) to folder_path via several (== depth) "info" calls.
[...] (file.{user_id}.{file_id}), so it gets converted, unless "raw_id=True" gets passed to "put_bits".
Given that it's a "simple" implementation, there's no way to resume BITS uploads after e.g. app restart atm.
Kinda easy to add that by chopping "put_bits" into smaller pieces and adding some "bits upload session" (class or generator) concept. Will maybe implement later.
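As a sketch of what the persistent part of such a "bits upload session" might look like (hypothetical, nothing like this exists in the module yet): the class name, the JSON layout and the field names are all made up, and it assumes the server still accepts the old session id after a restart, which isn't confirmed anywhere in this thread.

```python
import json
import os
import tempfile


class UploadState:
    """Keeps the progress of one upload session in a small JSON file."""

    def __init__(self, path):
        self.path = path

    def load(self):
        """Return the saved state dict, or None if nothing was saved."""
        try:
            with open(self.path) as src:
                return json.load(src)
        except FileNotFoundError:
            return None

    def save(self, session_id, offset, total):
        """Record how many bytes of the file were acknowledged so far."""
        data = {"session_id": session_id, "offset": offset, "total": total}
        # Write to a temp file and rename it over the old one, so that a
        # crash mid-write cannot leave a half-written state file behind.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as dst:
            json.dump(data, dst)
        os.replace(tmp, self.path)

    def clear(self):
        """Forget the state, e.g. after the session was closed."""
        try:
            os.remove(self.path)
        except FileNotFoundError:
            pass
```

An upload loop would call "save" after each acknowledged fragment, and on startup seek the source file to the loaded "offset" instead of starting from zero.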
Thanks again to everyone for the feedback.
Note on the output in previous msg: no idea about these warnings from "requests" - it doesn't even open new connections there (reusing same one for all BITS requests).
Just wanted to cross post this here as you may not have seen it yet, not sure if it helps:
To reference a folder by id you'll actually want:
/Items/{folder-id}
Where the {folder-id} of "folder.a5858c9cb698b77b.A5858C9CB698B77B!24220" is "A5858C9CB698B77B!24220"
Posted here: https://gist.github.com/rgregg/37ba8929768a62131e85
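In code, that id conversion could look something like the sketch below. It only covers the id part from the example above; the function name is made up, and the assumption is that ids always have the "folder.{user-id}.{folder-id}" shape.

```python
def bits_folder_id(folder_id):
    """Strip the "folder.{user-id}." prefix from a regular API folder id."""
    prefix, sep, rest = folder_id.partition(".")
    if prefix != "folder" or not sep:
        raise ValueError("not a folder id: {!r}".format(folder_id))
    _user_id, sep, short_id = rest.partition(".")
    if not sep or not short_id:
        raise ValueError("unexpected folder id format: {!r}".format(folder_id))
    return short_id
```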
Oh, nice, haven't seen it. Thanks, I'll try this out.
Not sure though if it means that there should be no "/users/0x{id}/" at the start of the uri, or that there should be no filename after the {folder-id} (but where else you'd specify it then?), or that "Items" must have that capital "I", or some combination of these, but should be easy to try stuff out, see if something might work.
That means DO NOT INCLUDE /Users/{Id}.
Heheh ;)
Just updated the OneDriveAPIWrapper.api_bits_url_by_id and added folder-id mangling as suggested by @ificator in the gist comments, and uploads by folder-id seem to work now. Yay!
I have the chunked uploading working fine but I have one problem. I can't upload a zero length file. If I do just a start and a close call I get a "(416) requested range not satisfiable" error and if I try a start, and a zero length fragment I get "(400) Bad Request.".
@ajcsoftware
Yeah, I see same behavior as you described here as well. python-onedrive does not try to do "Range: bytes=0-0" chunk upload request and gets 416 error when trying to commit the upload session.
It's an obvious workaround of course, but for completeness' sake I want to note that you can upload zero-length files via normal (non-BITS) PUT/POST requests just fine, and python-onedrive kinda does that automatically if you use "put" (api or cli) with bits_api_fallback threshold greater than 0.
Don't think raising some special exception for zero-length files in the "put_bits" method (of python-onedrive) is worth it, as it seems to be rather the API's place to give a proper error in such cases, if they aren't supported.
Also, as you seem to be talking about API issue in general (re-posting question from the gist), and not about how python-onedrive handles things, it might be worth mentioning here that this module is in no way "official" or affiliated with the service itself, so I can't really fix things in the API and have no influence (that I know of) over how/when Microsoft fixes these.
Yes I now upload zero length files using the normal method but even that throws an exception saying the request was cancelled but I ignore the error because the file has actually appeared on OneDrive.
Interestingly it looks like they don't actually support zero length files (which is crazy) because if you go to the OneDrive official web UI and try to upload one manually it says you can't.
Yes I have mentioned this problem elsewhere but I thought you guys might be interested or come across the same problem. There is not much coverage of this on the net. Working with OneDrive for Business is even worse! (another problem by the way is you can't create a folder starting with a period/dot even though you can through the web UI).
Yes I now upload zero length files using the normal method but even that throws an exception saying the request was cancelled
Seems to work fine for me, at least with uploads via PUT requests now, i.e. http 200 status, file gets uploaded, metadata on it shows "size: 0", so I guess you might be doing it somewhat differently.
I think the guy in the gist has a point that Stack Exchange sites might be a way better place for such coverage and general questions than some random project's github issues comment thread.
As there's like a few dozens of unrelated comments on this page already, and might be a hundred more, there's little to no hope anyone will find anything here (unless they're really desperate), while on e.g. Stack Overflow you'll get some relevant thing floating right on top of the first link in google, as I'm sure you're well aware.
This came from one of the OneDrive devs in a response on StackOverflow https://gist.github.com/rgregg/37ba8929768a62131e85