mk-fg / python-onedrive

Obsolete python/cli module for MS SkyDrive/OneDrive's old API, do not use for new projects
Do What The F*ck You Want To Public License
200 stars 32 forks

Support the new BITS API #34

Closed: bobobo1618 closed this 9 years ago

bobobo1618 commented 9 years ago

This came from one of the OneDrive devs, in a response on Stack Overflow: https://gist.github.com/rgregg/37ba8929768a62131e85

mk-fg commented 9 years ago

To clarify:

BITS is a simple extension to HTTP that enables chunked file uploads to OneDrive.

This actually means not "Transfer-Encoding: chunked" (which is already supported and works with OneDrive), but rather uploading large files over several TCP/HTTP connections, with each request sending only some byte range of the original file.

I wish they didn't call it "chunked" in an HTTP context like that, as it already means a different thing there.
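To make the distinction concrete, here is a minimal sketch of splitting a file into byte-range fragments, each of which would go out as its own HTTP request. `iter_fragments` is a hypothetical helper for illustration, not python-onedrive's code.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_fragments(f: BinaryIO, frag_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs that together cover the whole file.

    Each pair would be sent as its own HTTP request carrying only that
    byte range, which is what BITS "fragments" are; chunked transfer
    encoding, by contrast, streams one body inside a single request.
    """
    if frag_size <= 0:
        raise ValueError("frag_size must be positive")
    offset = 0
    while True:
        data = f.read(frag_size)
        if not data:  # EOF
            break
        yield offset, data
        offset += len(data)


if __name__ == "__main__":
    for offset, data in iter_fragments(io.BytesIO(b"x" * 2500), 1000):
        print(offset, len(data))  # 0 1000, then 1000 1000, then 2000 500
```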

kamudadreieck commented 9 years ago

Does that mean we can upload files >100MB finally?

The normal API doesn't accept bigger files: the upload fails after a few seconds at 8MB/s. :(

onedrive.api_v5.ProtocolError: (None, "('Connection aborted.', error(104, 'Connection reset by peer'))")

bobobo1618 commented 9 years ago

That seems to be the idea of BITS, yes.

kamudadreieck commented 9 years ago

It would be so awesome if it were implemented in python-onedrive. :-)

Lyrrad commented 9 years ago

Yes, the new API appears to work for uploading files up to the 10GB file size limit. I've been experimenting with it. So far, the largest file I've uploaded through that API is 2GB, and I'm going to continue experimenting. My main issue has been getting errors that require me to start uploading again from the beginning. I assume it wouldn't be too difficult for someone to add this feature to python-onedrive.

mk-fg commented 9 years ago

My main issue has been getting errors that require me to start uploading again from the beginning.

The whole file, you mean, not just one of the chunks?

Because if not, I imagine you can easily split a multi-GB file into 50 KiB chunks with no significant overhead, and re-uploading these shouldn't be a problem. Though I haven't read the docs closely enough to know whether there are limits on chunk size/count.

mk-fg commented 9 years ago

Does that mean we can upload files >100MB finally?

#16 has a related question, and indeed, that seems to be allowed via such APIs.

mk-fg commented 9 years ago

I assume it wouldn't be too difficult for someone to add this feature to python-onedrive.

Yeah, a simple implementation could probably be one method with parameters like "chunk_size", "retries" and "timeout" that reads/uploads these chunks from a source file sequentially.
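A minimal sketch of that single-method idea, assuming a caller-supplied `send_fragment(offset, data, total, timeout)` callable that performs the actual HTTP request and raises on failure. Both `upload_sequential` and `send_fragment` are hypothetical names, not python-onedrive's API; the exponential backoff between retries is an added assumption.

```python
import io
import time
from typing import BinaryIO, Callable

# send_fragment(offset, data, total_size, timeout) -> None; raises on failure.
SendFragment = Callable[[int, bytes, int, float], None]


def upload_sequential(f: BinaryIO, send_fragment: SendFragment,
                      chunk_size: int = 10 * 2**20, retries: int = 3,
                      timeout: float = 60.0, backoff: float = 1.0) -> int:
    """Send f in order, chunk by chunk, and return how many chunks were sent.

    A failed chunk is retried on its own (up to `retries` extra attempts,
    with exponential backoff) instead of restarting the whole file.
    For brevity, any exception from send_fragment counts as a failure.
    """
    total = f.seek(0, io.SEEK_END)  # seek() returns the new position
    f.seek(0)
    offset, sent = 0, 0
    while offset < total:
        data = f.read(chunk_size)
        if not data:
            raise IOError("file ended early at offset %d" % offset)
        for attempt in range(retries + 1):
            try:
                send_fragment(offset, data, total, timeout)
                break
            except Exception:
                if attempt == retries:
                    raise
                time.sleep(backoff * 2 ** attempt)
        offset += len(data)
        sent += 1
    return sent
```

In a real uploader, `send_fragment` would wrap a BITS "Fragment" POST and pass `timeout` through to the HTTP library.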

There could also be an implementation that stores upload state in a persistent config file, allowing e.g. resuming an upload after an app restart, plus exposing that "half-uploaded" state in the Python API somehow.
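One way to sketch the persistent-state variant, assuming a small JSON file that records the source file, the BITS session id and the next byte offset after each acknowledged fragment. The file layout and function names are hypothetical, and whether the server still honours an old session after a restart is not established in this thread.

```python
import json
import os
import tempfile
from typing import Optional


def save_state(path: str, src: str, session_id: str, next_offset: int) -> None:
    """Record upload progress so that a restarted process can resume."""
    state = {"src": src, "session_id": session_id, "next_offset": next_offset}
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)),
                               prefix=".upload-state-")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # single rename; atomic on POSIX


def load_state(path: str, src: str) -> Optional[dict]:
    """Return saved progress for `src`, or None if there is nothing to resume."""
    try:
        with open(path) as f:
            state = json.load(f)
    except (FileNotFoundError, ValueError):  # missing or corrupt state file
        return None
    return state if state.get("src") == src else None


def clear_state(path: str) -> None:
    """Forget progress once the session has been closed successfully."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

The dict returned by `load_state` is one possible way to expose the "half-uploaded" state to API callers.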

Lyrrad commented 9 years ago

It sometimes thinks that a fragment was uploaded out of order, or that there has been some overlap with a previously uploaded fragment, even though previous fragments were uploaded successfully. Some fragment errors require that the entire upload be restarted. I think I also get some other weird errors sometimes. I'm not sure whether my account is subject to upload limits that are causing these errors. I'm not too experienced in writing this sort of program, though I'll probably tinker with my experimental upload code (unrelated to python-onedrive) some more over the next few days.

I'm unsure what the optimal chunk size should be. The stated max is 60 MB, and I've successfully tried anything from a few KB to 30 MB or so. I suppose it may make sense to dynamically adjust based on network performance so that each chunk takes approximately the same amount of time to send.
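A minimal sketch of that idea: measure how long the last fragment took and scale the next fragment size toward a target duration, clamped to the stated 60 MB maximum. Reading "60 MB" as 60 MiB, the 10-second target, the lower bound and the 2x damping limit are all illustrative assumptions.

```python
MAX_FRAG = 60 * 2**20   # stated max fragment size (assumed to mean MiB)
MIN_FRAG = 64 * 2**10   # arbitrary lower bound


def next_fragment_size(last_size: int, last_seconds: float,
                       target_seconds: float = 10.0,
                       max_growth: float = 2.0) -> int:
    """Pick the next fragment size so that, at the throughput just
    observed, sending it would take about `target_seconds`.

    The change per step is limited to a factor of `max_growth` either
    way, to damp noisy measurements.
    """
    if last_seconds <= 0:
        ideal = last_size * max_growth  # too fast to measure: just grow
    else:
        throughput = last_size / last_seconds  # bytes per second
        ideal = throughput * target_seconds
        ideal = max(last_size / max_growth, min(ideal, last_size * max_growth))
    return int(max(MIN_FRAG, min(ideal, MAX_FRAG)))
```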

mk-fg commented 9 years ago

I suppose it may make sense to dynamically adjust based on network performance so that each chunk takes approximately the same amount of time to send.

I imagine one can just grab some TCP window scaling algorithm verbatim and apply here, treating any failure as a "lost packet" ;)
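The closest TCP analogue is congestion-window control. A minimal sketch of additive-increase/multiplicative-decrease (AIMD) applied to fragment size: each successful fragment grows the size by a fixed step, and any failure (the "lost packet") halves it. The step, initial size and bounds are arbitrary assumptions, and `AimdFragmentSize` is a hypothetical name.

```python
class AimdFragmentSize:
    """Additive-increase / multiplicative-decrease fragment sizing.

    Success: grow by `step` bytes.  Failure: multiply by `decrease`
    (halve by default).  The size always stays within [lo, hi].
    """

    def __init__(self, initial: int = 1 * 2**20, step: int = 1 * 2**20,
                 decrease: float = 0.5, lo: int = 64 * 2**10,
                 hi: int = 60 * 2**20) -> None:
        self.size, self.step, self.decrease = initial, step, decrease
        self.lo, self.hi = lo, hi

    def on_success(self) -> int:
        self.size = min(self.size + self.step, self.hi)
        return self.size

    def on_failure(self) -> int:
        self.size = max(int(self.size * self.decrease), self.lo)
        return self.size
```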

rgregg commented 9 years ago

It'd be great to see support for this implemented in the python library. @mk-fg I've updated the documentation to stop using the term "chunked" or "chunk" to avoid confusion. Thanks for the feedback.

mk-fg commented 9 years ago

Added initial (simple) support for the thing in 7943435, but haven't gotten it to work so far:

Lyrrad commented 9 years ago

Here's what I did to get it to sort of work:

For Create Session packet:

import requests

# od_access_token and filename are assumed to be defined already;
# the CID and test upload folder are hardcoded placeholders
headers = {'X-Http-Method-Override': 'BITS_POST',
    'Authorization': 'Bearer ' + od_access_token,
    'BITS-Packet-Type': 'Create-Session',
    'BITS-Supported-Protocols': '{7df0354d-249b-430f-820d-3d2a9bef4931}'}
r = requests.post('https://cid-XXXXXXXXXXX.users.storage.live.com/users/0xXXXXXXXXXXXX/LiveFolders/Test_Upload/' + filename,
    headers=headers)

To get session ID from Create Session response:

session_id = r.headers['bits-session-id']

For fragment:

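# x is the 0-based fragment index and data holds that fragment's bytes;
# the range end is inclusive, and the last fragment may be shorter
start = chunkSize * x
end = min(start + chunkSize, totalSize) - 1
headers = {'X-Http-Method-Override': 'BITS_POST',
    'Authorization': 'Bearer ' + od_access_token,
    'BITS-Packet-Type': 'Fragment',
    'BITS-Session-Id': session_id,
    'Content-Length': str(len(data)),  # header values must be strings
    'Content-Range': 'bytes {}-{}/{}'.format(start, end, totalSize),
}
r = requests.post('https://cid-XXXXXXXXXXX.users.storage.live.com/users/0xXXXXXXXXXXXX/LiveFolders/Test_Upload/' + filename,
    headers=headers, data=data)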

For close session:

headers = {'X-Http-Method-Override': 'BITS_POST',
    'Authorization': 'Bearer ' + od_access_token,
    'BITS-Packet-Type': 'Close-Session',
    'BITS-Session-Id': session_id,
    'Content-Length': '0'}
r = requests.post('https://cid-XXXXXXXXXXX.users.storage.live.com/users/0xXXXXXXXXXXXX/LiveFolders/Test_Upload/' + filename,
    headers=headers)

I occasionally receive some 416 (FragmentOutOfOrder or FragmentOverlap) or 503 (ServiceNotAvailable) errors, and I am unable to resume the file. I haven't had a chance to look into those any closer.

mk-fg commented 9 years ago

Thanks.

I've been able to spot at least one off-by-one error in the Content-Range of my implementation (and in the source gist, it seems). Also, I think you don't need to pass Content-Length headers explicitly like that, as requests will calculate and add them automatically from the passed data.
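For reference, here is the inclusive-end arithmetic that is easy to get off by one. "Content-Range: bytes first-last/total" names the first and last byte positions (HTTP range semantics, RFC 7233), so a fragment of n bytes starting at `start` ends at `start + n - 1`. It also follows that no valid range can describe a zero-length fragment. A small sketch; `content_range` is a hypothetical helper.

```python
def content_range(start: int, length: int, total: int) -> str:
    """Build a Content-Range value for `length` bytes at offset `start`.

    The end position is inclusive, hence the "- 1"; writing
    start + length there is the classic off-by-one.
    """
    if length <= 0:
        # No inclusive range can describe zero bytes.
        raise ValueError("a fragment must contain at least one byte")
    end = start + length - 1
    if start < 0 or end >= total:
        raise ValueError("range %d-%d outside file of %d bytes" % (start, end, total))
    return "bytes %d-%d/%d" % (start, end, total)


if __name__ == "__main__":
    total, frag = 1200, 500
    for start in range(0, total, frag):
        print(content_range(start, min(frag, total - start), total))
    # bytes 0-499/1200, bytes 500-999/1200, bytes 1000-1199/1200
```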

I wonder, have you tried using folder-id URLs (as the rest of the API does) instead of LiveFolders?

mk-fg commented 9 years ago

The http-400 error I mentioned for folder-path uploads was due to that off-by-one error (which seems to also be present in the documentation example), thanks to @Lyrrad for helping me spot that.

Uploads via API seem to be working now, in general:

% ls -lah image.jpg
-rw-r--r-- 1 fraggod fraggod 5.3M Nov 23 06:08 image.jpg
% ./onedrive-cli --debug put -b --bits-frag-bytes 512000 image.jpg Pics
DEBUG:onedrive.api_v5:Using "requests" module version: '2.3.0'
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): apis.live.net
DEBUG:requests.packages.urllib3.connectionpool:"GET /v5.0/me?access_token=EwCAAq1D... HTTP/1.1" 200 100
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): cid-a3aXXX.users.storage.live.com
DEBUG:requests.packages.urllib3.connectionpool:"POST /users/0xa3aXXX/LiveFolders/Pics/image.jpg HTTP/1.1" 201 0
DEBUG:onedrive.api_v5:Uploading BITS fragment 1 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 2 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 3 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 4 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 5 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 6 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 7 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 8 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 9 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 10 / 11 (max-size: 0.49 MiB)
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: cid-a3aXXX.users.storage.live.com
DEBUG:onedrive.api_v5:Uploading BITS fragment 11 / 11 (max-size: 0.49 MiB)
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: cid-a3aXXX.users.storage.live.com
DEBUG:requests.packages.urllib3.connectionpool:"POST /users/0xa3aXXX/LiveFolders/Pics/image.jpg HTTP/1.1" 200 0
DEBUG:root:Call result:
--------------------
...(lots of metadata)...
--------------------

(will be silent without --debug)

API-wise, the "put" method now has a "bits_api_fallback" option:

put(path_or_tuple, folder_id='me/skydrive', overwrite=None, downsize=None, bits_api_fallback=True)

...which can be True/False or the max (non-BITS) file size (default: 95 MiB). There's also a "put_bits" method.
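The threshold semantics described for "bits_api_fallback" can be sketched as a small decision helper. This illustrates the described behaviour and is not python-onedrive's actual code. `should_use_bits` is a hypothetical name, and the sketch assumes that True means "use the default 95 MiB threshold" and that "max (non-BITS) file size" means only files strictly larger than the threshold go through BITS.

```python
DEFAULT_MAX_SIMPLE_SIZE = 95 * 2**20  # 95 MiB, the stated default


def should_use_bits(size: int, bits_api_fallback=True) -> bool:
    """Decide whether a file of `size` bytes should be uploaded via BITS.

    bits_api_fallback may be True (default threshold), False (never use
    BITS), or an int giving the largest size still sent as a normal upload.
    """
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(bits_api_fallback, bool):
        if not bits_api_fallback:
            return False
        threshold = DEFAULT_MAX_SIMPLE_SIZE
    else:
        threshold = int(bits_api_fallback)
    return size > threshold
```

Under this reading, a zero-length file never takes the BITS path when the threshold is positive.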

Limitations/hacks:

Thanks again to everyone for the feedback.

mk-fg commented 9 years ago

Note on the output in the previous message: I have no idea where these warnings from "requests" come from, as it doesn't even open new connections there (it reuses the same one for all BITS requests).

KarmaPoliceT2 commented 9 years ago

Just wanted to cross-post this here in case you haven't seen it yet; not sure if it helps:

To reference a folder by id you'll actually want:

/Items/{folder-id}

Where the {folder-id} of "folder.a5858c9cb698b77b.A5858C9CB698B77B!24220" is "A5858C9CB698B77B!24220"

Posted here: https://gist.github.com/rgregg/37ba8929768a62131e85
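Stripping the prefix as in that example amounts to keeping only the part after the last dot. A sketch, assuming every folder id has the "folder.<cid>.<ID>" shape shown above. `bits_folder_path` is a hypothetical helper, and the rest of the upload URL (host, and whether a filename follows) is not settled at this point in the thread.

```python
def bits_folder_path(folder_id: str) -> str:
    """Turn an API folder id like "folder.<cid>.<ID>" into "/Items/<ID>".

    An id that is already in the short form passes through unchanged.
    """
    short_id = folder_id.rsplit(".", 1)[-1]  # part after the last dot
    return "/Items/" + short_id
```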

mk-fg commented 9 years ago

Oh, nice, haven't seen it. Thanks, I'll try this out.

Not sure, though, whether it means that there should be no "/users/0x{id}/" at the start of the URI, or that there should be no filename after the {folder-id} (but where else would you specify it then?), or that "Items" must have that capital "I", or some combination of these, but it should be easy to try things out and see what works.

mk-fg commented 9 years ago

That means DO NOT INCLUDE /Users/{Id}.

Heheh ;)

Just updated OneDriveAPIWrapper.api_bits_url_by_id and added folder-id mangling, as suggested by @ificator in the gist comments, and uploads by folder-id seem to work now. Yay!

ajcsoftware commented 9 years ago

I have the chunked uploading working fine, but I have one problem: I can't upload a zero-length file. If I do just a start and a close call, I get a "(416) requested range not satisfiable" error, and if I try a start and a zero-length fragment, I get "(400) Bad Request".

mk-fg commented 9 years ago

@ajcsoftware

Yeah, I see the same behavior you described here as well. python-onedrive does not try to do a "Range: bytes=0-0" chunk upload request, and gets a 416 error when trying to commit the upload session.

It's an obvious workaround, of course, but for completeness' sake I want to note that you can upload zero-length files via normal (non-BITS) PUT/POST requests just fine, and python-onedrive more or less does that automatically if you use "put" (API or CLI) with a bits_api_fallback threshold greater than 0.

I don't think raising a special exception for zero-length files in the "put_bits" method (of python-onedrive) is worth it, as it seems to be the API's place to give a proper error in such cases, if they aren't supported.

Also, as you seem to be talking about an API issue in general (re-posting the question from the gist), and not about how python-onedrive handles things, it might be worth mentioning here that this module is in no way "official" or affiliated with the service itself, so I can't really fix things in the API and have no influence (that I know of) over how/when Microsoft fixes them.

ajcsoftware commented 9 years ago

Yes I now upload zero length files using the normal method but even that throws an exception saying the request was cancelled. I ignore the error, though, because the file does actually appear on OneDrive.

Interestingly, it looks like they don't actually support zero-length files (which is crazy), because if you go to the official OneDrive web UI and try to upload one manually, it says you can't.

Yes, I have mentioned this problem elsewhere, but I thought you guys might be interested or come across the same problem. There is not much coverage of this on the net. Working with OneDrive for Business is even worse! (Another problem, by the way, is that you can't create a folder whose name starts with a period/dot, even though you can through the web UI.)

mk-fg commented 9 years ago

Yes I now upload zero length files using the normal method but even that throws an exception saying the request was cancelled

Seems to work fine for me, at least with uploads via PUT requests now: HTTP 200 status, the file gets uploaded, and its metadata shows "size: 0". So I guess you might be doing it somewhat differently.


I think the guy in the gist has a point that Stack Exchange sites might be a much better place for such coverage and general questions than some random project's GitHub issue comment thread.

As there are already a few dozen unrelated comments on this page, and there might be a hundred more, there's little to no hope anyone will find anything here (unless they're really desperate), while on e.g. Stack Overflow the relevant answer will float right to the top of the first Google result, as I'm sure you're well aware.