tomgross / pcloud

A Python implementation of the pCloud API
MIT License

Multiple requests over single connection without waiting for answer #31

Closed: qo4on closed this issue 4 years ago

qo4on commented 4 years ago

Is it possible for pycloud to download multiple files using HTTP pipelining? https://docs.pcloud.com/protocols/http_json_protocol/single_connection.html

tomgross commented 4 years ago

Yes, pycloud uses requests.Session, which sends the Connection: keep-alive header by default.
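
For illustration, a minimal sketch of what requests.Session gives you: the TCP connection is reused across calls, but each get() still blocks until its response arrives, so the requests remain serialized rather than pipelined:

import requests

s = requests.Session()
r1 = s.get("https://eapi.pcloud.com/userinfo")  # sent, then wait for the reply
r2 = s.get("https://eapi.pcloud.com/userinfo")  # reuses the socket, but only after r1 finished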

qo4on commented 4 years ago

Are you talking about sending requests in multiple threads? That doesn't work properly; for example, file_open returns the same fd number for different files. And even though pycloud sends Connection: keep-alive, it still waits for a response after each GET request. Can it send a group of requests and then gather all the responses at once?

tomgross commented 4 years ago

I see 3 options:

  1. Implement it yourself, using file_pread and threading (see the sketch after this list).
  2. Use downloadfileasync https://docs.pcloud.com/methods/file/downloadfileasync.html. This would have to be implemented in pycloud.
  3. Use the getzip method, which downloads a full tree in one connection. This is not exactly what you're looking for, but it might still be interesting.
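
A minimal sketch of option 1, assuming pycloud wraps the pCloud fileops calls file_open, file_pread and file_close (check your pycloud version). It uses one PyCloud instance, and therefore one connection, per thread, so file descriptors cannot clash; paths and credentials are placeholders:

from concurrent.futures import ThreadPoolExecutor

from pcloud import PyCloud

CHUNK = 1024 * 1024  # bytes per file_pread call

def download(path):
    pc = PyCloud("user@example.com", "password")  # placeholder credentials
    fd = pc.file_open(path=path, flags=0)["fd"]   # flags=0: read-only
    data, offset = b"", 0
    while True:
        buf = pc.file_pread(fd=fd, count=CHUNK, offset=offset)
        if not buf:                                # empty read means end of file
            break
        data += buf
        offset += len(buf)
    pc.file_close(fd=fd)
    return path, data

with ThreadPoolExecutor(max_workers=4) as pool:
    for name, content in pool.map(download, ["/a.bin", "/b.bin"]):
        print(name, len(content), "bytes")
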
qo4on commented 4 years ago
  1. pCloud suggests using a single connection and does not support threading over one connection. I tried this and it returns incorrect responses. The docs even warn: "However you should make sure that in no event two threads/processes write to the same connection at the same time."

  2. downloadfileasync downloads a file from an arbitrary URL to the pCloud drive. I'm talking about downloading from pCloud to my computer.

  3. It's interesting, but as you said, it's not exactly parallel downloading.

I also tried async calls in one thread over one connection with dugong. Sometimes it works, but then file_open freezes and eventually returns error 5001.

tomgross commented 4 years ago

Have you tried opening a list of files in the main thread and then reading the files in separate threads, accessing them with negative file descriptors? This is how I interpret this part of the docs:

You can refer to the last opened file by descriptor -1, the one opened before that -2, etc.. That is useful if you want to pipeline open request together with read/write requests without waiting for the answer.
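
A sketch of that interpretation with dugong: pipeline the opens and the reads on one connection, referring to the not-yet-answered opens by negative descriptors. TOKEN and the paths are placeholders:

import ssl
from urllib.parse import urlencode
from dugong import HTTPConnection

TOKEN = "AUTH_TOKEN"  # placeholder: your auth token after login

conn = HTTPConnection("eapi.pcloud.com", ssl_context=ssl.create_default_context())
pipelined = [
    ("file_open",  {"auth": TOKEN, "path": "/a.txt", "flags": 0}),
    ("file_open",  {"auth": TOKEN, "path": "/b.txt", "flags": 0}),
    ("file_pread", {"auth": TOKEN, "fd": -2, "count": 65536, "offset": 0}),  # reads /a.txt
    ("file_pread", {"auth": TOKEN, "fd": -1, "count": 65536, "offset": 0}),  # reads /b.txt
]
for method, params in pipelined:
    conn.send_request("GET", f"/{method}?" + urlencode(params))  # no waiting in between
bodies = []
for _ in pipelined:
    conn.read_response()
    bodies.append(conn.readall())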

qo4on commented 4 years ago

Have you tried opening a list of files in the main thread

This returns fd = 1 for each file.

I had success with async calls; it's better than threads in this case. I managed to get 60 different fds in less than 0.7 seconds. You can test it in Colab and add it to pycloud:

!pip install -q git+https://github.com/python-dugong/python-dugong.git > /dev/null
!pip install -q git+https://github.com/tomgross/pcloud.git > /dev/null
!pip install -q -U nest-asyncio uvloop natsort > /dev/null

import nest_asyncio   # for IPython
nest_asyncio.apply()  # for IPython
import sys
import ssl
import time
import json
import asyncio
from pcloud import PyCloud
from natsort import natsorted
from urllib.parse import urlparse, urlencode
from dugong import HTTPConnection, AioFuture

def ts(start=0):
    return round(time.monotonic() - start, 3)

def prepare_files(items):
    files = []

    for item in items:
        if all([
            isinstance(item, dict),
            'isfolder' in item,
            item['isfolder'] is False,
            int(item.get('size', 0)) > 0
        ]):
            files.append({
                'name': item['name'],
                'fileid': item['fileid'],
                'size': item['size'],
                'method': 'file_open',
                'params': {
                    'auth': pc.auth_token,
                    'path': f"/{item['name']}",
                    'flags': pc.O_CREAT
                }
            })
    return files

class PC(PyCloud):
    # Use the EU endpoint; O_CREAT mirrors the pCloud file_open flag value.
    endpoint = "https://eapi.pcloud.com/"
    O_CREAT = 0x0040

if sys.platform == 'win32':
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
else:
    import uvloop
    uvloop.install()

pc = PC('user@example.com', 'password')  # replace with your credentials

ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ssl_context.verify_mode = ssl.CERT_REQUIRED
ssl_context.check_hostname = True
ssl_context.load_default_certs()

items = pc.listfolder(folderid=0)['metadata']['contents']
items = natsorted(items, key=lambda dic: dic['name'])
path_list = prepare_files(items)

start = time.monotonic()
conn = HTTPConnection(urlparse(pc.endpoint).netloc, ssl_context=ssl_context)
for _ in range(20):
    if asyncio.get_event_loop().is_closed():
        asyncio.set_event_loop(asyncio.new_event_loop())
    loop = asyncio.get_event_loop()

    # This generator function returns a coroutine that sends
    # all the requests.
    def send_requests():
        for path in path_list:
            yield from conn.co_send_request(
                'GET', f"/{path['method']}?" + urlencode(path['params']))

    # This generator function returns a coroutine that reads
    # all the responses
    def read_responses():
        bodies = []
        for _ in path_list:
            resp = yield from conn.co_read_response()
            assert resp.status == 200
            buf = yield from conn.co_readall()
            bodies.append(buf)
        return bodies

    # Create the coroutines
    send_crt = send_requests()
    recv_crt = read_responses()

    # Register the coroutines with the event loop
    AioFuture(send_crt, loop=loop)
    recv_future = AioFuture(recv_crt, loop=loop)

    # Run the event loop until the receive coroutine is done (which
    # implies that all the requests must have been sent as well):
    loop.run_until_complete(recv_future)

    # Get the result returned by the coroutine
    bodies = recv_future.result()

    for r in bodies:
        print(json.loads(r))

conn.disconnect()
print(f'{ts(start)}\tDone!')
pc.logout()

{'result': 0, 'fd': 1, 'fileid': 128105066}
{'result': 0, 'fd': 2, 'fileid': 128105067}
...
{'result': 0, 'fd': 1199, 'fileid': 128105283}
{'result': 0, 'fd': 1200, 'fileid': 128105285}
8.311   Done!

But when I run this code several times, it sometimes returns errors or freezes:

{'result': 0, 'fd': 56, 'fileid': 128105273}
{'result': 5001, 'error': 'Internal upload error.'}
{'result': 0, 'fd': 58, 'fileid': 128105279}
{'result': 5001, 'error': 'Internal upload error.'}
{'result': 0, 'fd': 60, 'fileid': 128105285}

This looks like a bug in the HTTP pipelining implementation. I don't know how to get reliable responses until they switch to HTTP/2.
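
One possible mitigation, sketched on top of the code above (it assumes conn and loop from that snippet, before conn.disconnect(); the batch size and retry count are arbitrary choices, not values from this thread): cap the pipeline depth and retry any batch whose responses come back with a non-zero result.

BATCH = 10

def run_batch(conn, loop, batch):
    # Send one small group of requests back-to-back, then read all replies.
    def send():
        for path in batch:
            yield from conn.co_send_request(
                'GET', f"/{path['method']}?" + urlencode(path['params']))

    def recv():
        out = []
        for _ in batch:
            resp = yield from conn.co_read_response()
            assert resp.status == 200
            out.append(json.loads((yield from conn.co_readall())))
        return out

    AioFuture(send(), loop=loop)
    fut = AioFuture(recv(), loop=loop)
    loop.run_until_complete(fut)
    return fut.result()

results = []
for i in range(0, len(path_list), BATCH):
    batch = path_list[i:i + BATCH]
    for attempt in range(3):  # up to 3 tries per batch
        replies = run_batch(conn, loop, batch)
        if all(r.get('result') == 0 for r in replies):
            break  # whole batch succeeded
    results.extend(replies)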