roeap / object-store-python

Python bindings and arrow integration for the rust object_store crate.
Apache License 2.0

Allow threads #15

Open adriangb opened 2 months ago

adriangb commented 2 months ago

This should release the GIL and allow use in multiple threads.

I tested with this script:

from contextlib import contextmanager
import math
from time import perf_counter
from typing import Iterator

import anyio
import anyio.to_thread
import object_store

@contextmanager
def timeit(name: str) -> Iterator[None]:
    # Wall-clock timer around the enclosed block.
    start = perf_counter()
    yield
    print(f'{name} took {perf_counter() - start:.2f} seconds', flush=True)

def work() -> None:
    # Blocking download; this is the call that should release the GIL.
    object_store.ObjectStore('gs://adriangb-public-bucket').get('yellow_tripdata_2024-01 (1).parquet')

async def awork(limiter: anyio.CapacityLimiter) -> None:
    # Run the blocking call in a worker thread.
    await anyio.to_thread.run_sync(work, limiter=limiter)

async def main() -> None:
    # No cap on the number of worker threads.
    limiter = anyio.CapacityLimiter(math.inf)

    with timeit('main'):
        async with anyio.create_task_group() as tg:
            # Start 32 concurrent downloads of the same object.
            for _ in range(32):
                tg.start_soon(awork, limiter)

if __name__ == '__main__':
    anyio.run(main)

Locally there isn't much difference because I'm IO bound, but on GCP compute the runtime of this script drops from ~15s to ~3s for me.
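
For anyone who wants to see the effect without a bucket, here is a minimal sketch of why releasing the GIL matters for threaded callers. It assumes `time.sleep` as a stand-in for the blocking `get` call (it also releases the GIL while waiting). The names `blocking_call`, `run_sequential`, and `run_threaded` are hypothetical and not part of object_store:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(delay: float = 0.1) -> None:
    # Hypothetical stand-in for a GIL-releasing call such as ObjectStore.get;
    # time.sleep releases the GIL while it waits.
    time.sleep(delay)


def run_sequential(n: int) -> float:
    # Run n blocking calls one after another and return the elapsed seconds.
    start = time.perf_counter()
    for _ in range(n):
        blocking_call()
    return time.perf_counter() - start


def run_threaded(n: int) -> float:
    # Run n blocking calls in n threads and return the elapsed seconds.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(blocking_call) for _ in range(n)]
        for future in futures:
            future.result()  # re-raise any exception from the worker
    return time.perf_counter() - start


if __name__ == '__main__':
    print(f'sequential: {run_sequential(4):.2f} seconds')
    print(f'threaded:   {run_threaded(4):.2f} seconds')
```

When the call releases the GIL, the threaded run should take roughly as long as a single call. If the GIL were held for the whole call, the threads would serialize and the threaded time would approach the sequential one.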

adriangb commented 1 month ago

@roeap quick ping on this

roeap commented 2 weeks ago

@adriangb - sorry for being MIA for so long. Do you mind rebasing? I'm happy to review and merge after that.

ion-elgreco commented 2 weeks ago

@roeap will you push a 0.2.0 release after this one?