Awesome project.
Just a short question: I have around 2000 stored dataframes and would like to load 500 of them into one Python process as fast as possible. Is there a batch-load function for this?

I coded something with `ThreadPoolExecutor`, and it loads 3 GB on disk into a roughly 40 GB DataFrame (which is pretty heavy) in under four minutes using 5 threads. Does anybody see a faster variant? The SSD is mostly idle; the performance limitation seems to lie in `df = item.to_pandas()`, which is CPU-intensive.
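
For reference, a minimal sketch of the thread-pool batch load described above. Only the `ThreadPoolExecutor` / `to_pandas()` pattern is from this post; `pyarrow.parquet` is a stand-in for this project's reader (its `Table.to_pandas()` matches the call above), and the `store/` path, glob pattern, and `load_one` helper are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import pandas as pd
import pyarrow.parquet as pq  # stand-in backend; swap in this project's reader


def load_one(path):
    # Hypothetical helper: read one stored item and materialize it.
    # The I/O part releases the GIL; to_pandas() is the CPU-bound step.
    return pq.read_table(path).to_pandas()


# Assumed layout: the 500 items to load, one file each under store/
paths = sorted(Path("store/").glob("*.parquet"))[:500]

with ThreadPoolExecutor(max_workers=5) as pool:  # 5 threads, as in the post
    frames = list(pool.map(load_one, paths))

df = pd.concat(frames, ignore_index=True)  # one combined DataFrame
```

Since `to_pandas()` is CPU-bound, threads mostly overlap the I/O; the conversion itself is serialized by the GIL, which would explain why adding threads beyond a few stops helping.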