ranaroussi / pystore

Fast data store for Pandas time-series data
Apache License 2.0
565 stars, 101 forks

Q: Fastest way to load multiple DataFrames same time #41

Open cgi1 opened 4 years ago

cgi1 commented 4 years ago

Awesome project.

Just a short question: I now have about 2000 stored DataFrames and would like to load 500 of them as fast as possible into a single Python process. Is there a batch-load function for this?

I wrote something with ThreadPoolExecutor that loads 3 GB from disk into a DataFrame of around 40 GB (which is pretty heavy) in under four minutes using 5 threads.
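
The code itself was not posted, so the following is only a minimal sketch of what such a thread-pool loader might look like. The helper name load_many and its parameters are hypothetical; load_one stands in for whatever reads a single item, for example a function that calls item.to_pandas() on a PyStore item. The store and collection names in the usage comment are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def load_many(names, load_one, max_workers=5):
    """Load several items concurrently and return them keyed by name.

    names       -- iterable of unique item names to load
    load_one    -- callable taking one name and returning the loaded object
                   (hypothetical; e.g. a wrapper around item.to_pandas())
    max_workers -- number of threads (5 matches the figure in the post)
    """
    names = list(names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input names
        loaded = list(pool.map(load_one, names))
    return dict(zip(names, loaded))


# Possible usage with PyStore (assumed names, not taken from the post):
#
#   import pystore
#   import pandas as pd
#
#   collection = pystore.store("mydatastore").collection("mycollection")
#   frames = load_many(
#       wanted_names,
#       lambda name: collection.item(name).to_pandas(),
#   )
#   df = pd.concat(frames)  # dict keys become the outer index level
```

Threads only help here to the extent that the loading work releases the GIL, so the speed-up from adding more workers may flatten out quickly if the conversion step is CPU-bound.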

Does anybody see a faster approach? The SSD is barely loaded; the performance limitation seems to lie in df = item.to_pandas(), which is CPU-intensive.