ranaroussi / pystore

Fast data store for Pandas time-series data
Apache License 2.0

Terrible performance on dask=2.2.0 #22

Closed marchinidavide closed 5 years ago

marchinidavide commented 5 years ago

Hi everyone, I noticed that pystore unexpectedly became orders of magnitude slower while running the same script: it now takes minutes to get the daily timestamps of a single instrument ... It seems that dask=2.2.0 is terribly slow when using snappy and engine="fastparquet", but everything is fast again after downgrading to dask=2.1.0.

@ranaroussi it would be great if you could update us on when things will be fine with the latest Dask update :)

Best, Davide

stnatter commented 5 years ago

Does append work for you when using dask==2.2.0?

marchinidavide commented 5 years ago

I had to downgrade to 2.1.0 because with the latest version every read operation is very slow. I didn't try anything other than reading with 2.2.0.
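
For anyone who wants to catch this at startup, here is a minimal sketch of a version guard. It only encodes what was reported in this thread (2.2.0 slow, 2.1.0 fine); the helper names `is_reported_slow` and `installed_dask_version` are hypothetical and not part of pystore or Dask.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as slow in this thread (assumption: only 2.2.0 is affected).
REPORTED_SLOW_DASK_VERSIONS = {"2.2.0"}


def is_reported_slow(dask_version: str) -> bool:
    """Return True if the given Dask version string was reported slow here."""
    return dask_version.strip() in REPORTED_SLOW_DASK_VERSIONS


def installed_dask_version():
    """Return the installed Dask version string, or None if Dask is absent."""
    try:
        return version("dask")
    except PackageNotFoundError:
        return None
```

A script could call `installed_dask_version()` before reading from the store and warn when `is_reported_slow(...)` returns True, instead of silently running for minutes.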

ranaroussi commented 5 years ago

I've pushed a new version to the dev branch. It should result in faster and more consistent behavior when using append. By default, PyStore will aim for partitions of ~99MB each (as per Dask's recommendation).

LMK.
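
To illustrate the partition sizing described above, here is a rough sketch of how a partition count could be derived from a ~99MB target. This is not PyStore's actual code; the constant `TARGET_PARTITION_BYTES` and the function `npartitions_for` are hypothetical names used only for illustration.

```python
# ~99MB per partition, the target size mentioned in the comment above.
TARGET_PARTITION_BYTES = 99_000_000


def npartitions_for(total_bytes: int,
                    target_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Return how many partitions keep each one at or below target_bytes.

    Always returns at least 1, so an empty dataset still gets one partition.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return max(1, (total_bytes + target_bytes - 1) // target_bytes)
```

With pandas, `total_bytes` could be taken from `df.memory_usage(deep=True).sum()`, and the result passed as `npartitions` to `dask.dataframe.from_pandas`. A 1 GB frame would end up split into 11 partitions under this rule.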

ranaroussi commented 5 years ago

Closing this issue and moving all related discussions to issue https://github.com/ranaroussi/pystore/issues/21.

Please see my comments here: https://github.com/ranaroussi/pystore/issues/21#issuecomment-523467142, and here: https://github.com/ranaroussi/pystore/issues/21#issuecomment-523833921