ranaroussi / pystore

Fast data store for Pandas time-series data
Apache License 2.0

unable to set npartitions when writing collection #15

Closed cevans3098 closed 5 years ago

cevans3098 commented 5 years ago

I was hoping to set the number of partitions while writing a collection, but the code currently raises an error when executing. The `write` signature is:

    def write(self, item, data, metadata={},
              npartitions=None, chunksize=1e6, overwrite=False,
              epochdate=False, compression="snappy", **kwargs):

For example, I tried to execute the following code:

collection.write(item='spx', data=df, metadata={'source': 'ivolatility'}, npartitions=5, overwrite=True)

The error I receive is:

Traceback (most recent call last):
  File "C:\Users\C\AppData\Local\Temporary Projects\pystore_Test\pystore_Test.py", line 30, in <module>
    collection.write(item='spx', data=df, metadata={'source': 'ivolatility'},  npartitions=5, overwrite=True)
  File "F:\anaconda3\envs\envTensorflow\lib\site-packages\pystore\collection.py", line 111, in write
    chunksize=int(chunksize))
  File "F:\anaconda3\envs\envTensorflow\lib\site-packages\dask\dataframe\io\io.py", line 177, in from_pandas
    raise ValueError('Exactly one of npartitions and chunksize must be specified.')
ValueError: Exactly one of npartitions and chunksize must be specified.

I tried a couple of different ways to use npartitions instead of chunksize, but because chunksize has a default value and is always passed along, I can't seem to find an easy way to execute the write using partitions.

Any suggestions on how to use npartitions instead of chunksize?
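For illustration, the root cause can be reproduced without dask at all. The helper below is a stand-in that mimics the argument validation inside `dask.dataframe.from_pandas` (the function name and body are mine, not pystore's or dask's code): because `write()` defaults `chunksize` to 1e6 and always forwards it, supplying `npartitions` means both arguments reach dask, which insists on exactly one.

```python
def from_pandas_check(npartitions=None, chunksize=None):
    """Mimic the validation in dask.dataframe.from_pandas:
    exactly one of npartitions / chunksize must be given."""
    if (npartitions is None) == (chunksize is None):
        raise ValueError(
            "Exactly one of npartitions and chunksize must be specified.")
    return npartitions if npartitions is not None else chunksize

# pystore's write() always forwarded its chunksize default (1e6),
# so passing npartitions=5 meant both arguments were specified:
try:
    from_pandas_check(npartitions=5, chunksize=int(1e6))
except ValueError as e:
    print(e)  # Exactly one of npartitions and chunksize must be specified.
```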

ranaroussi commented 5 years ago

You're correct! I've fixed the issue in the latest version; you can now specify either npartitions or chunksize.
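A minimal sketch of what such a fix can look like (the helper name `resolve_partitioning` is hypothetical, not pystore's actual code): forward only whichever argument the caller supplied, falling back to the chunksize default when npartitions is omitted.

```python
def resolve_partitioning(npartitions=None, chunksize=1e6):
    """Build kwargs for dask.dataframe.from_pandas, forwarding exactly
    one of npartitions / chunksize. chunksize is only a fallback default."""
    if npartitions is not None:
        return {"npartitions": int(npartitions)}
    return {"chunksize": int(chunksize)}

print(resolve_partitioning(npartitions=5))  # {'npartitions': 5}
print(resolve_partitioning())               # {'chunksize': 1000000}
```

With this dispatch, the original call `collection.write(item='spx', data=df, npartitions=5, ...)` would hand dask only `npartitions=5` and avoid the ValueError.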