ranaroussi / pystore

Fast data store for Pandas time-series data
Apache License 2.0

fix in case of store replacing #1

Closed javifalces closed 6 years ago

javifalces commented 6 years ago

Hi Ran, today I ran into this issue: when creating/replacing a collection, the code searches for the collection name, but the list of collections was being built from the subdirectories of the datastore. With this change it seems to be fixed, at least on my side.

I noticed the error in the collection method, but it was actually coming from the list_collection method.

Thanks for your library and your time!

ranaroussi commented 6 years ago

Hi :)

I'm not sure I understand the problem you were facing... Were you trying to rename a collection or a dataset?

Can you please provide an example of what it is you were trying to accomplish?

Thanks.

javifalces commented 6 years ago

Yes, I'm just adding more data to an already existing collection. I have a collection of daily bars; if I want to add new bars, the collection method throws an "already exists" exception instead of returning the collection, so the only option I had was to force an overwrite and delete the previous data.

ranaroussi commented 6 years ago

Can you post the code you were using? I’m trying to replicate the error on my end but everything’s working as it should... :)

javifalces commented 6 years ago

It's my first pull request on GitHub, so sorry if it's not formatted properly... I'm using Windows 10; maybe there is a problem with OS separators or paths.

I can replicate it by just creating a collection twice... the second call raises an error because the collection already exists, so I can't append new data to that collection (only overwrite it):

import pystore

pystore.set_path(r'C:\sample_database')  # raw string keeps the backslash literal
store = pystore.store('store')
collection = store.collection('collection2', overwrite=False)   # first call creates it: OK
collection2 = store.collection('collection2', overwrite=False)  # second call raises ValueError
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-e368bc4f1af7> in <module>()
----> 1 collection = store.collection('collection2', overwrite=False)

D:\Anaconda3\lib\site-packages\pystore\store.py in collection(self, collection, overwrite)
     82 
     83         # create it
---> 84         self._create_collection(collection, overwrite)
     85         return Collection(collection, self.datastore)

D:\Anaconda3\lib\site-packages\pystore\store.py in _create_collection(self, collection, overwrite)
     52             else:
     53                 raise ValueError(
---> 54                     "Collection already exists. To overwrite, use `overwrite=True`")
     55 
     56         os.makedirs(self.datastore + '/' + collection)

ValueError: Collection already exists. To overwrite, use `overwrite=True`

ranaroussi commented 6 years ago

Ok... now I understand!

If you want to overwrite the entire collection, you'll need to either delete and re-create it, or use overwrite=True, as the error suggests. Note that the error message reads: Collection already exists. To overwrite, use `overwrite=True`.

So... your code should look something like this:

import pystore

pystore.set_path(r'C:\sample_database')
store = pystore.store('store')

# create the collection
collection = store.collection('collection2')

# delete the collection
store.delete_collection('collection2')

# re-create the collection
collection2 = store.collection('collection2')

Or (shorter):

import pystore

pystore.set_path(r'C:\sample_database')
store = pystore.store('store')

# create the collection
collection = store.collection('collection2')

# overwrite the collection
collection2 = store.collection('collection2', overwrite=True)

javifalces commented 6 years ago

But what if I have a project with a collection called, for example, 1_day_bar, which holds some data series for SP500 and NASDAQ stocks, and tomorrow I want to add another time series, for DAX, to the same 1_day_bar collection? Do I have to replace or delete the whole 1_day_bar collection?

ranaroussi commented 6 years ago

Think of collections as a directory hierarchy.

Here's an example of what you're trying to do:

store (datastore)
  - SP500.EOD
    - STOCK
    - STOCK
    - ...
  - NASDAQ.EOD
    - STOCK
    - STOCK

Python code:

import pystore

pystore.set_path(r'C:\sample_database')
store = pystore.store('store')

collection = store.collection('SP500.EOD')
collection.write('STOCK1', stock1_df, metadata={'source': '...'})
collection.write('STOCK2', stock2_df, metadata={'source': '...'})

collection = store.collection('NASDAQ.EOD')
collection.write('STOCK1', stock1_df, metadata={'source': '...'})
collection.write('STOCK2', stock2_df, metadata={'source': '...'})

Tomorrow, you may want to add new data to SP500's STOCK1...

import pystore

pystore.set_path(r'C:\sample_database')
store = pystore.store('store')

collection = store.collection('SP500.EOD')
collection.append('STOCK1', stock1_df_new, metadata={'source': '...'})

# if you want to completely overwrite STOCK1, use:
collection.write('STOCK1', stock1_df_new, metadata={'source': '...'}, overwrite=True)

A day later, you want to add DAX data...

import pystore

pystore.set_path(r'C:\sample_database')
store = pystore.store('store')

collection = store.collection('DAX.EOD')
collection.write('STOCK1', stock1_df, metadata={'source': '...'})
collection.write('STOCK2', stock2_df, metadata={'source': '...'})

If you then want to have collections for 1-minute data, use something like this:

import pystore

pystore.set_path(r'C:\sample_database')
store = pystore.store('store')

collection = store.collection('DAX.1MIN')
collection.write('STOCK1', stock1_df, metadata={'source': '...'})
collection.write('STOCK2', stock2_df, metadata={'source': '...'})

I hope that helps :)

Read more about the concept here: https://github.com/ranaroussi/pystore#concepts
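The "directory hierarchy" picture above can be sketched with plain directories. This is only an illustration, assuming each collection maps to a subdirectory of the datastore; `open_collection` and `list_collections` below are hypothetical helpers written for this example, not PyStore's implementation. It shows the behaviour described in this thread: opening a collection that already exists simply returns it, and adding DAX.EOD later leaves the other collections untouched.

```python
import tempfile
from pathlib import Path


def open_collection(store_dir: Path, name: str) -> Path:
    """Return the collection directory, creating it only if it is missing."""
    path = store_dir / name
    path.mkdir(parents=True, exist_ok=True)
    return path


def list_collections(store_dir: Path) -> list:
    """List collection names as the names of the store's subdirectories."""
    return sorted(p.name for p in store_dir.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as root:
    store_dir = Path(root) / "store"

    # Day 1: create two collections and put one item into SP500.EOD.
    sp500 = open_collection(store_dir, "SP500.EOD")
    open_collection(store_dir, "NASDAQ.EOD")
    (sp500 / "STOCK1").mkdir()

    # Day 2: re-open the existing collection; nothing is deleted.
    sp500_again = open_collection(store_dir, "SP500.EOD")
    stock1_survived = (sp500_again / "STOCK1").is_dir()

    # Day 3: add a new collection next to the existing ones.
    open_collection(store_dir, "DAX.EOD")
    collections = list_collections(store_dir)

print(stock1_survived)
print(collections)
```

The design choice here is that "open" and "create" are the same call, so re-running the script on the next day never requires deleting existing data.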

javifalces commented 6 years ago

I'm getting the error exactly at this point:

collection = store.collection('SP500.EOD')

Just to be sure, are you testing on Windows?

Tomorrow, you may want to add new data to SP500's STOCK1...

import pystore

pystore.set_path(r'C:\sample_database')
store = pystore.store('store')

collection = store.collection('SP500.EOD')  # error here!
collection.append('STOCK1', stock1_df_new, metadata={'source': '...'})

# if you want to completely overwrite STOCK1, use:
collection.write('STOCK1', stock1_df_new, metadata={'source': '...'}, overwrite=True)

ranaroussi commented 6 years ago

I've never tested it on Windows platforms:

PyStore was tested to work on *NIX-like systems

That being said, I've updated the library to support platform-independent paths. Please upgrade to version 0.0.12 and see if this helps.

Upgrade using:

$ pip install PyStore --upgrade --no-cache-dir

LMK
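
To illustrate why hard-coded separators can break on Windows, here is a minimal sketch. It is an assumption about the kind of mismatch that can occur, not a confirmed description of the PyStore bug or of the change in 0.0.12; `names_by_prefix_strip` and `names_by_basename` are hypothetical helpers. `ntpath` and `posixpath` are the standard-library modules behind `os.path` on Windows and POSIX systems, so both behaviours can be compared on any machine.

```python
import ntpath
import posixpath


def names_by_prefix_strip(path_mod, datastore, entries):
    """Recover names by stripping 'datastore/' from each joined path.

    The '/' is hard-coded, so this only works where the path module
    also joins with '/'.
    """
    return [path_mod.join(datastore, e).replace(datastore + "/", "") for e in entries]


def names_by_basename(path_mod, datastore, entries):
    """Recover names with basename(), which respects the platform separator."""
    return [path_mod.basename(path_mod.join(datastore, e)) for e in entries]


entries = ["collection2"]

posix_names = names_by_prefix_strip(posixpath, "/data/store", entries)
win_names = names_by_prefix_strip(ntpath, r"C:\sample_database\store", entries)
win_fixed = names_by_basename(ntpath, r"C:\sample_database\store", entries)

print("collection2" in posix_names)  # True: 'datastore/' prefix was stripped
print("collection2" in win_names)    # False: the full path is left in the list
print("collection2" in win_fixed)    # True: basename handles the backslash
```

With the Windows-style join, a lookup by name fails even though the directory exists, so a subsequent "create" step would run into the existing directory. That would be consistent with the traceback earlier in this thread, but it is not verified here.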