ranaroussi / pystore

Fast data store for Pandas time-series data
Apache License 2.0

Reading data via date range #2

Closed: trbck closed this issue 6 years ago

trbck commented 6 years ago

Very nice and fast pandas dataframe database. Thank you!

Do you think it would be possible to read only part of the data by applying a date range, to avoid loading the whole dataframe at once?

ranaroussi commented 6 years ago

Hey!

The data isn't actually loaded into a Pandas dataframe until you call the .to_pandas() method...

Dask works by storing references to the data and deferring any processing until you call the .compute() method.
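
To make the deferred-loading idea concrete, here is a toy model in plain Python. It is not Dask's implementation and does not use the pystore API; `LazyFrame`, `make_loader`, and the yearly partition layout are all hypothetical names invented for this sketch. It only illustrates the general pattern: selecting a date range records the request, and the data is read (and only the overlapping partitions) when `compute()` is called.

```python
from datetime import date


class LazyFrame:
    """Toy stand-in for a lazy, date-partitioned table (not Dask itself)."""

    def __init__(self, partitions, start=None, end=None):
        # partitions: list of (first_date, last_date, loader) tuples.
        self.partitions = partitions
        self.start = start
        self.end = end

    def loc(self, start, end):
        # Only records the requested range; no partition is loaded here.
        return LazyFrame(self.partitions, start, end)

    def compute(self):
        rows = []
        for first, last, loader in self.partitions:
            # Skip partitions that cannot overlap the requested range.
            if self.start is not None and last < self.start:
                continue
            if self.end is not None and first > self.end:
                continue
            for day, value in loader():
                after_start = self.start is None or day >= self.start
                before_end = self.end is None or day <= self.end
                if after_start and before_end:
                    rows.append((day, value))
        return rows


loaded = []  # records which partitions were actually read


def make_loader(year):
    # Hypothetical loader: pretends to read one year of monthly rows from disk.
    def loader():
        loaded.append(year)
        return [(date(year, m, 1), float(year + m)) for m in range(1, 13)]
    return loader


frame = LazyFrame(
    [(date(y, 1, 1), date(y, 12, 1), make_loader(y)) for y in (2016, 2017, 2018)]
)

subset = frame.loc(date(2017, 1, 1), date(2017, 12, 31))
loaded_before = list(loaded)  # still empty: selecting a range loads nothing

rows = subset.compute()  # only now is data read, and only the 2017 partition
```

In this sketch, `loaded` ends up containing only `2017`, which mirrors why slicing by date before calling `.compute()` avoids pulling the whole dataset into memory.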

Here's an example:


import pystore

# assumed setup: the store and collection names are placeholders
store = pystore.store('mydatastore')
collection = store.collection('NASDAQ')

item = collection.item('AAPL')

# item.data holds the Dask dataframe (lazy; nothing is loaded yet)
# item.metadata holds the metadata

# convert the entire item to Pandas:
df = item.to_pandas()  # same as item.data.compute()

# to get only part of the data as a Pandas dataframe, use:
df = item.data.loc['2017-01-01':'2017-12-31'].compute()

# you can also filter by other conditions:
df = item.data[item.data['close'] > item.data['open']].compute()

# you can also delegate calculations to Dask:
item.data['sma'] = item.data['close'].rolling(200).mean()
df = item.data.loc['2017-01-01':'2017-12-31'].compute()
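
One detail in the last snippet is worth spelling out: the moving average is defined on the full series before the date range is applied. The plain-Python sketch below shows why that order matters, using a hypothetical `sma` helper, a 3-value window, and made-up prices (none of this is pystore or Dask code). Like a rolling mean with a fixed window, it yields no value until a full window of rows is available.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # made-up closing prices
window = 3

# Average first, then slice: the slice keeps values built from earlier rows.
average_then_slice = sma(closes, window)[3:]

# Slice first, then average: the window has to warm up again inside the slice.
slice_then_average = sma(closes[3:], window)

print(average_then_slice)  # [12.0, 13.0, 14.0]
print(slice_then_average)  # [None, None, 14.0]
```

So computing the average before selecting the range means the first rows of the selected period already carry an average based on the preceding data.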

I hope that helps :)

trbck commented 6 years ago

Thanks!