ranaroussi / pystore

Fast data store for Pandas time-series data
Apache License 2.0

.to_pandas() error [can't read parquet file even though there is data in it when i look with parquet viewer] #59

Closed davidt35 closed 2 years ago

davidt35 commented 2 years ago

I reinstalled my computer and now I can't read my old pystore data: I get an error when I reach the last line, df = item.to_pandas().

I even tried the example pystore code, using yfinance instead of quandl, and I still can't read the data once it has been created, even though I can clearly see it with a parquet viewer. See the image here: https://i.imgur.com/ykKy5cn.png

I'm getting this error:

Traceback (most recent call last):
  File "C:\Users\tothd\anaconda3\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\tothd\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\tothd\.vscode\extensions\ms-python.python-2022.2.1924087327\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\tothd\.vscode\extensions\ms-python.python-2022.2.1924087327\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 444, in main
    run()
  File "c:\Users\tothd\.vscode\extensions\ms-python.python-2022.2.1924087327\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "C:\Users\tothd\anaconda3\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\tothd\anaconda3\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\tothd\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\tothd\OneDrive\Desktop\Developers\0_CODE\1_algo_trading\active\data_storage\yfinance_test.py", line 34, in <module>
    df = item.to_pandas()
  File "C:\Users\tothd\anaconda3\lib\site-packages\pystore\item.py", line 71, in to_pandas
    elif df.index.values[0] > 1e6:
IndexError: index 0 is out of bounds for axis 0 with size 0
Press any key to continue . . .

Here is pystore's .to_pandas() function:

    def to_pandas(self, parse_dates=True):
        df = self.data.compute()

        if parse_dates and "datetime" not in str(df.index.dtype):
            df.index.name = ""
            if str(df.index.dtype) == "float64":
                df.index = pd.to_datetime(df.index, unit="s",
                                          infer_datetime_format=True)
            elif df.index.values[0] > 1e6:
                df.index = pd.to_datetime(df.index,
                                          infer_datetime_format=True)

        return df
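
The traceback points at `df.index.values[0]`, which raises IndexError whenever the computed frame has zero rows. The sketch below reproduces that on an empty DataFrame and shows a guarded variant of the index-parsing step. `parse_index_dates` is a hypothetical helper name, not part of pystore, and the guard only avoids the crash; it does not explain why the frame came back empty in the first place.

```python
import pandas as pd


def parse_index_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical guarded variant of the index-parsing step (not pystore's code)."""
    # An empty frame has no first index value to inspect, so return it untouched.
    if df.empty:
        return df
    # Nothing to do if the index is already datetime-like.
    if "datetime" in str(df.index.dtype):
        return df
    out = df.copy()
    if str(out.index.dtype) == "float64":
        # Float index: treat the values as epoch seconds.
        out.index = pd.to_datetime(out.index, unit="s")
    elif out.index.values[0] > 1e6:
        # Large integer index: let pandas interpret it as epoch nanoseconds.
        out.index = pd.to_datetime(out.index)
    return out


# A frame with columns but no rows, as to_pandas() appears to have received.
empty = pd.DataFrame({"Close": []})

# The unguarded access fails the same way as in the traceback.
try:
    empty.index.values[0]
    raised = False
except IndexError:
    raised = True

# The guarded helper returns the empty frame instead of raising.
result = parse_index_dates(empty)
```

If `result` is empty here, the next thing to check is likely the read path (for example the installed dask and parquet engine versions), since the parquet viewer shows that the file itself contains rows.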

What am I missing? Before I reinstalled my computer it worked without any problems, and I used pystore all the time. Here is the code I used as an example. I changed only 2 lines to use yfinance instead of quandl; with quandl data I got exactly the same error.


import pystore
import yfinance as yf

# Set storage path (optional)
# Defaults to `~/pystore` or `PYSTORE_PATH` environment variable (if set)
pystore.set_path(r'C:\Users\tothd\OneDrive\Desktop')

# List stores
print(pystore.list_stores())

# Connect to datastore (create it if not exist)
store = pystore.store('mydatastore')

# List existing collections
print(store.list_collections())

# Access a collection (create it if not exist)
collection = store.collection('NASDAQ')

# List items in collection
print(collection.list_items())

# Load some data from yfinance (these 2 lines are the only ones I edited)
ticker = yf.Ticker("AAPL")
aapl = ticker.history(period="max")

# Store the first 100 rows of the data in the collection under "AAPL"
collection.write('AAPL', aapl[:100], metadata={'source': 'yfinance'})

# Reading the item's data
item = collection.item('AAPL')
data = item.data  # <-- Dask dataframe (see dask.pydata.org)
metadata = item.metadata
df = item.to_pandas()

# Append the rest of the rows to the "AAPL" item
collection.append('AAPL', aapl[100:])

# Reading the item's data
item = collection.item('AAPL')
data = item.data
metadata = item.metadata
df = item.to_pandas()

davidt35 commented 2 years ago

I ditched pystore for pandas, as it has been able to write Parquet through fastparquet with snappy compression in one line of code since 2017, and it's better maintained and more active.
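
For reference, the one-line pandas approach mentioned here looks roughly like the sketch below. It assumes a Parquet engine (fastparquet, or pyarrow if you pass `engine="pyarrow"`) is installed. `save_and_reload` is a hypothetical helper name used only for illustration.

```python
import pandas as pd


def save_and_reload(df: pd.DataFrame, path: str, engine: str = "fastparquet") -> pd.DataFrame:
    """Write df to a snappy-compressed Parquet file, then read it back."""
    # The write itself is the single line the comment refers to.
    df.to_parquet(path, engine=engine, compression="snappy")
    # Reading back is also one call and returns a plain pandas DataFrame.
    return pd.read_parquet(path, engine=engine)
```

With the frame from the example above, `save_and_reload(aapl, "AAPL.parquet")` would write the full history to one file and return it as a DataFrame. Unlike pystore, this sketch has no collections, metadata, or append support.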