yohplala opened this issue 4 years ago
Hello, I have updated the code so that anyone can execute it in a terminal and reproduce the error (the previous code did not work on its own, since it needed a data file; I have embedded an extract of it in the code). Thanks in advance for any help and advice. Bests, Pierrot
[ADDITION] OK, I first tested pandas' concat() function on its own (not using pystore), and I do not get the error message. Does that mean the trouble comes from the dask dataframe handling?
The following code (direct use of pandas, not pystore/dask/parquet) works:
import pandas as pd
ts_list = ['Sun Dec 22 2019 07:40:00 GMT-0100',
           'Sun Dec 22 2019 07:45:00 GMT-0100',
           'Sun Dec 22 2019 07:50:00 GMT-0100',
           'Sun Dec 22 2019 07:55:00 GMT-0100']
op_list = [7134.0, 7134.34, 7135.03, 7131.74]
GC = pd.DataFrame(list(zip(ts_list, op_list)), columns=['date', 'open'])
# Parse the timestamps and resolve them to UTC time
GC['date'] = pd.to_datetime(GC['date'], utc=True)
# Rename the timestamp column
GC.rename(columns={'date': 'Timestamp'}, inplace=True)
# Set the timestamp column as index
GC.set_index('Timestamp', inplace=True, verify_integrity=True)
# Concatenate all rows but the last with the last row, dropping duplicates
combined = pd.concat([GC[:-1], GC[-1:]]).drop_duplicates(keep="last")
The problem is not solved.
Hmm, it seems I cannot reproduce the error in a standalone script without rewriting collection.py in depth, so I am stopping the delving here (it seemed to me it could be an error in my dataframe formatting, which I could then submit to Stack Overflow, the pandas GitHub, or the dask GitHub if it turned out to be dask related), but I have no clue where the bug is without going further into dask.
As this is not my priority at the moment, I will only use pystore's write() function, and when I have to append data I will do it with pandas' concat() function, then write() with pystore using overwrite=True.
I hope this trouble in a Windows 10 environment can be solved (I suspect that this error and the need to pass 'npartitions=item.data.npartitions' to the append() function may actually be linked).
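For reference, the npartitions hint above corresponds to a call along these lines (only a sketch of the call, using the same names as in the workaround snippet further down):
# Reuse the stored item's partition count when appending the last row
item = collection.item(item_ID)
collection.append(item_ID, GC[-1:], npartitions=item.data.npartitions)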
Have a good day, Bests, Pierrot
For those in the same situation, here is an ugly workaround whose logic I mention in the comment above.
import pandas as pd
import pystore
ts_list = ['Sun Dec 22 2019 07:40:00 GMT-0100',
           'Sun Dec 22 2019 07:45:00 GMT-0100',
           'Sun Dec 22 2019 07:50:00 GMT-0100',
           'Sun Dec 22 2019 07:55:00 GMT-0100']
op_list = [7134.0, 7134.34, 7135.03, 7131.74]
GC = pd.DataFrame(list(zip(ts_list, op_list)), columns=['date', 'open'])
# Parse the timestamps and resolve them to UTC time
GC['date'] = pd.to_datetime(GC['date'], utc=True)
# Rename the timestamp column
GC.rename(columns={'date': 'Timestamp'}, inplace=True)
# Set the timestamp column as index
GC.set_index('Timestamp', inplace=True, verify_integrity=True)
# Connect to the datastore (created if it does not exist)
store = pystore.store('OHLCV')
# Access a collection (created if it does not exist)
collection = store.collection('AAPL')
item_ID = 'EOD'
# Write all rows but the last one
collection.write(item_ID, GC[:-1], overwrite=True)
# WORKAROUND: re-create an append function by reading the item back,
# concatenating with pandas, then overwriting the item
item = collection.item(item_ID)
current = item.to_pandas()
combined = pd.concat([current, GC[-1:]]).drop_duplicates(keep="last")
collection.write(item_ID, combined, overwrite=True)
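After the final overwrite, reading the item back should return all four rows in a single dataframe:
# Read the item back to check the result of the workaround
print(collection.item(item_ID).to_pandas())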
Bests,
I think that https://github.com/ranaroussi/pystore/blob/master/pystore/collection.py#L181 should be
combined = dd.concat([current.to_pandas(), new]).drop_duplicates(keep="last")
instead of the current
combined = dd.concat([current.data, new]).drop_duplicates(keep="last")
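To illustrate the idea outside pystore, the proposed line can be mimicked by hand with the item from the workaround above (only a sketch; here 'new' stands for the rows to append, wrapped as a single-partition dask dataframe, which is my assumption about what append() builds internally):
import dask.dataframe as dd
# 'current' is the stored item, 'new' the rows to append (last row of GC)
current = collection.item(item_ID)
new = dd.from_pandas(GC[-1:], npartitions=1)
# Proposed variant: concatenate the pandas view of the stored item
# (current.to_pandas()) rather than its dask dataframe (current.data)
combined = dd.concat([current.to_pandas(), new]).drop_duplicates(keep="last")
print(combined.compute())  # materialize the dask dataframe for inspection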
@ranaroussi, could you confirm?
This is probably related to dask issue https://github.com/dask/dask/issues/6925.
Hello,
I am passing a tz-aware dataframe to pystore's append() and I get this error message.
[EDIT] Here is code that can simply be copy/pasted to reproduce the error message. Please, does someone see what I could possibly be doing wrong?
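In short, the call that fails on my side boils down to this (a condensed sketch reusing the tz-aware GC dataframe and the collection from the snippets above, not the full copy/paste script):
# Write all rows but the last one, then append the last row of the
# tz-aware dataframe; the append() call is where the error is raised
collection.write(item_ID, GC[:-1], overwrite=True)
collection.append(item_ID, GC[-1:])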
Thank you for your help. Have a good day. Bests, Pierrot