man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

ChunkStore date index timezone #879

Open DataCT2020 opened 3 years ago

DataCT2020 commented 3 years ago

Arctic Version

1.79.3

Arctic Store

ChunkStore 

Platform and version

Windows 10 Enterprise

Description of problem and/or code sample that reproduces the issue

Hi, we receive historical trade data as CSV files (the file format is shown below). I am trying to store this data in ChunkStore with a chunk size of "D", using the "append" API. The file's "Date" field is in UTC (e.g. 2020-08-03T09:23:46.530859148Z), so when I store the data I set the timezone of the DataFrame's 'date' index to UTC. However, this appears to cause a problem when I try to append the same data, or other data for the same date, again. So my questions are:

1) Should I set the timezone for the 'date' index field?
2) If I don't set the timezone for the 'date' index field, what timezone does the Arctic library use when it stores data in MongoDB?

Thanks

Code

    # Imports assumed by the snippet (not shown in the original post)
    import sys
    import traceback
    from datetime import timezone
    from pathlib import Path

    import pandas as pd
    import pymongo
    from arctic import Arctic, CHUNK_STORE
    from arctic.arctic import ArcticLibraryBinding


    def load_trth_trades_arctic():
        lib = None
        try:
            filename = "c:\\temp\\TEST01_Trades_modified.csv"
            fname = Path(filename)

            mongourl = "mongodb://myMongo:27017"
            with pymongo.MongoClient(mongourl) as client:
                use_cols = ['#RIC', 'Date', 'Price', 'Volume']
                rename_cols = ['ric', 'date', 'price', 'volume']

                csv_df = pd.read_csv(
                    fname,
                    header=0,
                    usecols=use_cols,
                    # Built-in types replace the deprecated np.str/np.float/np.int aliases
                    dtype={'#RIC': str, 'Date': str, 'Price': float, 'Volume': int},
                    error_bad_lines=False,
                    float_precision='round_trip'
                )[use_cols]
                # Rename columns here
                csv_df.columns = rename_cols
                df = csv_df
                # Keep the first 23 characters (millisecond precision) before parsing
                dfExchDate = pd.to_datetime(df['date'].str.slice(0, 23) + 'Z', format='%Y-%m-%dT%H:%M:%S.%fZ')
                df['date'] = dfExchDate

                # IF I ENABLE THE FOLLOWING LINE then I can't append the same data multiple times
                # df['date'] = df['date'].dt.tz_localize(timezone.utc)

                df.set_index('date', inplace=True)
                ArcticLibraryBinding.DB_PREFIX = "dev_vendor_db"
                arctic = Arctic(client, app_name="arctic_trade_test")
                arctic_library_name = "arctic_trade_test"
                arctic.initialize_library(
                    "dev_vendor_db.%s" % (arctic_library_name), lib_type=CHUNK_STORE
                )
                lib = arctic[arctic_library_name]
                lib.append('TEST1', df, chunk_size='D', upsert=True)

        except Exception:
            traceback.print_exc()
            print("Unexpected error: {}".format(sys.exc_info()[0]))

Data file

    #RIC,Date,Price,Volume
    TEST1,2020-08-03T09:23:46.530859148Z,0.6,1
    TEST1,2020-08-03T11:21:59.914308825Z,0.65,1
    TEST1,2020-08-03T13:37:14.526444511Z,0.6,1
    TEST1,2020-08-04T07:49:48.555836588Z,0.6,1
    TEST1,2020-08-04T07:51:52.612788790Z,0.6,1
    TEST1,2020-08-04T07:51:52.617832144Z,0.6,1
    TEST1,2020-08-04T07:53:01.925872675Z,0.7,1
    TEST1,2020-08-04T07:54:49.673764636Z,0.6,1
    TEST1,2020-08-04T08:04:20.398587575Z,0.7,1
    TEST1,2020-08-04T08:05:27.762499934Z,0.6,1
    TEST1,2020-08-04T08:24:33.830431316Z,0.7,1
    TEST1,2020-08-04T09:09:02.074908684Z,0.7,1
    TEST1,2020-08-04T12:23:08.100989319Z,0.7,1
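For anyone reproducing this, the difference between a timezone-naive and a UTC-localized 'date' index can be inspected in pandas alone, without a MongoDB connection. The sketch below is only an illustration of the pandas side: it does not show how ChunkStore handles either kind of index, and `to_naive_utc` is a hypothetical helper name, not part of Arctic or pandas. It assumes the timestamps are already in UTC, as stated in the description, and parses the truncated strings without the trailing 'Z' to keep the example simple.

```python
import pandas as pd

# Two timestamps copied from the sample data file in this issue.
raw = pd.Series([
    "2020-08-03T09:23:46.530859148Z",
    "2020-08-04T07:49:48.555836588Z",
])

# Keep the first 23 characters (millisecond precision), as in the post.
# Parsing without a timezone designator yields tz-naive values.
parsed = pd.to_datetime(raw.str.slice(0, 23), format="%Y-%m-%dT%H:%M:%S.%f")
naive_index = pd.DatetimeIndex(parsed, name="date")

# Equivalent of the commented-out tz_localize line: attach UTC to the index.
aware_index = naive_index.tz_localize("UTC")


def to_naive_utc(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Hypothetical helper: return the index as tz-naive UTC wall-clock times."""
    if index.tz is None:
        # Assumption: naive values are already UTC, as in the CSV described above.
        return index
    # tz_convert(None) converts to UTC and then drops the timezone information.
    return index.tz_convert(None)


normalized = to_naive_utc(aware_index)

print(naive_index.dtype)   # tz-naive datetime64 dtype
print(aware_index.dtype)   # tz-aware datetime64 dtype with UTC
print(normalized.equals(naive_index))
```

Normalizing every frame with one helper like this before writing is one way to make sure all appends for a symbol use the same kind of index; whether a naive or an aware index is the right choice for ChunkStore is the open question of this issue.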