man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Behaviour of tickstore.delete #859

Open soulaw-mkii opened 4 years ago

soulaw-mkii commented 4 years ago

Arctic Version

1.79.4

Arctic Store

TickStore

Platform and version

macOS Catalina 10.15.4

Description of problem and/or code sample that reproduces the issue

I tried to clean up accidentally duplicated ticks with tickstore.delete(). I expected it to delete every entry that falls within the DateRange, but one entry is always left over, whatever range I pass to delete.

Could you please tell me whether this is the intended design or a bug?

Regards, Steve

Sample code:

from datetime import datetime as dt

from pandas import DataFrame

from arctic import Arctic, TICK_STORE
from arctic.date import DateRange, mktz

# Assumption: the original post did not show how `arctic` was created; a local MongoDB is assumed here.
arctic = Arctic('localhost')

ts_name = 'test_tickstore'
arctic.delete_library(ts_name)
arctic.initialize_library(ts_name, TICK_STORE)
tickstore = arctic[ts_name]

# Write four ticks; the first timestamp is the one duplicated below.
duplicate_ts = dt(2020, 4, 24, 15, 30, 39, tzinfo=mktz('UTC'))
df = DataFrame(
    data={'price': [108.193, 110.193, 111.193, 112.193]},
    index=[
        duplicate_ts,
        dt(2020, 4, 24, 15, 30, 41, tzinfo=mktz('UTC')),
        dt(2020, 4, 24, 15, 30, 43, tzinfo=mktz('UTC')),
        dt(2020, 4, 24, 15, 30, 45, tzinfo=mktz('UTC')),
    ],
)
df.index.name = "datetime"
tickstore.write('testsym', df)
print(f"\n\nwrite some price\n{df}")

# Write the first tick twice more to create duplicates.
df_dup = DataFrame(data={'price': [108.193]}, index=[duplicate_ts])
df_dup.index.name = "datetime"
tickstore.write('testsym', df_dup)
tickstore.write('testsym', df_dup)
df_read = tickstore.read('testsym')
print(f"\n\nmake duplicates, expect arctic returns error: TimeSeries data is out of order\n{df_read}")

# Delete a range that covers only the duplicated timestamp.
tickstore.delete('testsym', DateRange(duplicate_ts, duplicate_ts))
df_read = tickstore.read('testsym')
print(f"\n\nexpect delete all data with the same timestamp{duplicate_ts} but remain one\n{df_read}")

# Delete a wider range that covers the first two remaining ticks.
rng = DateRange(duplicate_ts, dt(2020, 4, 24, 15, 30, 42, tzinfo=mktz('UTC')))
print(f"\n\npass a date range to delete function {rng}")
tickstore.delete('testsym', rng)
df_read = tickstore.read('testsym')
print(f"\n\nexpect delete two rows but nothing was touched\n{df_read}")
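For reference, the behaviour the script expects can be stated in plain pandas, without Arctic or MongoDB. This is only a sketch of the expectation (row-level deletion over a closed date range), not of what TickStore does; `drop_closed_range` and `ts` are hypothetical helper names introduced for illustration.

```python
from datetime import datetime, timezone

import pandas as pd


def ts(second):
    """Hypothetical helper: a UTC timestamp on 2020-04-24 15:30:<second>."""
    return datetime(2020, 4, 24, 15, 30, second, tzinfo=timezone.utc)


def drop_closed_range(df, start, end):
    """Return df without the rows whose index lies in the closed interval [start, end]."""
    inside = (df.index >= start) & (df.index <= end)
    return df[~inside]


# State after the duplicate writes: three rows share the 15:30:39 timestamp.
six_rows = pd.DataFrame(
    {"price": [108.193, 108.193, 108.193, 110.193, 111.193, 112.193]},
    index=pd.DatetimeIndex([ts(39), ts(39), ts(39), ts(41), ts(43), ts(45)], name="datetime"),
)

# State actually observed after the first delete: one 15:30:39 row is left.
four_rows = pd.DataFrame(
    {"price": [108.193, 110.193, 111.193, 112.193]},
    index=pd.DatetimeIndex([ts(39), ts(41), ts(43), ts(45)], name="datetime"),
)

# Expected result of the first delete: every 15:30:39 row is removed.
after_first = drop_closed_range(six_rows, ts(39), ts(39))

# Expected result of the second delete: the 15:30:39 and 15:30:41 rows are removed.
after_second = drop_closed_range(four_rows, ts(39), ts(42))

print(after_first)
print(after_second)
```

Under this row-level reading, the first call would leave three rows and the second would leave two, which is not what the store returns.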

Output logs:

WARNING:arctic.tickstore.tickstore:NB treating all values as 'exists' - no longer sparse
WARNING:arctic.tickstore.tickstore:NB treating all values as 'exists' - no longer sparse

write some price
                             price
datetime                          
2020-04-24 15:30:39+00:00  108.193
2020-04-24 15:30:41+00:00  110.193
2020-04-24 15:30:43+00:00  111.193
2020-04-24 15:30:45+00:00  112.193
WARNING:arctic.tickstore.tickstore:NB treating all values as 'exists' - no longer sparse
ERROR:arctic.tickstore.tickstore:TimeSeries data is out of order, sorting!

make duplicates, expect arctic returns error: TimeSeries data is out of order
                             price
2020-04-24 23:30:39+08:00  108.193
2020-04-24 23:30:39+08:00  108.193
2020-04-24 23:30:39+08:00  108.193
2020-04-24 23:30:41+08:00  110.193
2020-04-24 23:30:43+08:00  111.193
2020-04-24 23:30:45+08:00  112.193

expect delete all data with the same timestamp2020-04-24 15:30:39+00:00 but remain one
                             price
2020-04-24 23:30:39+08:00  108.193
2020-04-24 23:30:41+08:00  110.193
2020-04-24 23:30:43+08:00  111.193
2020-04-24 23:30:45+08:00  112.193

pass a date range to delete function [2020-04-24 15:30:39+00:00, 2020-04-24 15:30:42+00:00]

expect delete two rows but nothing was touched
                             price
2020-04-24 23:30:39+08:00  108.193
2020-04-24 23:30:41+08:00  110.193
2020-04-24 23:30:43+08:00  111.193
2020-04-24 23:30:45+08:00  112.193
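
One reading that is consistent with these logs, offered as an assumption rather than something confirmed in this thread, is that a ranged delete removes only whole stored chunks whose start and end both fall inside the range, and that each write call stores its own chunk. The toy model below uses plain Python lists and dicts (no Arctic calls; `delete_contained`, `read` and the chunk layout are hypothetical) to check that this reading reproduces the row counts printed above.

```python
from datetime import datetime, timezone


def ts(second):
    """Hypothetical helper: a UTC timestamp on 2020-04-24 15:30:<second>."""
    return datetime(2020, 4, 24, 15, 30, second, tzinfo=timezone.utc)


# Assumed layout: one chunk per write call, tagged with its first and last timestamp.
chunks = [
    {"start": ts(39), "end": ts(45),
     "rows": [(ts(39), 108.193), (ts(41), 110.193), (ts(43), 111.193), (ts(45), 112.193)]},
    {"start": ts(39), "end": ts(39), "rows": [(ts(39), 108.193)]},
    {"start": ts(39), "end": ts(39), "rows": [(ts(39), 108.193)]},
]


def delete_contained(chunks, start, end):
    """Drop only the chunks that lie entirely inside [start, end]; keep the rest whole."""
    return [c for c in chunks if not (c["start"] >= start and c["end"] <= end)]


def read(chunks):
    """Flatten all chunks into one time-sorted list of (timestamp, price) rows."""
    return sorted(row for c in chunks for row in c["rows"])


# First delete: only the two single-row duplicate chunks are fully inside the range.
after_first = delete_contained(chunks, ts(39), ts(39))

# Second delete: the remaining chunk ends at 15:30:45, outside the range, so it is kept.
after_second = delete_contained(after_first, ts(39), ts(42))

print(len(read(chunks)), len(read(after_first)), len(read(after_second)))
```

If that assumption holds, the leftover 15:30:39 row belongs to the first, four-row chunk, and removing it would require a range that covers that whole chunk.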