man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Only 600K of 1M rows can be imported into TickStore. How to solve it? #505

Closed: renweibo closed this issue 6 years ago

renweibo commented 6 years ago

Arctic Version

1.58.0 

Arctic Store

TickStore

Platform and version

MacOS X, Python 3.6

Description of problem and/or code sample that reproduces the issue

Only 600k rows are imported when I try to import 1M. No explicit error is raised. It's weird. Is this a limit or a bug? Could you suggest a workaround? Thanks!

Here is a code sample:

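import pandas as pd
from arctic import Arctic, TICK_STORE

symbol = 'my_symbol'  # hypothetical name; the original sample leaves `symbol` undefined
store = Arctic('localhost')
store.initialize_library('test1', lib_type=TICK_STORE)
library = store['test1']
df = pd.read_csv('<datafile>', compression='gzip')  # '<datafile>' is the placeholder from the report
# Turn the epoch-millisecond 'time' column into a timezone-aware index
df.index = pd.to_datetime(df.pop('time'), unit='ms')
df.index = df.index.tz_localize('UTC').tz_convert('US/Pacific')  # df.shape: (1000000, 3)
library.write(symbol, df)
library.read(symbol).shape  # (600000, 3): no date_range was passed to read()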
richardbounds commented 6 years ago

I suspect you need to pass a date_range argument to read().

jamesblackburn commented 6 years ago

Yes, that's likely. read() defaults to reading one month of data if you don't specify a date range, since the alternative is that Arctic may exhaust all the memory on the machine if you're not careful.
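One way to respect that memory concern while still reading everything is to walk the full span in fixed windows and handle each chunk separately. A minimal sketch, assuming you know the overall start and end of the data; `iter_windows` is a hypothetical helper, not part of arctic, and its 30-day default step simply mirrors the one-month default described above.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def iter_windows(start: datetime, end: datetime,
                 step: timedelta = timedelta(days=30)) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open windows [lo, hi) that together cover [start, end).

    Hypothetical helper (not an arctic API): the 30-day default step mirrors
    TickStore's one-month default read window, so each chunk stays bounded.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # the last window may be shorter than `step`
        yield lo, hi
        lo = hi


if __name__ == "__main__":
    start = datetime(2017, 1, 1, tzinfo=timezone.utc)
    end = datetime(2017, 3, 1, tzinfo=timezone.utc)
    for lo, hi in iter_windows(start, end):
        print(lo.date(), "->", hi.date())
```

In a live session each `(lo, hi)` pair could be wrapped as `DateRange(lo, hi)` from `arctic.date` and passed to `library.read(symbol, date_range=...)`, processing one chunk before fetching the next. `DateRange` is closed at both ends by default, so a tick sitting exactly on a window boundary may come back in two chunks; drop duplicates if that matters.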

renweibo commented 6 years ago

Great advice; it works now after the following change. Thanks @richardbounds!

from arctic.date import DateRange
# Read every tick between these two dates instead of the default one-month window
library.read(symbol, date_range=DateRange('2017-01-01', '2018-03-01')).shape
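Hard-coded dates work as long as they bracket all of the data. A minimal sketch of an alternative that derives the range from the DataFrame that was written; `covering_bounds` is a hypothetical helper, not part of arctic.

```python
from typing import Tuple

import pandas as pd


def covering_bounds(index: pd.DatetimeIndex) -> Tuple[pd.Timestamp, pd.Timestamp]:
    """Return the earliest and latest timestamps in `index`.

    Hypothetical helper: a closed range over these two values covers every
    row that was written, so no dates need to be hard-coded.
    """
    if len(index) == 0:
        raise ValueError("index is empty")
    return index.min(), index.max()
```

With the library from the report, this could be used as `library.read(symbol, date_range=DateRange(*covering_bounds(df.index)))`; assuming nothing else was written to that symbol, the result's shape should then match `df.shape`.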

By the way, could you explain why this happens? I mean, why exactly 600000 rows? I couldn't work it out after looking through the code.

renweibo commented 6 years ago

For reference, here is the code that takes effect when date_range is None; it limits the read to one month (30 days):

In arctic/tickstore/tickstore.py, line 224:

last_dt = first_dt + timedelta(days=30)
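To see how a count like 600000 can fall out of that 30-day cap, here is a synthetic illustration in plain pandas (it does not call arctic). It assumes, purely for illustration, 1M ticks spaced evenly over 50 days, so a 30-day window starting at the first tick holds 30/50 = 60% of them. The tick spacing and the half-open window boundary are assumptions; the real count depends on how the ticks in the original file are distributed in time.

```python
import numpy as np
import pandas as pd

N_TICKS = 1_000_000
SPACING_MS = 4_320               # synthetic: 1M ticks spread evenly over 50 days
WINDOW = pd.Timedelta(days=30)   # mirrors last_dt = first_dt + timedelta(days=30)

# Build a tz-aware index the same way as the issue: epoch ms -> UTC -> US/Pacific.
start_ms = pd.Timestamp("2017-01-01", tz="UTC").value // 1_000_000  # ns -> ms
times_ms = start_ms + np.arange(N_TICKS, dtype=np.int64) * SPACING_MS
index = pd.to_datetime(times_ms, unit="ms").tz_localize("UTC").tz_convert("US/Pacific")

first_dt = index[0]
last_dt = first_dt + WINDOW
# Half-open window [first_dt, last_dt): an assumption that keeps the count round.
in_window = (index >= first_dt) & (index < last_dt)

print(f"total ticks: {len(index):,}")                         # total ticks: 1,000,000
print(f"ticks in default window: {int(in_window.sum()):,}")   # ticks in default window: 600,000
```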