man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Migrating existing tickstore to ArcticDB #1010

Closed markeasec closed 1 year ago

markeasec commented 1 year ago

Arctic Version

1.80.0

Arctic Store

TickStore

Platform and version

RHEL 7

Description of problem and/or code sample that reproduces the issue

Hello, I have a few TB of tick data in an Arctic TickStore that I want to migrate to the new ArcticDB.

I believe the only publicly available way to do this is to read all the data out of TickStore and write it to ArcticDB. Is this correct?

If so, I was wondering if there is a recommended approach. The only way I could think of was to read the data in time chunks, say 1 hour at a time, and write each chunk to ArcticDB. Is there a way to instead iterate over the underlying MongoDB documents, read one at a time, and write the resulting dataframe to ArcticDB? I looked through tickstore.py and couldn't see any methods that would support that, but maybe I missed something, or maybe one of the existing methods could be modified to accomplish this?

My reasons for preferring a per-document approach over a time-chunk approach are:

A - It gives deterministic data sizes in the read/write process, so there is no risk of running out of memory during the job.
B - It seems cleaner to me. I worry about ticks at the very edge of a time window being read twice, written twice, and thus duplicated.

Thanks in advance for any help you can provide.
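For reference, concern B about the time-chunk approach can be sketched as follows. This is only an illustration under stated assumptions, not an official migration tool: `iter_windows`, `migrate`, `read_ticks`, and `write_ticks` are hypothetical names, and the in-memory lists stand in for the real stores. The idea is to use half-open windows `[lo, hi)` and filter each chunk on the client side, so a tick that falls exactly on a window boundary is written once even if the reader returns both endpoints.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, List, Tuple

Tick = Tuple[datetime, float]  # (timestamp, value); a stand-in for a real tick row


def iter_windows(start: datetime, end: datetime,
                 step: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open windows [lo, hi) that cover [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def migrate(read_ticks: Callable[[datetime, datetime], List[Tick]],
            write_ticks: Callable[[List[Tick]], None],
            start: datetime, end: datetime, step: timedelta) -> int:
    """Copy ticks window by window and return the number of ticks written.

    read_ticks and write_ticks are hypothetical callables. In a real job they
    would wrap the source store's read and the destination store's append.
    """
    total = 0
    for lo, hi in iter_windows(start, end, step):
        # Filter to [lo, hi) ourselves, so a reader that includes both
        # endpoints cannot cause a boundary tick to be written twice.
        chunk = [t for t in read_ticks(lo, hi) if lo <= t[0] < hi]
        if chunk:
            write_ticks(chunk)
            total += len(chunk)
    return total


# In-memory stand-ins: one tick every 20 minutes, several exactly on the hour.
t0 = datetime(2023, 1, 1)
source = [(t0 + timedelta(minutes=20 * i), float(i)) for i in range(10)]


def read_closed(lo: datetime, hi: datetime) -> List[Tick]:
    """Simulate a reader whose range includes both endpoints."""
    return [t for t in source if lo <= t[0] <= hi]


dest: List[Tick] = []
copied = migrate(read_closed, dest.extend, t0, t0 + timedelta(hours=4),
                 timedelta(hours=1))
```

Note that this only addresses the duplication concern. Fixed time windows do not bound the number of ticks per chunk, so concern A (memory use) would still need a smaller or adaptive window for busy periods.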

markeasec commented 1 year ago

Can anyone provide any advice on this issue?

mehertz commented 1 year ago

There isn't a good solution for this currently. We don't have an automated migration tool that is any smarter than read and write, though it is on the roadmap.

I think your concerns are correct, though. Could you please raise this in the ArcticDB repo (this repository is not actively monitored, whereas ArcticDB is), so that we can prioritise it there and keep the ticket up to date if we release a tool to do this?

markeasec commented 1 year ago

ok thanks. I will do so.