Can anyone provide any advice on this issue?
There isn't a good solution for this currently. We don't have an automated migration tool that is any smarter than read and write, though it is on the roadmap.
I think your concerns are correct, though. Could you please raise this in the ArcticDB repo (this repository is not actively monitored, whereas ArcticDB is)? That way we can prioritise it there and keep the ticket up to date if we release a tool to do this.
OK, thanks. I will do so.
Arctic Version
Arctic Store
Platform and version
RHEL 7
Description of problem and/or code sample that reproduces the issue
Hello, I have a collection of a few TB of tick data in an Arctic TickStore that I want to migrate to the new ArcticDB.
I believe the only publicly available way to do this is to read all the data out of TickStore and write it to ArcticDB. Is this correct?
If so, is there a recommended approach for that? The only way I could think of was to read the data in time chunks, say one hour at a time, and then write each chunk to ArcticDB. Is there a way to instead iterate over the underlying MongoDB documents, read one at a time, and write the resulting DataFrame to ArcticDB? I looked through tickstore.py and couldn't find any methods that would support that, but maybe I missed something, or maybe one of the existing methods could be modified to accomplish this.
My reasons for preferring a document-based approach over a time-chunk approach are:
A - deterministic data sizes in the read/write process (no risk of running out of memory during the job), and
B - it seems cleaner to me; I worry about ticks at the very edge of each time window being read twice, written twice, and thus duplicated.
Thanks in advance for any help you can provide.
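For illustration, here is a minimal sketch of the time-chunked approach, using the public Arctic TickStore API (Arctic, DateRange, CLOSED_OPEN, NoDataFoundException) and the ArcticDB Library API (get_library, has_symbol, write, append). The host, URIs, library names, symbol, date span, and one-hour chunk frequency are placeholders, and it assumes TickStore.read honours the half-open CLOSED_OPEN boundary; if it does not, deduplicate on the timestamp index after reading.

```python
# Sketch: migrate one symbol from an Arctic TickStore to ArcticDB in hourly,
# half-open [start, end) chunks so boundary ticks are not read twice.
# Hosts, URIs, library names, symbol and dates are placeholders.
import pandas as pd
import arcticdb as adb
from arctic import Arctic
from arctic.date import DateRange, CLOSED_OPEN
from arctic.exceptions import NoDataFoundException

SYMBOL = "MY_SYMBOL"  # placeholder

source = Arctic("mongo-host")["tick_lib"]            # existing TickStore library
dest = adb.Arctic("lmdb:///tmp/arcticdb_migration")  # or an S3 / Mongo URI
dest_lib = dest.get_library("tick_lib", create_if_missing=True)

start, end = pd.Timestamp("2020-01-01"), pd.Timestamp("2023-01-01")
edges = pd.date_range(start, end, freq="1h")  # one-hour chunk boundaries

for chunk_start, chunk_end in zip(edges[:-1], edges[1:]):
    try:
        df = source.read(SYMBOL, date_range=DateRange(chunk_start, chunk_end, CLOSED_OPEN))
    except NoDataFoundException:
        continue  # empty window, nothing to copy
    if df is None or df.empty:
        continue
    # Process chunks chronologically so the destination index stays ordered;
    # create the symbol on the first chunk, then append.
    if dest_lib.has_symbol(SYMBOL):
        dest_lib.append(SYMBOL, df)
    else:
        dest_lib.write(SYMBOL, df)
```

With hourly, half-open windows, memory use is bounded by the busiest hour rather than the total data size; if a single hour can still be too large, shrink the window, or deduplicate on the timestamp index before appending.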