Thanks for raising this, @markeasec. As mentioned in the Arctic repository, we don't currently have a supported way to do this, but your reasoning for why we should is very sensible.
I've prioritised this. We'll update this ticket as we make progress, but I can't offer a timeline right now, so I wouldn't advise waiting for this functionality to become available if you can avoid it.
Thanks. Can you clarify whether there is any danger of data in Arctic (not ArcticDB) being read twice because it falls exactly at the start/end of a window? Or is the left-hand side of a window always inclusive and the right-hand side always exclusive? If there's no danger of duplicating data with a read/write approach, I will probably just bite the bullet, spin up a huge box, and do it that way.
You will have to ask that on the https://github.com/man-group/arctic repo. Different teams maintain these two code bases.
AFAIK, you can use the DateRange type to specify whether each end is open/closed.
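For example, a half-open window excludes its right endpoint, so consecutive windows can't pick up the same tick twice. A minimal sketch, assuming arctic's `DateRange(start, end, interval=...)` signature and the interval constants exported from `arctic.date`:

```python
# A minimal sketch, assuming arctic exposes DateRange and the
# CLOSED_OPEN interval constant from arctic.date.
import pandas as pd
from arctic.date import DateRange, CLOSED_OPEN

# Half-open window [09:00, 10:00): a tick stamped exactly at 10:00 is
# excluded here and picked up by the next window instead, so adjacent
# windows never read the same tick twice.
window = DateRange(
    pd.Timestamp("2023-01-02 09:00:00"),
    pd.Timestamp("2023-01-02 10:00:00"),
    interval=CLOSED_OPEN,
)
```

In the versions I've seen the default interval is closed on both ends, so it's worth setting it explicitly when chunking.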
Thanks for the pointer about DateRange, I will look into that. I had actually originally opened it there and was instructed to raise it here instead.
@markeasec Hey mate, how did your TickStore migration go? Did it work out fine? I'm thinking about migrating as well. See https://github.com/man-group/arctic/issues/1026
I'm raising this issue here at the suggestion of @mehertz since the arctic repo is not actively monitored / maintained.
Arctic Version
Arctic Store
TickStore
Platform and version
RHEL 7
Description of problem and/or code sample that reproduces the issue
Hello, I have a collection of a few TB of tick data in an Arctic TickStore that I want to migrate to the new ArcticDB.
I believe the only publicly available way to do this is to read all the data out of the TickStore and write it to ArcticDB. Is this correct?
If so, is there a recommended approach? The only way I could think of is to read the data in time chunks, say one hour at a time, and then write each chunk to ArcticDB. Is there instead a way to iterate over the underlying MongoDB documents, reading one at a time and writing the resulting DataFrame to ArcticDB? I looked through tickstore.py and couldn't see any methods that would support that, but maybe I missed something, or perhaps one of the existing methods could be modified to accomplish it.
My reasons for preferring a documents approach over a time-chunks approach:
A - deterministic data sizes in the read/write process (no risk of running out of memory during the job);
B - it seems cleaner to me; I worry about ticks at the very edge of a time window being read twice, written twice, and thus duplicated.
Thanks in advance for any help you can provide.
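For what it's worth, here is a minimal sketch of the time-chunk approach using half-open windows, which addresses concern B (each tick can land in exactly one window). It assumes arctic's `TickStore.read(symbol, date_range=...)`, the `min_date()`/`max_date()` helpers, `NoDataFoundException`, and ArcticDB's `write()`/`append()` API; the URIs, library names, and symbol are placeholders. Concern A (memory) still depends on how dense the busiest window is, so the chunk size may need tuning:

```python
# A minimal sketch, not a tested migration tool. Assumptions are
# flagged inline; URIs, library names, and the symbol are placeholders.
import pandas as pd
from arctic import Arctic
from arctic.date import DateRange, CLOSED_OPEN
from arctic.exceptions import NoDataFoundException
from arcticdb import Arctic as ArcticDB

tick_lib = Arctic("mongodb-host")["ticks"]   # source TickStore library
dest = ArcticDB("lmdb:///tmp/arcticdb")      # destination (any ArcticDB URI)
dest_lib = dest.get_library("ticks", create_if_missing=True)

def migrate(symbol, freq="1h"):
    # min_date()/max_date() bound the scan so we only touch windows
    # that can contain data.
    start = pd.Timestamp(tick_lib.min_date(symbol)).floor("h")
    end = pd.Timestamp(tick_lib.max_date(symbol))
    first = True
    for window_start in pd.date_range(start, end, freq=freq):
        # Half-open window [t, t+freq): each tick lands in exactly one
        # window, so nothing is read or written twice.
        window = DateRange(window_start,
                           window_start + pd.Timedelta(freq),
                           interval=CLOSED_OPEN)
        try:
            chunk = tick_lib.read(symbol, date_range=window)
        except NoDataFoundException:
            continue  # no ticks in this window
        if chunk.empty:
            continue
        if first:
            dest_lib.write(symbol, chunk)    # first chunk creates the symbol
            first = False
        else:
            dest_lib.append(symbol, chunk)   # later chunks arrive in time order

migrate("EURUSD")  # hypothetical symbol
```

If memory (point A) remains a worry, the window size can be shrunk for known-dense periods; as far as I know there is no supported document-level iteration API that would give truly deterministic chunk sizes.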