man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Feature Req: Support for higher resolution timestamps #226

Open lJoublanc opened 8 years ago

lJoublanc commented 8 years ago

Are there any plans to support timestamps with a higher resolution than the current 1 millisecond? For example, doesn't MiFID II require higher-resolution timestamps for transactional/business data?

I see a couple of issues worth pointing out:

jengelman commented 6 years ago

@bmoscon @lJoublanc Has anyone looked at or started working on this? I would love to replace some of our local storage system with Arctic, but the millisecond rounding is a fairly big issue. I was thinking of just switching the timestamp indices to nanoseconds since the epoch and updating the serializers/deserializers accordingly, but if someone else has a better idea or a WIP, please let me know!

Edit: It looks like the current version of Arctic handles microseconds just fine. Looking forward to using it!

lJoublanc commented 6 years ago

No, I haven't had a chance. I will need to rewrite a Scala adapter soon (in the next few months), so I may take a look then. I think the way to go is to use numpy's datetime64, which is supposed to support variable resolution (but I can't recall whether the resolution is part of the data type or whether you need to store it separately as a value). When I first raised this request, I remember looking at this and being puzzled as to why the index was made up of deltas instead of absolute values. There must be a good reason behind it.
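For reference, NumPy's datetime64 keeps the unit in the dtype itself (datetime64[ms], datetime64[ns], and so on), while each element is stored as a plain 64-bit integer count of that unit. A small sketch with an arbitrary timestamp:

```python
import numpy as np

# The unit is part of the dtype: datetime64[ms], datetime64[us], datetime64[ns], ...
ts_ns = np.array(["2018-03-01T09:30:00.123456789"], dtype="datetime64[ns]")
ts_ms = ts_ns.astype("datetime64[ms]")  # casting to a coarser unit drops the extra digits

print(ts_ns.dtype, ts_ms.dtype)       # datetime64[ns] datetime64[ms]
print(np.datetime_data(ts_ns.dtype))  # ('ns', 1)

# The stored values themselves are plain int64 counts of that unit;
# reinterpreting the bytes discards the unit information entirely.
raw = ts_ns.view("i8")
print(raw.dtype)  # int64
```

So the resolution travels with the array's dtype, but not with the raw integers you would actually write to disk.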

lJoublanc commented 6 years ago

So datetime64 values don't explicitly encode the resolution (the unit lives in the dtype, not in the stored integers). A pretty detailed description is available here. With regard to the deltas in the index, I suspect this is because they're timestamping the ticks themselves (rather than using the exchange-provided timestamp) using HPET rather than the system clock. This isn't guaranteed to provide absolute time, only relative time (and I don't even think it's 'time', but rather CPU cycles, which are then divided by the CPU frequency to work out nanos). So perhaps using timedelta dtype and store the res as an extra bson field (defaulting to ms if it's missing, to preserve backward compat) would work nicely.
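A minimal sketch of that proposal, assuming a chunk is represented by a plain dict standing in for the BSON document. The field names "index" and "res" and both helper functions are hypothetical, not Arctic's actual storage format. The index is kept as int64 deltas in the chosen unit, and a missing "res" falls back to milliseconds:

```python
import numpy as np

# Hypothetical chunk layout: int64 index deltas plus an optional "res" field.
# Chunks written before the field existed have no "res" and are read as ms.
DEFAULT_RES = "ms"

def encode_index(index: np.ndarray, res: str) -> dict:
    """Store a datetime64 index as successive int64 deltas in `res` units."""
    counts = index.astype(f"datetime64[{res}]").view("i8")
    deltas = np.diff(counts, prepend=0)  # first element is the absolute start value
    return {"index": deltas.tobytes(), "res": res}

def decode_index(doc: dict) -> np.ndarray:
    """Rebuild the datetime64 index, treating a missing "res" as legacy ms data."""
    res = doc.get("res", DEFAULT_RES)
    counts = np.cumsum(np.frombuffer(doc["index"], dtype="i8"))
    return counts.view(f"datetime64[{res}]")
```

Only one small field per chunk is added, and old chunks decode exactly as before because the default matches the existing millisecond behaviour.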

richardbounds commented 6 years ago

There's no particular magic in the deltas; I think I tried a few things and the deltas compressed better than storing full timestamps. We don't do anything special when generating the timestamps: it is just System.currentTimeMillis() in the receiving thread (the writing of live tick data is all done in Java). We deliberately don't make any guarantees about clock synchronization across tick streams for different tickers, so the timestamp is just a handy label for locating the interesting section of the event stream.
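To check the compression claim on your own data, a rough sketch like the following compares zlib-compressed absolute timestamps against their deltas. The synthetic tick times and the use of zlib are assumptions for illustration only; actual ratios will depend on the real data and on the compressor the store uses.

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

# Synthetic tick times (illustrative only): ms since epoch with small, irregular gaps.
start_ms = 1_520_000_000_000
absolute = start_ms + np.cumsum(rng.integers(0, 50, size=100_000)).astype("i8")

# Delta encoding: keep the first value, then only the gap to the previous tick.
deltas = np.diff(absolute, prepend=0)

abs_size = len(zlib.compress(absolute.tobytes()))
delta_size = len(zlib.compress(deltas.tobytes()))
print("absolute:", abs_size, "bytes; deltas:", delta_size, "bytes")

# The encoding is lossless: a cumulative sum restores the original timestamps.
assert np.array_equal(np.cumsum(deltas), absolute)
```

Small deltas leave most of each 8-byte value as zero bytes, which is the kind of redundancy a general-purpose compressor exploits well.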

jengelman commented 6 years ago

We also store all of our indexes as epoch deltas, so I was planning to stick to that anyway. For internal use, I was planning on just altering the conversions in ms_to_datetime and datetime_to_ms, but that would break any existing installs using millisecond timestamps, so I wanted something more general before submitting a PR. I like the idea of a precision value (and associated switch logic in the conversion functions) for the timestamps, but it seems expensive (storage-wise) to store it for each timestamp. Presumably, you'd be using the same precision for every tick or snapshot or whatever inserted for a given symbol, so how about just adding that to the metadata for the storage engine and then passing that into the conversion functions?
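A sketch of per-symbol precision feeding the conversion functions, assuming a "precision" key in the symbol metadata. The names datetime_to_epoch, epoch_to_datetime and read_precision are hypothetical stand-ins, not Arctic's datetime_to_ms/ms_to_datetime:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Size of one tick for each supported precision. Python's datetime stops at
# microseconds, so nanosecond precision would need numpy/pandas types instead.
UNITS = {"ms": timedelta(milliseconds=1), "us": timedelta(microseconds=1)}

def datetime_to_epoch(dt: datetime, precision: str = "ms") -> int:
    """Whole `precision` units since the epoch; finer digits are floored away."""
    return (dt - EPOCH) // UNITS[precision]

def epoch_to_datetime(value: int, precision: str = "ms") -> datetime:
    return EPOCH + value * UNITS[precision]

def read_precision(symbol_metadata: dict) -> str:
    # Symbols written before the setting existed default to milliseconds.
    return symbol_metadata.get("precision", "ms")

# Usage: a newly written symbol opts into microseconds via its metadata.
meta = {"precision": "us"}  # hypothetical metadata document
tick = datetime(2018, 3, 1, 9, 30, 0, 123456, tzinfo=timezone.utc)
p = read_precision(meta)
assert epoch_to_datetime(datetime_to_epoch(tick, p), p) == tick
```

Because the precision is stored once per symbol rather than once per timestamp, the storage cost concern goes away, and symbols without the key keep their existing millisecond behaviour.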

lJoublanc commented 6 years ago

Presumably, you'd be using the same precision for every tick or snapshot or whatever inserted for a given symbol, so how about just adding that to the metadata for the storage engine and then passing that into the conversion functions?

Yes, sorry that wasn't clear. That's what I meant when I said:

So perhaps using timedelta dtype and store the res as an extra bson field (defaulting to ms if it's missing, to preserve backward compat) would work nicely.