man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

Fix ArcticDB reading streaming data #1647

Closed. poodlewars closed this 1 week ago

poodlewars commented 1 week ago

Reference Issues/PRs

Fixes https://github.com/man-group/arcticdb-man/issues/80

What does this implement or fix?

Allows ArcticDB to read incomplete segments written by arcticc tick collectors.

Any other comments?

arcticc's logic for appending incompletes differs from ArcticDB's. It writes a dummy StreamDescriptor on the TimeseriesDescriptor and relies entirely on the StreamDescriptor in the segment header. That approach is better than the existing ArcticDB approach, which writes the same StreamDescriptor twice (once directly in the header and once in the TimeseriesDescriptor).

ArcticDB was reading this dummy stream descriptor and crashing.

Firstly, this PR changes readers to first check the StreamDescriptor on the header of incompletes, rather than using the one stamped on the TimeseriesDescriptor. It also adds a backwards-compatibility test for the format.
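
For illustration only, the reader-side change can be sketched in Python roughly as follows. This is not the code in the PR: `StreamDescriptor` and `IncompleteSegment` here are simplified, hypothetical stand-ins, "dummy" is assumed to mean a descriptor with no fields, and the fallback to the TimeseriesDescriptor is an assumption about what "first check" implies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamDescriptor:
    # Hypothetical stand-in: a symbol name plus its column names.
    stream_id: str = ""
    fields: List[str] = field(default_factory=list)

    def is_dummy(self) -> bool:
        # Assumption: a "dummy" descriptor carries no fields.
        return not self.fields


@dataclass
class IncompleteSegment:
    # Hypothetical model of the two places a descriptor can live.
    header_descriptor: Optional[StreamDescriptor]
    timeseries_descriptor: Optional[StreamDescriptor]


def resolve_descriptor(segment: IncompleteSegment) -> StreamDescriptor:
    """Prefer the segment header; fall back to the TimeseriesDescriptor."""
    header = segment.header_descriptor
    if header is not None and not header.is_dummy():
        return header
    fallback = segment.timeseries_descriptor
    if fallback is not None and not fallback.is_dummy():
        return fallback
    raise ValueError("incomplete segment has no usable StreamDescriptor")


# arcticc-style segment: real descriptor in the header, dummy elsewhere.
real = StreamDescriptor("ticks", ["bid", "ask"])
arcticc_style = IncompleteSegment(real, StreamDescriptor())
resolved = resolve_descriptor(arcticc_style)
```

With this ordering, a segment written in the arcticc style resolves to the header descriptor, and the dummy on the TimeseriesDescriptor is never consulted.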

Secondly, and this is not required for the PR to be logically correct, we change writers so that, like arcticc, they do not duplicate the StreamDescriptor. This is done in 99d0870307.
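
As a rough sketch of the writer-side difference, assuming a simplified model in which an incomplete's metadata is just two descriptor slots (the names `build_incomplete_metadata`, `segment_header`, and `timeseries_descriptor` are hypothetical and do not come from the ArcticDB codebase):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StreamDescriptor:
    # Hypothetical stand-in: a symbol name plus its column names.
    stream_id: str = ""
    fields: List[str] = field(default_factory=list)


def build_incomplete_metadata(
    descriptor: StreamDescriptor, duplicate: bool = False
) -> Dict[str, StreamDescriptor]:
    """Return the two descriptor slots for an incomplete segment.

    duplicate=True models the previous behaviour (same descriptor twice);
    duplicate=False models the arcticc-style layout (placeholder on the
    TimeseriesDescriptor, real descriptor only in the header).
    """
    placeholder = StreamDescriptor()  # assumption: empty means "dummy"
    return {
        "segment_header": descriptor,
        "timeseries_descriptor": descriptor if duplicate else placeholder,
    }


desc = StreamDescriptor("ticks", ["bid", "ask"])
new_style = build_incomplete_metadata(desc)
old_style = build_incomplete_metadata(desc, duplicate=True)
```

In both layouts the header slot holds the real descriptor, which is why a reader that checks the header first can handle either.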

Checklist

Checklist for code changes...

- [x] Have you updated the relevant docstrings, documentation and copyright notice?
- [x] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)?
- [x] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)?
- [x] Are API changes highlighted in the PR description?
- [x] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?