Open goodboy opened 1 year ago
We can convert this to a draft if necessary if/when #483 lands
I'm in favor of doing our own solution and I would rather stop maintaining any marketstore related code; in the end we were almost gonna spend as much work maintaining marketstore as just doing our own thing, right.
yup totally agree!
ok then i'll be putting up some finishing touches on functionality, hopefully tests, and then dropping all that junk 🏄🏼
To give an idea of what the parquet subdir looks like now: it's laid out much like marketstore's own internal per-table binary format files, except it uses less space and is actually a file format data people can use 😂
Launch pad for work towards the task list in #485 🏄🏼
As a start this introduces a new `piker.storage` subsystem to provide for database related middleware(s) as well as a new storage backend using `polars` and apache parquet files to implement a built-in, local-filesystem managed "time series database": `nativedb`.

After some extensive tinkering and brief performance measures I'm tempted to go all in on this home grown solution for a variety of reasons (see details in 27932e44) but re-summarizing some of them here:

- `polars` already has a multi-db compat layer with multi-engine support we can leverage and completely sidestep integration work with multiple standard tsdbs?

**Core dev discussion**
- [ ] we've put some work into `marketstore` support machinery including `anyio-marketstore`, an async client written and maintained by our devs.
- [ ] we can definitely accomplish ingest, pub-sub and replication on our own (without really much effort) with the following existing subsystems and frameworks:
  - a `tractor` actor which writes to apache arrow (IPC) files and flushes to parquet on size constraints.
  - a `tractor` actor and `trio-websocket`
  - `borg` (with its unofficial API client) to accomplish file syncing across many user-hosts; `borg` has a community API: https://github.com/spslater/borgapi
- [ ] should we drop all the existing `marketstore` code?
  - the `.data.history` layer.
  - `arcticdb` is a better solution longer run than mkts was anyway given its large insti usage..?

**ToDo:**
- [x] CHERRY from #519:
- [ ] CHERRY from #528
- [ ] outstanding obvious regression due to this patch set :joy:
  - `.data.history.start_backfill()`
- [ ] drop marketstore code in general depending on outcome of above discussion:
  - [ ] `.storage.marketstore` and the `anyio-marketstore` dep?
  - [ ] the `.service._ahab` layer?
  - [ ] `.data.history`!

from https://github.com/pikers/piker/issues/485:
- [ ] `.storage` with subpkgs for backends and an API / mgmt layer
- [ ] outstanding tsdb bugs:
  - #436
  - #323
- [ ] docs on new filesystem layout and config options:
  - the `nativedb/` dir
  - a `[storage]` section added to `conf.toml`
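Something along these lines for that section; the key names here are illustrative only, not a settled schema:

```toml
# conf.toml (hypothetical keys, just to show the shape)
[storage]
backend = "nativedb"
# where the per-timeseries parquet files live:
datadir = "~/.config/piker/nativedb"
```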
- [ ] from #312 we need chart-UI integration for a buncha stuff:
  - [ ] `reload history` for a highlighted section or gap B)
- [ ] `.storage.cli` refinement:
  - [ ] `--tsdb` is no longer needed since we don't have to offer optional docker activation; we don't need it using the `nativedb` backend!
  - [ ] `piker store` cmds:
    - [ ] an `anal` subcmd to do gap detection and discrepancy reporting (at the least) against market-venue known operating hours.
- [ ] new
`nativedb` backend implemented with `polars` + apache parquet files B)
- [x] since we're already moving to use `typer` in #489, let's also add confirmation support for the new `pikerd storage -d` flag:
  - [x] added and used in the new `.storage.cli`!
  - [ ] do confirms for deletes? https://typer.tiangolo.com/tutorial/prompt/#confirm
- [ ] gap backfilling (as detailed in https://github.com/pikers/piker/pull/486/commits/f45b76ed77eafdf44871d3e3305f7dc18e9de938) still requires some work for full functionality including:
  - [ ] UI needs a cross-actor event in the history chart's update loop to ensure we do a forced graphics data formatter update when gap-backfilling is complete.
- [x] rt ingest and fast parquet update deferred to #536
  - [ ] currently we aren't storing rt data (received during a data session but not previously written to storage) on teardown..
    - consider writing the arrow IPC files and then flushing to dfs and then to parquet at some frequency / on teardown?
  - [ ] related to the above, what about FSP ingest and storage?
- [ ] https://github.com/pikers/piker/issues/314 probably should be re-created but for `nativedb` with a new writeup around arrow IPC and feather formats?
- [ ] (likely as follow up) use the lazy `polars` API to do larger-than-mem processing both for charting and remote (host) processing:
  - from the guide:
  - from API docs:
- [ ] use `polars` to do price series anomaly repairs, such as those caused by stock splits, or for handling bugs in data providers where a ticker name was repurposed for a new asset and the price history has a mega gap:

  ![screenshot-2023-06-14_15-20-53](https://github.com/pikers/piker/assets/291685/0f43d9c0-888f-47f9-af8d-eaf6016eaf0f)

- [ ] deciding on file organization, naming schema, subdirs for piker subsystems, etc.
- [ ] should we store multiple files segmented by some time period and then simply use the multiple files reader support: https://pola-rs.github.io/polars-book/user-guide/io/multiple/
- [ ] current file naming scheme is `mnq.cme.20230616.ib.ohlcv1s.parquet` but we can probably change the meta-data token part `ohlcv1s` to be more parse-able and readable, as in `ohlcv.1s.<otherinfo>`?
- [ ] a `.config/piker/nativedb/fsp/` subdir?
- [ ] what is writing deltas and can we use it?
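To show why the `ohlcv.1s` style token is nicer to work with, here's a toy parser; the full `<symbol>.<venue>.<date>.<broker>.<series>.<period>` split is an assumption about the layout, not settled naming:

```python
# toy parser for the *proposed* dot-separated naming scheme,
# eg. 'mnq.cme.20230616.ib.ohlcv.1s.parquet' (hypothetical layout)
def parse_ts_fname(fname: str) -> dict:
    symbol, venue, date, broker, series, period = fname.removesuffix(
        ".parquet"
    ).split(".")
    return {
        "symbol": symbol,
        "venue": venue,
        "date": date,
        "broker": broker,
        "series": series,   # eg. 'ohlcv'
        "period": period,   # eg. '1s', now trivially separable
    }

meta = parse_ts_fname("mnq.cme.20230616.ib.ohlcv.1s.parquet")
print(meta["series"], meta["period"])
```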