pikers / piker

(e2e) foss trading for non-tinas

Real-time data feed architecture #98

Open goodboy opened 4 years ago

goodboy commented 4 years ago

We're on the cusp of introducing real-time charting and, with that, the ability to easily enable forward testing and more sophisticated types of back testing (such as walk-forward optimization, WFO). This is possible with the right broker, but there are design decisions to be made about how data feeds are managed for IPC and minimum latency.

There are 3 main data sinks I can envision from the outset:

  1. storage (eg. marketstore, influxdb, SQL stores, tectonicdb)
  2. processing pipelines (eg. tractor actors running numpy computations, ML framework endpoints)
  3. UIs (eg. charts, watchlists, blotters)
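
A rough fan-out sketch of one broker feed feeding several such sinks, written with trio-style memory channels as stand-ins for whatever IPC transport we end up with (`broker_feed`, `sink`, and the quote dict layout are all made-up names for illustration, not piker APIs):

```python
# Hypothetical fan-out: one feed task broadcasts each quote to per-sink channels.
import trio


async def broker_feed(sends):
    """Simulate a broker quote stream, fanning each tick out to every sink."""
    for i in range(10):
        quote = {'symbol': 'XYZ', 'price': 100.0 + i, 'time': trio.current_time()}
        for send_chan in sends:
            await send_chan.send(quote)
        await trio.sleep(0.1)
    # close the send sides so the sink loops terminate
    for send_chan in sends:
        await send_chan.aclose()


async def sink(name, recv_chan):
    """Stand-in for a storage writer, compute pipeline, or UI update loop."""
    async for quote in recv_chan:
        print(f'{name} <- {quote}')


async def main():
    async with trio.open_nursery() as nursery:
        sends = []
        for name in ('storage', 'compute', 'chart'):
            send_chan, recv_chan = trio.open_memory_channel(64)
            sends.append(send_chan)
            nursery.start_soon(sink, name, recv_chan)
        nursery.start_soon(broker_feed, sends)


trio.run(main)
```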

There's a lot to research and test wrt interchange formats (goodboy/tractor#58), IPC protocol details, and frankly a lot of the stuff Apache Arrow and Flight are built to address. I wanted to start describing some feed architectures and the typical problems that will be of note.
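
As a point of reference for the interchange question, here's a minimal sketch of round-tripping numpy columns through the Arrow IPC streaming format with pyarrow (the same columnar format Flight transports); the column names and data are made up, nothing piker-specific is assumed:

```python
# Illustrative Arrow IPC round-trip of numpy tick columns.
import numpy as np
import pyarrow as pa

# some fake tick data as numpy columns
times = np.arange(5, dtype='int64')
prices = np.random.random(5)

batch = pa.RecordBatch.from_arrays(
    [pa.array(times), pa.array(prices)],
    names=['time', 'price'],
)

# serialize to the Arrow IPC streaming format
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
buf = sink.getvalue()

# deserialize back to a table and then numpy
reader = pa.ipc.open_stream(buf)
table = reader.read_all()
print(table.column('price').to_numpy())
```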

Shared memory updates

One of the designs we should experiment with heavily is shared memory IPC for numpy array passing, since numpy arrays are likely to remain (for now) our primary data structure format due to their wide adoption in the data community. In particular, an architecture where near-term data is written to a buffer directly by the broker feed process, so that read latency is minimized for other local processes (eg. graphics UI updates and local downstream processing pipelines that need data asap), while processes with looser latency constraints (eg. downstream feeds used for monitoring, or shared with a human trader whose reaction time is much slower) tolerate larger delays.
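
A minimal sketch of that write-once/read-many pattern using the stdlib's `multiprocessing.shared_memory` with a numpy array mapped over the buffer; the segment name, dtype, and slot layout are all illustrative, and piker's actual shm layer may look quite different:

```python
# Illustrative shared-mem buffer for quotes: a feed process writes into a
# numpy array backed by a named SharedMemory segment; readers attach by name
# and view the same memory with (near) zero copy.
import numpy as np
from multiprocessing import shared_memory

quote_dtype = np.dtype([('time', 'f8'), ('price', 'f8'), ('size', 'f8')])
NUM_SLOTS = 1024

# --- writer side (broker feed process) ---
shm = shared_memory.SharedMemory(
    name='quote_buf',  # hypothetical key readers look up
    create=True,
    size=NUM_SLOTS * quote_dtype.itemsize,
)
buf = np.ndarray((NUM_SLOTS,), dtype=quote_dtype, buffer=shm.buf)
buf[0] = (1_600_000_000.0, 101.25, 3.0)  # write the latest tick into slot 0

# --- reader side (chart / compute process) ---
reader_shm = shared_memory.SharedMemory(name='quote_buf')
view = np.ndarray((NUM_SLOTS,), dtype=quote_dtype, buffer=reader_shm.buf)
print(view[0])  # sees the writer's update without any serialization

# cleanup (the writer owns the segment's lifetime)
reader_shm.close()
shm.close()
shm.unlink()
```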

Some notes on all this:

goodboy commented 3 years ago

For reference, here is a (now deprecated) shm support implementation for redis

goodboy commented 3 years ago

The persistent feed stuff in #161 is obviously a first draft of all this working.