oceanprotocol / pdr-backend

Instructions & code to run predictoors, traders, more.
Apache License 2.0

[Model/trade] Experiment on volume bars, dollar bars, DIBs, etc #1244

Open trentmc opened 2 months ago

trentmc commented 2 months ago

Background / motivation

5-min candles ("time bars") don't carry that much info. And a 5-min interval (or any fixed time tick) is quite constraining.

We can construct more informative bars from raw trade data. (Where raw trade data = each atomic trade on its own.)

Let's play with it to see how well we can predict or trade against predictions. This can be fully separate from simulation to start with.
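As a rough starting point, threshold-based bars can be sketched as below. This is a minimal illustration, not code from any of the resources listed in this issue. It assumes raw trades arrive as `(timestamp, price, amount)` tuples sorted by time; the names `Bar`, `MEASURES`, and `build_bars` are hypothetical. A bar closes once the accumulated measure (trade count, volume, or price times amount) reaches the threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed trade layout: (timestamp in seconds, price, amount). Hypothetical.
Trade = Tuple[int, float, float]


@dataclass
class Bar:
    st_ts: int      # timestamp of the first trade in the bar
    end_ts: int     # timestamp of the last trade in the bar
    open: float
    high: float
    low: float
    close: float
    volume: float


# What each bar type accumulates per trade.
MEASURES: Dict[str, Callable[[float, float], float]] = {
    "tick": lambda price, amount: 1.0,             # number of trades
    "volume": lambda price, amount: amount,        # units traded
    "dollar": lambda price, amount: price * amount,  # value traded
}


def build_bars(trades: Iterable[Trade], kind: str, threshold: float) -> List[Bar]:
    """Close a bar each time the accumulated measure reaches `threshold`."""
    measure = MEASURES[kind]
    bars: List[Bar] = []
    bucket: List[Trade] = []
    total = 0.0
    for trade in trades:
        bucket.append(trade)
        total += measure(trade[1], trade[2])
        if total >= threshold:
            prices = [p for _, p, _ in bucket]
            bars.append(
                Bar(
                    st_ts=bucket[0][0],
                    end_ts=bucket[-1][0],
                    open=prices[0],
                    high=max(prices),
                    low=min(prices),
                    close=prices[-1],
                    volume=sum(a for _, _, a in bucket),
                )
            )
            bucket, total = [], 0.0
    # Trades left in an unfinished bucket are dropped in this sketch.
    return bars
```

For example, `build_bars(trades, "volume", 10000)` would yield bars of roughly 10000 units each, with `st_ts` and `end_ts` that are not evenly spaced in time. Overshoot past the threshold is kept in the closing bar rather than carried over, which is a simplification.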

TODOs

Resources: Blogs & Code for Info Ticks Etc

Resources: Maks Ivanov

  1. Blog PDF "Financial Machine Learning Part 0: Bars", Feb 27, 2019
    • 🔥🔥Has Py code for Time Bars, Tick Bars, Volume Bars, Dollar Bars, Dollar Imbalance Bars. In a nice way that builds from one to the next

Resources: Ved Prakash

  1. Blog PDF "Major reasons why ML fails in stock prediction : Part 2", Feb 17, 2024. It heavily references the book "Adv. in Financial ML" by Marcos López de Prado, and related video (see below).
    • 🔥 has Py code for Tick Imbalance Bars (TIBs) and Tick Run Bars (TRBs)
    • Related video Marcos Lopez de Prado, "The 7 Reasons Most ML Funds Fail", from QuantCon 2018
[Images: TIBs, TRBs]

Resources: Gerard Martinez

  1. Blog PDF "Financial ML practitioners have been using the wrong candlesticks: here’s why", Apr 21, 2019

  2. Blog PDF "Advanced candlesticks for machine learning (i): tick bars", Apr 24, 2019

    • has Py code to generate tick bars (tick candlesticks) (versus standard time-based candlesticks)
  3. Blog PDF "Advanced candlesticks for ML (ii): volume and dollar bars", Gerard Martinez, May 2, 2019.

    • 🔥 Has Py code for Volume Bars and Dollar Bars. Image below.
  4. Blog PDF "Information-driven bars for financial machine learning: imbalance bars", May 20, 2019

    • Has some nice explanations and plots for imbalance bars. Alas, no code.

Resource: Proskurin Oleksandr

  1. 🔥 Py code for time, tick, volume, and dollar bars data_structures.py PDF py. Very clean.

    • It's a fork of mlfinlab repo, from hudson-and-thames, but 79 commits ahead. Last changed in 2019.
  2. 🔥 Py code for dollar imbalance bars (DIBs) github cltai9145 PDF

Resource: Experiments from Prado Book

  1. https://github.com/BlackArbsCEO/Adv_Fin_ML_Exercises. "Experimental solutions to selected exercises from the [Prado] book". Well-organized. Has links to other projects inspired by the book.
  2. https://github.com/cltai9145/research/tree/master. "Contains all the Jupyter Notebooks used in [Hudson & Thames] research". Organized based on chapters in Prado book

Resources: Kaiko data

Kaiko products: cryptocurrency market data. Link. "Historical data is available via API, CSV Files and BigQuery; and live data via API, Stream, and private connectivity channels."

Kaiko docs. Link

Kaiko github org. Link

3rd party Kaiko py driver

idiom-bytes commented 1 month ago

@trentmc @trizin @AmandaZYY i'm just posting this here based on the standup and comments wrt: "volume bars may not be timeseries compatible"

ASK: I believe it would be constructive if all "bars" are still modeled in a way where they are timeseries-compatible.

I.E. Consider the trades that happened on Jan-01-01-00:00 -> Jan-01-01-23:59

[Price Bars - In a timeseries of 5m timeframe]
Jan-01-01-00:00 -> Jan-01-01-00:05 OHLCV1
Jan-01-01-00:05 -> Jan-01-01-00:10 OHLCV2
Jan-01-01-00:10 -> Jan-01-01-00:15 OHLCV3

[Volume Bars Proposal 1 - In a timeseries of 5m timeframe]
Jan-01-01-00:00 -> Jan-01-01-00:12 OHLCV1
Jan-01-01-00:13 -> Jan-01-01-00:15 OHLCV2

volume bars data could perhaps also include st_ts and end_ts, such that we can explode the data into a different time-structure such as 1m candles.

[Volume Bars for training Proposal 1 - When asked to explode intervals from Volume Bars]
Jan-01-01-00:08 -> Jan-01-01-00:09 OHLCV1
Jan-01-01-00:09 -> Jan-01-01-00:10 OHLCV1
Jan-01-01-00:10 -> Jan-01-01-00:11 OHLCV1
Jan-01-01-00:11 -> Jan-01-01-00:12 OHLCV1
Jan-01-01-00:12 -> Jan-01-01-00:13 OHLCV1
Jan-01-01-00:13 -> Jan-01-01-00:14 OHLCV2
Jan-01-01-00:14 -> Jan-01-01-00:15 OHLCV2

Training would then be completed w/ 1m data by blowing up 5m or 1h. The training data blob becomes far more sizeable in this scenario, but volume + price bars can be used interchangeably.
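The "explode" step proposed above could look roughly like this. It is only a sketch under assumptions: each bar is a dict carrying `st_ts` and `end_ts` in seconds, `st_ts` is aligned to the target interval, and `explode_bar` / `explode_bars` are hypothetical names rather than anything in pdr-backend. Each slice simply repeats the parent bar's values, as in the proposal.

```python
from typing import Dict, List


def explode_bar(bar: Dict, step_s: int = 60) -> List[Dict]:
    """Split one bar into evenly spaced slices of `step_s` seconds.

    Every slice copies the parent bar's fields; only st_ts / end_ts change.
    """
    rows: List[Dict] = []
    ts = bar["st_ts"]
    while ts < bar["end_ts"]:
        row = dict(bar)  # shallow copy of the parent bar
        row["st_ts"] = ts
        row["end_ts"] = min(ts + step_s, bar["end_ts"])
        rows.append(row)
        ts += step_s
    return rows


def explode_bars(bars: List[Dict], step_s: int = 60) -> List[Dict]:
    """Explode a sequence of bars into one evenly spaced series."""
    rows: List[Dict] = []
    for bar in bars:
        rows.extend(explode_bar(bar, step_s))
    return rows
```

One thing to watch with this approach: because the volume field is repeated in every slice, summing volume over the exploded rows overcounts. Whether to repeat or divide it per slice is a design choice left open here.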

AmandaZYY commented 1 month ago

@idiom-bytes Hi, I think it can be timeseries-compatible as in we can add a timestamp column to the candles, but it won't be evenly spaced as you suggested in a timeseries of 5m timeframe, I.E. Consider the trades that happened on Jan-01-01-00:00 -> Jan-01-01-23:59

[Volume Bars (10000)] It might be:
Jan-01-01-00:00 -> Jan-01-01-00:03 OHLCV1
Jan-01-01-00:03 -> Jan-01-01-00:10 OHLCV2 (due to less trading volume in this time period)
Jan-01-01-00:10 -> Jan-01-01-00:11 OHLCV3 (due to more trading volume in this time period)
The timestamp is decided by either the beginning or the end of the volume bar. Would it integrate well with the current training pipeline?

idiom-bytes commented 1 month ago

I believe we would need to explode the data into a compatible timeseries that is "evenly spaced" (i.e. minute-by-minute) in order for it to integrate well with the rest of the training pipeline.

Like Trent said, perhaps we should (1) just get these things working, and then (2) look at how to make it compatible with the current training pipeline.

I'm just sharing some thoughts while considering how volume bars are structured, such that we may use them in the future with the rest of our data.

trizin commented 1 month ago

Thank you for your comments @idiom-bytes. We don't need to make it compatible with the current training pipeline. The objective is to understand: how does each bar do wrt "trader $ made"?

idiom-bytes commented 1 month ago
how does each bar do wrt "trader $ made"

i'm not sure what you mean here, but trent's blog posts match up to my mental models. best of luck working through it, and i'm looking forward to the results.