timeplus-io / proton

A stream processing engine and database, and a fast and lightweight alternative to ksqlDB and Apache Flink, 🚀 powered by ClickHouse
https://timeplus.com
Apache License 2.0

Enhance replay functions #467

Open gangtao opened 11 months ago

gangtao commented 11 months ago

Use case

In FS server, when doing research or exploration, a user would like to replay historical data/streams and use the replayed result for validation or backtesting.

My case is:

  1. Import NYSE quote and trade data into two streams (two big CSV files); refer to https://ftp.nyse.com/Historical%20Data%20Samples/DAILY%20TAQ/
  2. Replay both streams so that quotes and trades replay over the same time interval.
  3. Run queries that correlate the two streams to monitor or analyze the trades/quotes.

The existing replay cannot be used in this case because it is based on append time, and in my case all historical data was appended at the same time.
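To illustrate the problem, here is a minimal Python sketch (not Proton's implementation) of how replay waits could be derived from a chosen time column. The row layout, the `event_time` / `append_time` field names, and the `replay_delays` helper are hypothetical; only `replay_speed` comes from the queries in this issue.

```python
from datetime import datetime


def replay_delays(rows, time_col, replay_speed=1.0):
    """Return the wait in seconds before emitting each row after the first.

    Hypothetical helper: the gap between consecutive timestamps in
    `time_col`, divided by `replay_speed`.
    """
    times = [row[time_col] for row in rows]
    return [
        (later - earlier).total_seconds() / replay_speed
        for earlier, later in zip(times, times[1:])
    ]


# Hypothetical bulk import: distinct event times, one shared append time.
loaded_at = datetime(2024, 1, 1, 12, 0, 0)
quotes = [
    {"event_time": datetime(2023, 6, 1, 9, 30, 0), "append_time": loaded_at},
    {"event_time": datetime(2023, 6, 1, 9, 30, 2), "append_time": loaded_at},
    {"event_time": datetime(2023, 6, 1, 9, 30, 5), "append_time": loaded_at},
]

by_append = replay_delays(quotes, "append_time")  # gaps collapse to zero
by_event = replay_delays(quotes, "event_time")    # original spacing preserved
by_event_2x = replay_delays(quotes, "event_time", replay_speed=2.0)
```

With append time, every gap is zero, so the rows would be emitted all at once; with the event-time column, the original spacing survives and scales with `replay_speed`.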

Describe the solution you'd like

  1. Support using _tp_time as the time column when replaying:
select * from s settings replay_speed = 1, time_col = '_tp_time'
  2. Support using a custom time column when replaying:
select * from s settings replay_speed = 1, time_col = 'time'
  3. Support replaying a historical query in a streaming way:
select * from table(s) order by _tp_time limit 10 settings replay_speed = 1, time_col = '_tp_time'

gangtao commented 11 months ago

Item 3 might be hard for now; it is OK if we only support items 1 and 2.

chenziliang commented 11 months ago

Replay only supports data in the streaming store. Otherwise, we would need to sort the whole data set by the specified timestamp and then replay it.
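As a rough sketch of the sort-then-replay approach (an illustration in Python, not how Proton implements or would implement it), the whole data set is ordered by the chosen time column first, and rows are then emitted with waits scaled by the replay speed. The `replay_sorted` function, the numeric `time` field, and the injectable `sleep` parameter are assumptions made for this example.

```python
import time


def replay_sorted(rows, time_col, replay_speed=1.0, sleep=time.sleep):
    """Yield rows in time order, waiting between them (hypothetical helper).

    The full sort is the expensive part: every row must be available
    before the first one can be emitted.
    """
    ordered = sorted(rows, key=lambda row: row[time_col])
    previous = None
    for row in ordered:
        if previous is not None:
            # Gap between consecutive timestamps, scaled by the replay speed.
            sleep((row[time_col] - previous) / replay_speed)
        previous = row[time_col]
        yield row


# Hypothetical out-of-order rows with timestamps in seconds.
trades = [
    {"time": 105.0, "price": 3},
    {"time": 100.0, "price": 1},
    {"time": 101.0, "price": 2},
]

# Record the waits instead of actually sleeping, to keep the demo instant.
waits = []
replayed = list(replay_sorted(trades, "time", replay_speed=2.0, sleep=waits.append))
prices = [row["price"] for row in replayed]
```

Because `sorted` needs the complete input, this sketch shows why replaying arbitrary historical data is costlier than replaying data that is already stored in order.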