johnrichardrinehart opened 3 years ago
A couple of ideas for the schema. The easy way for initial data is one table per currency pair per interval, e.g.:

- one table for BTC/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]
- another table for BTC/USD 1 hour, columns [timestamp, bid, bid_qty, ask, ask_qty]
- another table for ETH/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]
- ...
We can definitely optimize for different things, e.g. reduce the number of tables by having one table per interval:

- 1 min table: columns [currency_pair, timestamp, bid, bid_qty, ask, ask_qty]
- 5 min table: columns [currency_pair, timestamp, bid, bid_qty, ask, ask_qty]
- ...
Once the candles schema is decided, volume and total value locked are pretty straightforward.
We might have to wait to get a data source and check how dune handles this before going any further.
> The easy way for initial data is one table per currency pair per interval e.g. one table for BTC/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty] another table for BTC/USD 1 hour, columns [timestamp, bid, bid_qty, ask, ask_qty] another table for ETH/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]
I think having each pair in a table is a great idea, but even better might be to have a table per data source (like Serum/Raydium/etc.) with information like:
| ID | timestamp | token_buy | token_sell | price_buy | price_sell | volume_buy | volume_sell |
|----|-----------|-----------|------------|-----------|------------|------------|-------------|
| 1  | 01-01.... | BTC       | ETH        | 1,000.00  | 999.99     | 25         | 10          |
| 2  | 01-01.... | USD       | ETH        | 2,500.00  | 2,499.99   | 12         | 5           |
This avoids the need for complex (inner? outer?) joins across different tables (which I've never even performed myself).
Also, keeping all of the times in one table should make it possible to create a VIEW or something derived from the fundamental table.
Initially, I plan on sampling every second (for at least the next week or so). That's 604,800 samples, with maybe a few hundred interesting rows per sample. Assuming 200 rows per sample (a lot of currency pairs and a couple of different exchanges), that's ~121 Mrows. Now, postgres should handle compression and logarithmic look-up times, so I think this is feasible on disk and within the memory of the machine... but, YOLO! We can always upgrade the instance if performance is sketchy :) .
And, really, we should probably be using a TSDB (like InfluxDB), but postgres comes for free with redash and I'm lazy/crunched for time.