johnrichardrinehart / SolanaSeason

Solana Season Hackathon Project

Define Project Serum Schema #2

Open johnrichardrinehart opened 3 years ago

johnrichardrinehart commented 3 years ago
kunalkhamar commented 3 years ago

A couple of ideas for the schema. The easy way for initial data is one table per currency pair per interval, e.g.:

- one table for BTC/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]
- another table for BTC/USD 1 hour, columns [timestamp, bid, bid_qty, ask, ask_qty]
- another table for ETH/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]
- ...

We can definitely optimize for different things, e.g. reduce the number of tables by having one table per interval (a rough DDL sketch follows this list):

- 1 min table: columns [currency_pair, timestamp, bid, bid_qty, ask, ask_qty]
- 5 min table: columns [currency_pair, timestamp, bid, bid_qty, ask, ask_qty]
- ...
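A minimal Postgres DDL sketch of the one-table-per-interval variant; the table and column names here are illustrative placeholders, not a settled schema:

```sql
-- Hypothetical DDL for the "one table per interval" variant.
-- Table and column names are placeholders, not a decided schema.
CREATE TABLE quotes_1min (
    currency_pair TEXT        NOT NULL,  -- e.g. 'BTC/USD'
    ts            TIMESTAMPTZ NOT NULL,
    bid           NUMERIC     NOT NULL,
    bid_qty       NUMERIC     NOT NULL,
    ask           NUMERIC     NOT NULL,
    ask_qty       NUMERIC     NOT NULL,
    PRIMARY KEY (currency_pair, ts)
);

-- The 5 min / 1 hour tables would be identical except for the name.
CREATE TABLE quotes_5min (LIKE quotes_1min INCLUDING ALL);
```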

Once the candles schema is decided, volume and total value locked are pretty straightforward.

We might have to wait until we have a data source and check how Dune handles this before going any further.

johnrichardrinehart commented 3 years ago

> The easy way for initial data is one table per currency pair per interval, e.g.:
>
> - one table for BTC/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]
> - another table for BTC/USD 1 hour, columns [timestamp, bid, bid_qty, ask, ask_qty]
> - another table for ETH/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]

I think having each pair in its own table is a great idea, but even better might be to have one table per data source (Serum/Raydium/etc.) with information like:

Serum

| ID | timestamp | token_buy | token_sell | price_buy | price_sell | volume_buy | volume_sell |
|----|-----------|-----------|------------|-----------|------------|------------|-------------|
| 1  | 01-01.... | BTC       | ETH        | 1,000.00  | 999.99     | 25         | 10          |
| 2  | 01-01.... | USD       | ETH        | 2,500.00  | 2499.99    | 12         | 5           |

Raydium

| ID | timestamp | token_buy | token_sell | price_buy | price_sell | volume_buy | volume_sell |
|----|-----------|-----------|------------|-----------|------------|------------|-------------|
| 1  | 01-01.... | BTC       | ETH        | 1,000.00  | 999.99     | 25         | 10          |
| 2  | 01-01.... | USD       | ETH        | 2,500.00  | 2499.99    | 12         | 5           |
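To make this concrete, a rough Postgres DDL sketch of one per-source table; the names and types are assumptions, not a decided schema:

```sql
-- Hypothetical DDL for a per-source table (one like this per exchange:
-- serum, raydium, ...). Names and types are placeholders.
-- "ts" corresponds to the "timestamp" column above; renamed to avoid the SQL type name.
CREATE TABLE serum (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    ts          TIMESTAMPTZ NOT NULL,
    token_buy   TEXT        NOT NULL,
    token_sell  TEXT        NOT NULL,
    price_buy   NUMERIC     NOT NULL,
    price_sell  NUMERIC     NOT NULL,
    volume_buy  NUMERIC     NOT NULL,
    volume_sell NUMERIC     NOT NULL
);

-- The Raydium table would be structurally identical.
CREATE TABLE raydium (LIKE serum INCLUDING ALL);
```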

This avoids the need for complex (inner? outer?) joins over different tables (which I've never even performed, myself). Also, keeping all of the timestamps in one table should make it possible to create a VIEW, or something similar, derived from the fundamental table.
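As a rough illustration of that VIEW idea (assuming the hypothetical serum table sketched above), a per-minute roll-up could look something like:

```sql
-- Hypothetical: derive a 1-minute summary per pair from the raw serum samples.
-- Columns are illustrative; proper candles (open/close) would need ordered
-- window functions on trade prices rather than these simple aggregates.
CREATE VIEW serum_1min AS
SELECT
    token_buy,
    token_sell,
    date_trunc('minute', ts) AS minute,
    min(price_buy)           AS low_buy,
    max(price_buy)           AS high_buy,
    sum(volume_buy)          AS volume_buy,
    sum(volume_sell)         AS volume_sell
FROM serum
GROUP BY token_buy, token_sell, date_trunc('minute', ts);
```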

Initially, I plan on sampling every second (for at least the next week or so). So, that's 604,800 samples with maybe a few hundred interesting rows per sample. Assuming 200 rows per sample (a lot of currency pairs and a couple of different exchanges), that's ~121 Mrows. Now, Postgres should handle compression and logarithmic look-up times, so I think this is feasible on-disk and with the memory of the machine... but, YOLO! We can always upgrade the instance if performance is sketchy af :) .
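For look-ups to stay fast at ~121 M rows, the time column will almost certainly need an index; a minimal sketch, again assuming the hypothetical serum table above:

```sql
-- Hypothetical indexes, assuming the serum table sketched earlier.
-- A composite B-tree for "this pair over this time range" queries:
CREATE INDEX serum_pair_ts_idx ON serum (token_buy, token_sell, ts);

-- If rows are only ever appended in time order, a BRIN index on the
-- timestamp is a much smaller alternative for coarse time-range scans:
CREATE INDEX serum_ts_brin_idx ON serum USING brin (ts);
```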

And, really, we should probably be using a TSDB (like InfluxDB), but Postgres comes for free with Redash and I'm lazy/crunched for time.