johnrichardrinehart / SolanaSeason

Solana Season Hackathon Project

Define Project Serum Schema #2

Open johnrichardrinehart opened 3 years ago

johnrichardrinehart commented 3 years ago
kunalkhamar commented 3 years ago

A couple of ideas for the schema. The easy way for initial data is one table per currency pair per interval, e.g.:

- one table for BTC/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]
- another table for BTC/USD 1 hour, columns [timestamp, bid, bid_qty, ask, ask_qty]
- another table for ETH/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]
- ...

We can definitely optimize for different things, e.g. reduce the number of tables by having one table per interval (a rough DDL sketch follows this list):

- 1 min table: columns [currency_pair, timestamp, bid, bid_qty, ask, ask_qty]
- 5 min table: columns [currency_pair, timestamp, bid, bid_qty, ask, ask_qty]
- ...
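A minimal Postgres DDL sketch of the one-table-per-interval variant; the table and column names here are illustrative placeholders, not a settled schema:

```sql
-- Hypothetical DDL for the "one table per interval" variant.
-- Table and column names are placeholders, not a decided schema.
CREATE TABLE quotes_1min (
    currency_pair TEXT        NOT NULL,  -- e.g. 'BTC/USD'
    ts            TIMESTAMPTZ NOT NULL,
    bid           NUMERIC     NOT NULL,
    bid_qty       NUMERIC     NOT NULL,
    ask           NUMERIC     NOT NULL,
    ask_qty       NUMERIC     NOT NULL,
    PRIMARY KEY (currency_pair, ts)
);

-- The 5 min / 1 hour tables would be identical except for the name.
CREATE TABLE quotes_5min (LIKE quotes_1min INCLUDING ALL);
```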

Once the candles schema is decided, volume and total value locked are pretty straightforward.

We might have to wait until we have a data source and check how Dune handles this before going any further.

johnrichardrinehart commented 3 years ago

> The easy way for initial data is one table per currency pair per interval, e.g.:
>
> - one table for BTC/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]
> - another table for BTC/USD 1 hour, columns [timestamp, bid, bid_qty, ask, ask_qty]
> - another table for ETH/USD 1 min, columns [timestamp, bid, bid_qty, ask, ask_qty]

I think having each pair in its own table is a great idea, but even better might be to have one table per data source (Serum/Raydium/etc.) with information like:

Serum

| ID | timestamp | token_buy | token_sell | price_buy | price_sell | volume_buy | volume_sell |
|----|-----------|-----------|------------|-----------|------------|------------|-------------|
| 1  | 01-01.... | BTC       | ETH        | 1,000.00  | 999.99     | 25         | 10          |
| 2  | 01-01.... | USD       | ETH        | 2,500.00  | 2499.99    | 12         | 5           |

Raydium

| ID | timestamp | token_buy | token_sell | price_buy | price_sell | volume_buy | volume_sell |
|----|-----------|-----------|------------|-----------|------------|------------|-------------|
| 1  | 01-01.... | BTC       | ETH        | 1,000.00  | 999.99     | 25         | 10          |
| 2  | 01-01.... | USD       | ETH        | 2,500.00  | 2499.99    | 12         | 5           |
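To make this concrete, a rough Postgres DDL sketch of one per-source table; the names and types are assumptions, not a decided schema:

```sql
-- Hypothetical DDL for a per-source table (one like this per exchange:
-- serum, raydium, ...). Names and types are placeholders.
-- "ts" corresponds to the "timestamp" column above; renamed to avoid the SQL type name.
CREATE TABLE serum (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    ts          TIMESTAMPTZ NOT NULL,
    token_buy   TEXT        NOT NULL,
    token_sell  TEXT        NOT NULL,
    price_buy   NUMERIC     NOT NULL,
    price_sell  NUMERIC     NOT NULL,
    volume_buy  NUMERIC     NOT NULL,
    volume_sell NUMERIC     NOT NULL
);

-- The Raydium table would be structurally identical.
CREATE TABLE raydium (LIKE serum INCLUDING ALL);
```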

This avoids the need for complex (inner? outer?) joins over different tables (which I've never even performed, myself). Also, keeping all of the timestamps in one table should make it possible to create a VIEW, or something similar, derived from the fundamental table.
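As a rough illustration of that VIEW idea (assuming the hypothetical serum table sketched above), a per-minute roll-up could look something like:

```sql
-- Hypothetical: derive a 1-minute summary per pair from the raw serum samples.
-- Columns are illustrative; proper candles (open/close) would need ordered
-- window functions on trade prices rather than these simple aggregates.
CREATE VIEW serum_1min AS
SELECT
    token_buy,
    token_sell,
    date_trunc('minute', ts) AS minute,
    min(price_buy)           AS low_buy,
    max(price_buy)           AS high_buy,
    sum(volume_buy)          AS volume_buy,
    sum(volume_sell)         AS volume_sell
FROM serum
GROUP BY token_buy, token_sell, date_trunc('minute', ts);
```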

Initially, I plan on sampling every second (for at least the next week or so). So, that's 604,800 samples with maybe a few hundred interesting rows per sample. Assuming 200 rows per sample (a lot of currency pairs and a couple of different exchanges), that's ~121 Mrows. Now, Postgres should handle compression and logarithmic look-up times, so I think this is feasible on-disk and with the memory of the machine... but, YOLO! We can always upgrade the instance if performance is sketchy af :) .
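For look-ups to stay fast at ~121 M rows, the time column will almost certainly need an index; a minimal sketch, again assuming the hypothetical serum table above:

```sql
-- Hypothetical indexes, assuming the serum table sketched earlier.
-- A composite B-tree for "this pair over this time range" queries:
CREATE INDEX serum_pair_ts_idx ON serum (token_buy, token_sell, ts);

-- If rows are only ever appended in time order, a BRIN index on the
-- timestamp is a much smaller alternative for coarse time-range scans:
CREATE INDEX serum_ts_brin_idx ON serum USING brin (ts);
```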

And, really, we should probably be using a TSDB (like InfluxDB), but Postgres comes for free with Redash and I'm lazy/crunched for time.