mediachain / concat

Mediachain daemons
MIT License
42 stars 13 forks source link

Facilitate batch read/writes #34

Closed vyzo closed 7 years ago

vyzo commented 7 years ago

Batch Writes:

Batch Reads:

Query stream behavior:

Note: The database schema changes to support counters: it's an auto-increment primary key in the envelope table, which also acquires an index for the statement id (as it's no longer the primary key).

An example with the spiffy new counters and ORDER BY:

$ curl -H "Content-Type: application/text" -d 'SELECT (counter, id) FROM * ORDER BY counter LIMIT 5' http://127.0.0.1:9002/query
{"counter":1,"id":"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:0"}
{"counter":2,"id":"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:1"}
{"counter":3,"id":"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:2"}
{"counter":4,"id":"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:3"}
{"counter":5,"id":"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:4"}

$ curl -H "Content-Type: application/text" -d 'SELECT (counter, id) FROM * ORDER BY counter DESC LIMIT 5' http://127.0.0.1:9002/query
{"counter":11000,"id":"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668571:10999"}
{"counter":10999,"id":"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668571:10998"}
{"counter":10998,"id":"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668571:10997"}
{"counter":10997,"id":"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668571:10996"}
{"counter":10996,"id":"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668571:10995"}

$ curl -H "Content-Type: application/text" -d 'SELECT id FROM * ORDER BY counter LIMIT 5' http://127.0.0.1:9002/query
"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:0"
"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:1"
"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:2"
"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:3"
"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:4"

$ curl -H "Content-Type: application/text" -d 'SELECT id FROM * ORDER BY counter DESC LIMIT 5' http://127.0.0.1:9002/query
"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668571:10999"
"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668571:10998"
"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668571:10997"
"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668571:10996"
"QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668571:10995"
yusefnapora commented 7 years ago

👍 nice work on this, sorry I didn't get to the review yesterday. everything looks good to me

parkan commented 7 years ago

Why does

'SELECT id FROM * ORDER BY counter LIMIT 5'

yield "QmWwnVop4hHB3K6z2vuUgU9rh5VrDwMnwuiP9LZew7NzUK:1475668473:0"? i.e. why are fields other than id in there

parkan commented 7 years ago

Other than that, implementation looks good.

At a higher level, these ops are all fine from a relational algebra perspective, but I wonder how well aligned the semantics are with actual queries we will need. For example, I envision very little ORDER BY usage (outside of this specific paginating * case). We should put together a set of sample queries.

vyzo commented 7 years ago

These are the actual ids! They are constructed as the concatenation of publisher id, timestamp, and a temp counter to distinguish statements published at the same second.

vyzo commented 7 years ago

Well, ORDER BY is intended to facilitate sync and polling, and yes pagination -- it's an important feature.

It is also helpful for casual exploration of the data set. For instance, to see the last 10 statements merged in a namespace:

SELECT * FROM some.namespace ORDER BY counter DESC LIMIT 10

And if you want the last 10 by publication timestamp:

SELECT * FROM some.namespace ORDER BY timestamp DESC LIMIT 10

I'll put together some sample queries when it's README time.