mafintosh / hyperdb

Distributed scalable database
MIT License

Consistency: createReadStream() differences between hyperlog and hyperdb #160

Closed: aral closed this issue 5 years ago

aral commented 5 years ago

In hypercore, you can create a live stream via createReadStream({live: true}); in hyperdb, you can’t. The read stream returned by hyperdb is a stream version of an iterator that returns only the entries that exist at the time the stream is created. This was a gotcha for me while adapting a single-hypercore example to work with hyperdb.

I feel that consistency here would help people who know one layer of the Dat ecosystem move to the other. Thoughts?
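As a minimal sketch of the two behaviours described above, here is a self-contained toy model. `ToyLog`, `snapshot()` and `live()` are hypothetical names, not hypercore or hyperdb APIs. The model only assumes what this issue states: hyperdb’s read stream wraps an iterator over the entries present when it is created, while hypercore’s live stream keeps following new appends.

```javascript
'use strict'
const { EventEmitter } = require('events')

// Toy model, NOT the hypercore or hyperdb API.
//  - snapshot(): iterator over the entries that exist right now
//    (the hyperdb createReadStream() behaviour described in this issue)
//  - live(fn): existing entries, then every future append
//    (the hypercore createReadStream({live: true}) behaviour)
class ToyLog extends EventEmitter {
  constructor () {
    super()
    this.entries = []
  }

  append (value) {
    this.entries.push(value)
    this.emit('append', value)
  }

  snapshot () {
    // Copy eagerly so appends made after this call are invisible to the iterator.
    return this.entries.slice()[Symbol.iterator]()
  }

  live (onValue) {
    for (const value of this.entries) onValue(value) // replay what exists
    this.on('append', onValue) // then follow new appends
    return () => this.off('append', onValue) // call to stop following
  }
}

const log = new ToyLog()
const snap = log.snapshot() // taken while the log is still empty
const followed = []
log.live((value) => followed.push(value))
log.append('a')
log.append('b')
console.log([...snap].length) // prints 0: the snapshot never sees later appends
console.log(followed) // 'a' and 'b': the live reader saw both appends
```

The snapshot is copied eagerly on purpose: a generator function would not run its body until the first `next()` call, so it could pick up appends made in between.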

Example

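const hypercore = require('hypercore')
const hyperdb = require('hyperdb')
const ram = require('random-access-memory')

// Both stores keep everything in memory and encode values as JSON.
const core = hypercore((filename) => ram(), {valueEncoding: 'json'})

// hypercore: a live read stream keeps emitting as new blocks are appended.
const hypercoreReadStream = core.createReadStream({live: true})
hypercoreReadStream.on('data', data => {
  console.log('Hypercore data: ', data)
})

const db = hyperdb((filename) => ram(), {valueEncoding: 'json'})

// The watcher fires on every change under '/'.
db.watch('/', () => {
  console.log('Hyperdb data changed!')
})

// hyperdb: `live` is not honoured here. The stream only covers entries that
// exist when it is created (none yet), so this 'data' handler never runs.
const readStream = db.createReadStream('/', {live: true})
readStream.on('data', (data) => {
  // Each 'data' event delivers an array of nodes, so iterate over it.
  console.log('Hyperdb data:')
  data.forEach(datum => {
    console.log(`${datum.key} = ${datum.value} (feed: ${datum.feed}, seq: ${datum.seq})`)
  })
})

// Once a second, write the same timestamp to both stores.
let c = 0
setInterval(() => {
  const timestamp = new Date()
  const obj = {}
  obj[c] = timestamp
  core.append(obj)
  db.put(c.toString(), timestamp)
  c++
}, 1000)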

Output:

Hyperdb data changed!
Hypercore data:  { '1': '2019-01-24T18:01:16.212Z' }
Hyperdb data changed!
Hypercore data:  { '2': '2019-01-24T18:01:17.219Z' }
Hyperdb data changed!
Hypercore data:  { '3': '2019-01-24T18:01:18.223Z' }
…

The watch handler is called in hyperdb, but the read stream’s data handler is not. In hypercore, the live read stream’s data handler is called for every append.
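One possible workaround, sketched under assumptions: combine the two calls the example already uses, re-reading the prefix on every watch notification and dropping nodes that were already delivered. `watchNewNodes` is a hypothetical helper, not a hyperdb function. It assumes each 'data' event is an array of nodes, and that a node’s `feed` and `seq` fields (read by the example’s data handler) together identify it uniquely.

```javascript
'use strict'

// Hypothetical helper, NOT part of hyperdb: emulate a live read stream by
// rescanning `prefix` with db.createReadStream() whenever db.watch() reports
// a change, and passing on only nodes not seen in an earlier scan.
// Assumption: (feed, seq) uniquely identifies a node.
function watchNewNodes (db, prefix, onNode) {
  const seen = new Set()

  function scan () {
    db.createReadStream(prefix).on('data', (nodes) => {
      for (const node of nodes) {
        const id = `${node.feed}:${node.seq}`
        if (seen.has(id)) continue // already delivered by an earlier scan
        seen.add(id)
        onNode(node)
      }
    })
  }

  scan() // deliver entries that already exist
  return db.watch(prefix, scan) // rescan on every change under prefix
}

// Usage with the `db` from the example above (not run here):
// watchNewNodes(db, '/', (node) => {
//   console.log(`${node.key} = ${node.value} (feed: ${node.feed}, seq: ${node.seq})`)
// })
```

Each change triggers a full rescan of the prefix, so this costs time proportional to the number of entries per write and only suits small datasets. Within one scan, nodes also arrive in whatever order the read stream yields them, not in insertion order.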

aral commented 5 years ago

To see the current (intended) behaviour in hyperdb, delay creating the read stream with a timeout (e.g., 10 seconds) so that some entries already exist:

setTimeout(() => {
  // Several entries exist by now, so the stream has data to emit.
  // It emits those entries only; later puts are not shown.
  const readStream = db.createReadStream('/', {live: true})
  readStream.on('data', (data) => {
    console.log('Hyperdb data:')
    data.forEach(datum => {
      console.log(`${datum.key} = ${datum.value} (feed: ${datum.feed}, seq: ${datum.seq})`)
    })
  })
}, 10000)

The output then is:

Hyperdb data:
4 = 2019-01-24T18:06:56.931Z (feed: 0, seq: 5)
Hyperdb data:
3 = 2019-01-24T18:06:55.928Z (feed: 0, seq: 4)
Hyperdb data:
7 = 2019-01-24T18:06:59.940Z (feed: 0, seq: 8)
Hyperdb data:
6 = 2019-01-24T18:06:58.937Z (feed: 0, seq: 7)
Hyperdb data:
8 = 2019-01-24T18:07:00.943Z (feed: 0, seq: 9)
Hyperdb data:
1 = 2019-01-24T18:06:53.917Z (feed: 0, seq: 2)
Hyperdb data:
0 = 2019-01-24T18:06:52.889Z (feed: 0, seq: 1)
Hyperdb data:
5 = 2019-01-24T18:06:57.934Z (feed: 0, seq: 6)
Hyperdb data:
2 = 2019-01-24T18:06:54.925Z (feed: 0, seq: 3)

I’m not sure whether the lack of order is intended behaviour. If it isn’t, I can open a separate issue for it. (I would expect the entries in either insertion order or lexicographic order.)

mafintosh commented 5 years ago

Hi @aral, it's actually ordered by key hash, because that's how the internal trie works. The trie is quite powerful compared to hypercore's read streams: it allows very efficient random-access seeks by key prefix without replicating the entire dataset. We def need to document this stuff better, but this is intended behavior. You can use a history stream to get the data out in insertion order.
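A toy model of why hash ordering shuffles the output while still allowing prefix seeks. It is not hyperdb’s trie: the per-segment hashing, sha256 and the 8-character truncation are assumptions chosen for illustration, and `pathHash`/`trieOrder` are hypothetical names.

```javascript
'use strict'
const crypto = require('crypto')

// Toy model, NOT hyperdb's trie or hash function: each '/'-separated path
// segment is hashed on its own (sha256, truncated to 8 hex chars) and the
// digests are concatenated into one string per key.
function pathHash (key) {
  return key
    .split('/')
    .filter(Boolean)
    .map((segment) => crypto.createHash('sha256').update(segment).digest('hex').slice(0, 8))
    .join('')
}

// A depth-first walk of a trie built from these hash strings (children in
// ascending order) visits keys in sorted-hash order, which is neither
// insertion order nor lexicographic order of the keys.
function trieOrder (keys) {
  return keys.slice().sort((a, b) => {
    const ha = pathHash(a)
    const hb = pathHash(b)
    return ha < hb ? -1 : ha > hb ? 1 : 0
  })
}

// Insertion order is just the order of writes in the log, which is the
// order a history stream replays, per the reply above.
const writes = ['0', '1', '2', 'users/alice', 'users/bob', '3', 'users/carol']

console.log('insertion order:', writes)
console.log('trie order:     ', trieOrder(writes))
// Every 'users/...' hash starts with pathHash('users'), so those keys form one
// contiguous run in trie order: a prefix seek can jump straight to that run.
console.log('users prefix:   ', pathHash('users'))
```

Because hashes scatter keys uniformly, the trie stays balanced regardless of key names; the price is the seemingly random iteration order seen in the output above.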

aral commented 5 years ago

Cool, thanks. Closing and will have a think about documentation a little later when the train slows down a bit.