mafintosh / hyperdb

Distributed scalable database
MIT License

example of hyperdb replication #19

Closed fyang1024 closed 6 years ago

fyang1024 commented 6 years ago

Is it possible to give a code example to replicate 2 db instances running on 2 machines? Say, if db1 has (key1, value1) and db2 has (key2, value2), what records will db1 and db2 have after replication? thanks.

hackergrrl commented 6 years ago

Replication works by piping two hyperdb instances' replication streams together. They communicate over their respective duplex streams in order to reach the same state.

The result of replication will be that both machines now have the same (the union) set of keys. For example:

var hyperdb = require('hyperdb')

var db1 = hyperdb('./my.db', {valueEncoding: 'utf-8'})
var db2

// need to wait for db1 to be ready to tell db2 to use the same key
db1.ready(function () {
  db2 = hyperdb('./your.db', db1.key, {valueEncoding: 'utf-8'})
  db1.put('/hello', 'world', function () {
    db2.put('/hey', 'my friend', function () {
      replicate(db1, db2)
    })
  })
})

function replicate (a, b) {
  var r1 = a.replicate()
  var r2 = b.replicate()

  r1.pipe(r2).pipe(r1)

  r1.once('end', done)
  r1.once('error', done)
  r2.once('end', done)
  r2.once('error', done)
}

var pending = 2
function done (err) {
  if (err) throw err

  if (--pending === 0) {
    console.log('done replication')

    db2.get('/hello', function (err, nodes) {
      if (err) throw err
      console.log('/hello --> ' + nodes[0].value)
    })
  }
}

fyang1024 commented 6 years ago

Thank you Stephen. I assume both the ./my.db and ./your.db folders have to be accessible by the 2 machines, right? So the replication is done on the file system instead of over the network, and the replication stream doesn't involve a network connection. Did I get that right?

derhuerst commented 6 years ago

The replication mechanism of hyperdb is entirely independent of the transport form. You can pipe these streams into a WebRTC channel, TCP or file system channel. You can also just pipe them together in-process.

Therefore hyperdb can be used to replicate between instances a) in the same JS process b) on the same machine c) anywhere on the internet, as long as they can establish a connection.
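To illustrate the point about transports, here is a minimal sketch of one wiring function reused for TCP. The names attach, serve and dial are hypothetical helpers, not part of hyperdb; db is assumed to be a hyperdb instance whose replicate() returns a duplex stream, as in the example above.

```javascript
const net = require('net')

// Hypothetical helper: connect a duplex replication stream to any duplex
// transport (TCP socket, WebRTC data channel wrapper, another in-process stream).
function attach (replicationStream, transport) {
  transport.pipe(replicationStream).pipe(transport)
  return replicationStream
}

// Assumed usage on the listening machine: one replication stream per connection.
function serve (db, port) {
  return net.createServer(function (socket) {
    attach(db.replicate(), socket)
  }).listen(port)
}

// Assumed usage on the connecting machine.
function dial (db, port, host) {
  return attach(db.replicate(), net.connect(port, host))
}
```

Only the transport passed to attach changes between the in-process, same-machine and over-the-internet cases.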

hackergrrl commented 6 years ago

@fyang1024 I used two folders in the same JS process to keep the example simple. @derhuerst clarifies the matter excellently! :tada:

fyang1024 commented 6 years ago

Thank you Jannis. Before I tried to use TCP to pipe them together, I found that if I change

    db2.get('/hello', function (err, nodes) {
      if (err) throw err
      console.log('/hello --> ' + nodes[0].value)
    })

to

    db1.get('/hey', function (err, nodes) {
      if (err) throw err
      console.log('/hey --> ' + nodes[0].value)
    })

It doesn't work, which means db2's key value is not replicated to db1. Did I do anything wrong?

derhuerst commented 6 years ago

Did you get the entry before the replication had finished? You need to wait until the replication is done.

There are two modes of replication, though. With non-live replication it's easy: off the top of my head, you should be able to listen for the finish event on the replication stream.

With live-replication, I don't know a way though. @noffle do you know?

fyang1024 commented 6 years ago

@derhuerst I kept everything the same except for that piece, just to see if db2's key value got replicated to db1. I think @noffle's code already ensures that replication is done on both sides through the following check:

if (--pending === 0) 

I tried to pipe the streams over a TCP connection. I can see connect, data received and disconnect, but for some reason the 'end' event is never emitted on the replication streams. The code is below.

var hyperdb = require('hyperdb')
var net = require('net')

var db1 = hyperdb('./my.db', {valueEncoding: 'utf-8'})
var db2

// need to wait for db1 to be ready to tell db2 to use the same key
db1.ready(function () {
  db2 = hyperdb('./your.db', db1.key, {valueEncoding: 'utf-8'})
  db1.put('/hello', 'world', function () {
    db2.put('/hey', 'my friend', function () {
      replicate(db1, db2)
    })
  })
})

function replicate (a, b) {
  var r1 = a.replicate()
  var r2 = b.replicate()

  // r1.pipe(r2).pipe(r1)
  var port = 2222
  net.createServer(function (socket) {
    console.log('client connected')
    socket.on('end', () => {
      console.log('client disconnected')
    })
    socket.pipe(r1).pipe(socket)
  }).listen(port)

  var clientSocket = net.connect(port)
  clientSocket.on('connect', () => {
    console.log('connected to server')
  })
  clientSocket.on('data', () => {
    console.log('data received')
  })
  clientSocket.pipe(r2).pipe(clientSocket)

  r1.once('end', done)
  r1.once('error', done)
  r2.once('end', done)
  r2.once('error', done)
}

var pending = 2
function done (err) {
  if (err) throw err

  if (--pending === 0) {
    console.log('done replication')

    db2.get('/hello', function (err, nodes) {
      if (err) throw err
      console.log('/hello --> ' + nodes[0].value)
    })
  }
}

I am quite a noob at Node.js. Sorry about that.

hackergrrl commented 6 years ago

Interesting. If you listen for the 'finish' event instead of 'end' on the replication streams, it terminates fine. Otherwise I'm not seeing r2 emit 'end', though I'm not sure why. cc @mafintosh
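For context, in Node.js streams 'finish' is emitted by the writable side once end() has been called and all data has been flushed, while 'end' is emitted by the readable side only after all of its data has been consumed. Below is a minimal sketch of a helper that waits for a chosen event on several streams and calls back once. waitForAll is a hypothetical name, and it works on anything that emits events.

```javascript
const { EventEmitter } = require('events')

// Hypothetical helper: calls cb exactly once, either with the first error
// or with null after every stream has emitted eventName.
function waitForAll (streams, eventName, cb) {
  let pending = streams.length
  let called = false

  function settle (err) {
    if (called) return
    if (err) {
      called = true
      return cb(err)
    }
    if (--pending === 0) {
      called = true
      cb(null)
    }
  }

  streams.forEach(function (s) {
    s.once(eventName, function () { settle(null) })
    s.once('error', settle)
  })
}
```

With the replicate function above, this would be called as waitForAll([r1, r2], 'finish', done) in place of the four once(...) lines, assuming done only needs to run a single time.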

fyang1024 commented 6 years ago

@noffle indeed. 'finish' is emitted with and without network piping. however, db2's key value is not replicated to db1 in either case, while only db1's key value is replicated to db2 as expected. @mafintosh do you have any idea?

fyang1024 commented 6 years ago

@noffle I've upgraded hyperdb to the latest version. Now if I run the replication code

db2 = hyperdb('./your.db', db1.key, {valueEncoding: 'utf-8'})

It throws Error: Another hypercore is stored here

If I remove db1.key from the constructor

db2 = hyperdb('./your.db', {valueEncoding: 'utf-8'})

It throws Error: First shared hypercore must be the same

What's the right way to do replication now?

hackergrrl commented 6 years ago

@fyang1024 I'm not able to reproduce the issue you're seeing.

  1. Are you running the code I posted as-is, in a clean directory?
  2. Did you delete my.db and your.db before starting?

The error message you're seeing (the first one) makes me think that your.db is maybe a db from an older test of yours that needs to be wiped?
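For repeated test runs, one option is to wipe both storage directories at the top of the script. This is a minimal sketch, assuming Node.js 14.14 or newer for fs.rmSync; wipe is a hypothetical helper name.

```javascript
const fs = require('fs')

// Hypothetical helper: delete leftover storage directories so each run
// starts from a clean state. Missing directories are ignored (force: true).
function wipe (dirs) {
  dirs.forEach(function (dir) {
    fs.rmSync(dir, { recursive: true, force: true })
  })
}
```

In the example above it would be called as wipe(['./my.db', './your.db']) before either hyperdb instance is created.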

fyang1024 commented 6 years ago

@noffle you are right. I deleted my.db but overlooked your.db. Now the code runs, but '/hey' -> 'my friend' in db2 is still not copied to db1, hmmm

hackergrrl commented 6 years ago

My mistake @fyang1024: I forgot that db1 must authorize db2 for replication to bring db2's data over. Here is an updated replication example:

var hyperdb = require('hyperdb')

var db1 = hyperdb('./my.db', {valueEncoding: 'utf-8'})
var db2

db1.ready(function () {  // need to wait for db1 to be ready to tell db2 to use the same key
  db2 = hyperdb('./your.db', db1.key, {valueEncoding: 'utf-8'})
  db2.ready(function () {  // need to wait for db2 to be ready before db2.local.key is set
    db1.put('/hello', 'world', function () {
      db1.authorize(db2.local.key, function () {  // give db2 permission to write to the shared hyperdb
        db2.put('/hey', 'my friend', function () {
          replicate(db1, db2, function (err) {
            db1.get('/hey', console.log)
          })
        })
      })
    })
  })
})

function replicate (a, b, cb) {
  var stream = a.replicate()
  stream.pipe(b.replicate()).pipe(stream).on('end', cb)
}

fyang1024 commented 6 years ago

@noffle Gotcha! thanks. I had to change the order slightly

db1.put('/hello', 'world', function () {
  db2.put('/hey', 'my friend', function () {
    db1.authorize(db2.local.key, function () {  // give db2 permission to write to the shared hyperdb
      replicate(db1, db2, function (err) {
        db1.get('/hey', console.log)
      })
    })
  })
})

to make it work, otherwise it complains Cannot read property 'key' of null.
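The nesting in these examples encodes an ordering: each step starts only after the previous callback has fired. As a sketch, the same ordering can be written flat with a small helper. series and syncBothWays are hypothetical names, not part of hyperdb; db1 and db2 are assumed to be hyperdb instances sharing db1.key, and replicate is the (a, b, cb) function posted earlier in this thread.

```javascript
// Hypothetical helper: run callback-style tasks one after another,
// stopping at the first error.
function series (tasks, cb) {
  var i = 0
  function next (err) {
    if (err || i === tasks.length) return cb(err || null)
    tasks[i++](next)
  }
  next()
}

// Sketch of the flow from this thread, in the order last reported to work.
function syncBothWays (db1, db2, replicate, cb) {
  series([
    // db2.local.key is only set once db2 is ready
    function (next) { db2.ready(function () { next() }) },
    function (next) { db1.put('/hello', 'world', next) },
    function (next) { db2.put('/hey', 'my friend', next) },
    // give db2 permission to write to the shared hyperdb
    function (next) { db1.authorize(db2.local.key, next) },
    function (next) { replicate(db1, db2, next) }
  ], cb)
}
```

Waiting for db2.ready before reading db2.local.key follows the comment in the earlier example; whether that alone avoids the "Cannot read property 'key' of null" error has not been verified here.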