iriscouch / follow

Very stable, very reliable, NodeJS CouchDB _changes follower
Apache License 2.0

RFE: "seq_file" param to store change.seq and initial "since" value #42

Open isaacs opened 10 years ago

isaacs commented 10 years ago

I find myself doing this every time I use follow:

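var fs = require('fs')
var path = require('path')
var follow = require('follow')

var seq = path.resolve(__dirname, 'sequence')
var since = readSeq(seq)

follow({
  db: myCouch,
  since: since
}, function(er, change) {
  if (er)
    throw er
  // persist the latest sequence so a restart resumes from here
  saveSeq(seq, change.seq)
  // do stuff...
})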

function readSeq(file) {
  try {
    return +fs.readFileSync(file, 'ascii') || 0
  } catch (er) {
    return 0
  }
}

var saving = {}
function saveSeq(file, seq) {
  if (saving[file])
    return
  saving[file] = true
  fs.writeFile(file, '' + seq + '\n', 'ascii', function(er) {
    saving[file] = false
  })
}

It's not that important to make sure that every sequence ID is saved, and of course, a lot in rapid succession will NOT be saved. But, I write follow scripts with the intent of them being crash-only and picking up where they leave off on a crash. Couch is great for this, and it'd be awesome if follow made it easier.

Ideal API:

follow({
  db: myCouch,
  seq_file: path.resolve(__dirname, 'sequence')
}, function(er, change) {
  // etc.
})

isaacs commented 10 years ago

Also: I'd be happy to send a patch of course, but I wasn't sure at a cursory glance where to put it in the code. Any pointer would be great!

jhs commented 10 years ago

@isaacs How would you feel if Follow stored its checkpoints in the remote database, in a non-replicating local document?

In this case you would provide some sort of "follow ID" (maybe by default it is os.hostname). Follow already does a bit of pre-follow sanity checking so I think the _local query would have little if any latency cost.

{
  db: myCouch,
  client_id: "i am still awesome"
}

isaacs commented 10 years ago

That'd also be a nice feature, but the cool thing about a sequence file is that I can easily set it at a certain point, or scp it to a new server, etc.

If follow always defaulted the client_id to a specific field, then it gets a little more confusing. I have a bunch of followers of the npm registry, for example, all on the same hostname, doing different things. In some cases, I might want to copy the sequence file from one to another, etc. Files are a little bit easier to reason about, and don't impose a remote semantics issue.

Otoh, for some cases, it might definitely make sense to have a remote sequence ID. In that case, it'd be best to NOT default to anything, though. Just make it an option, like you could do either seq_file: 'foo.seq' or seq_doc: 'i am still awesome', and throw if you specify both, since that's just weird?
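
A minimal sketch of the "either one, but throw on both" rule described above. Neither `seq_file` nor `seq_doc` is an existing follow option, and `resolveCheckpointOption` is a hypothetical helper name used only for illustration:

```javascript
// Hypothetical helper: decides which checkpoint store the caller asked for.
// `seq_file` and `seq_doc` are the option names proposed in this thread,
// not options that follow currently accepts.
function resolveCheckpointOption(opts) {
  var hasFile = opts.seq_file !== undefined
  var hasDoc = opts.seq_doc !== undefined

  // Specifying both is ambiguous, so fail loudly instead of guessing.
  if (hasFile && hasDoc)
    throw new Error('Specify either seq_file or seq_doc, not both')

  if (hasFile)
    return { type: 'file', target: opts.seq_file }
  if (hasDoc)
    return { type: 'doc', target: opts.seq_doc }

  // No default: without an explicit option, no checkpoint is kept.
  return null
}
```

Returning `null` when neither option is set keeps the "do NOT default to anything" behavior: checkpointing stays strictly opt-in.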

jhs commented 10 years ago

Yeah, you've persuaded me to KISS. A remote sequence ID can come later, or maybe there can be a generic callback-metacallback API later. The fact that it is not totally obvious strongly indicates KISS.

I will have to glance at the code again but this belongs somewhere in the init or prep phase, where Follow hits /db to see what its last_seq is, and generally gets its bearings before the real _changes query.

For how to store it in the file, I'm not sure. Either a Feed object could subscribe to its own "change" event, or that could be handled by the user at a cost of breaking parity.
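
A rough sketch of the first option, where the feed subscribes to its own "change" event. `SketchFeed` is a stand-in built on Node's `EventEmitter`, not follow's real Feed, and the `save_seq` callback option is an assumption made for illustration:

```javascript
var EventEmitter = require('events').EventEmitter
var util = require('util')

// Stand-in for a feed object that emits "change" events.
// `save_seq` is a hypothetical option: a function called with each change.seq.
function SketchFeed(opts) {
  EventEmitter.call(this)
  this.since = opts.since || 0

  if (typeof opts.save_seq === 'function') {
    // The feed listens to itself, so user "change" handlers stay untouched.
    this.on('change', function(change) {
      opts.save_seq(change.seq)
    })
  }
}
util.inherits(SketchFeed, EventEmitter)
```

Because the listener is attached in the constructor, it runs before any handler the user adds later, so the sequence is recorded even if a user handler throws.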

Incidentally, is Follow the reason your program is crashing? It should not really ever crash, even if the target server goes down or even changes its IP address. Is there another bug that Follow has?

jcrugzz commented 10 years ago

@isaacs I plan to allow a seqfile instance or filepath of sorts to be used in the refactor for v1.0.0

isaacs commented 10 years ago

@jcrugzz Kewl! You may want to check out http://npm.im/seq-file, if it's useful for you. It makes sure that the saves are atomic, which prevents spurious restarts at 0 when crashing mid-write.
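
For illustration, one common way to get an atomic save is to write to a temporary file and then rename it over the real one. This is a sketch of that general technique, not seq-file's actual API or implementation; `saveSeqAtomicSync` and the `.tmp` suffix are made-up names:

```javascript
var fs = require('fs')

// Hypothetical helper: write the sequence to a temp file, then rename it
// into place. The temp file sits next to the target, so both are on the
// same filesystem, where rename replaces the target in one step on POSIX.
function saveSeqAtomicSync(file, seq) {
  var tmp = file + '.tmp'
  fs.writeFileSync(tmp, seq + '\n', 'ascii')
  fs.renameSync(tmp, file)
}
```

A crash during the write leaves only a partial `.tmp` file behind. The real sequence file still holds the previous value, so a reader like `readSeq` above would not fall back to 0.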

jcrugzz commented 10 years ago

@isaacs yep that's what I'm referring to :)

jhs commented 10 years ago

@jcrugzz What 1.0.0 refactor are you talking about? Thanks.

jcrugzz commented 10 years ago

@jhs I have a refactor branch I was working on based on the changes-stream module I made. I would love your input and suggestions! I was considering using seq-file in the follow layer as the abstraction layer has shifted slightly as I have found it useful in application.