rethinkdb / logstash-input-rethinkdb


Set backlog date when backfill is true #9

Closed fcruz closed 8 years ago

fcruz commented 8 years ago

There is no property to set how far back Logstash should fetch data from RethinkDB when backfill is set to true. Let's say the Logstash script is down for one day and you restart it, but only want to go back 24 hours rather than backfill everything older. It might be possible using filters, but it would be great if we could have something like this:

$ bin/logstash -e '
input {rethinkdb
   {host => "localhost"
    port => 28015
    auth_key => ""
    watch_dbs => ["db1", "db2"]
    watch_tables => ["test.foo", "db2.baz"]
    backfill => true
    # proposed option, does not exist in the plugin today
    backlog => "2016-02-03 00:00:00.000"
    }}
output {stdout {codec => json_lines}}'

danielmewes commented 8 years ago

There's currently no way from the RethinkDB side to filter this, unless you explicitly add a lastChangedAt field to your documents in the application code and update it on every write.
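
To make the lastChangedAt workaround concrete, here is a minimal sketch in plain Python. It does not use the RethinkDB driver or the Logstash plugin; the helper names `stamp` and `changed_since` are hypothetical and only illustrate the idea that the application timestamps every write, so a consumer can later drop anything older than a chosen cutoff.

```python
from datetime import datetime, timedelta, timezone


def stamp(doc, now=None):
    """Return a copy of doc with lastChangedAt set (hypothetical helper).

    The application would call this on every insert and update.
    """
    now = now or datetime.now(timezone.utc)
    return {**doc, "lastChangedAt": now}


def changed_since(docs, cutoff):
    """Keep only documents whose lastChangedAt is at or after cutoff."""
    return [
        d for d in docs
        if "lastChangedAt" in d and d["lastChangedAt"] >= cutoff
    ]


# Example: the pipeline restarts and only the last 24 hours are wanted.
restart = datetime(2016, 2, 4, tzinfo=timezone.utc)
docs = [
    stamp({"id": 1}, restart - timedelta(days=3)),
    stamp({"id": 2}, restart - timedelta(hours=12)),
    stamp({"id": 3}, restart - timedelta(hours=1)),
]
recent = changed_since(docs, restart - timedelta(hours=24))
print([d["id"] for d in recent])  # [2, 3]
```

In a real setup the same comparison would have to run somewhere in the pipeline, for example as a query on the RethinkDB side or as a filter in Logstash; where exactly is left open here.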

Unfortunately I don't think we can solve this right now in a general way. Eventually we hope to implement fully resumable changefeeds in RethinkDB, which will make a full backfill unnecessary. Instead it would use something similar to an internal timestamp to find out where to pick up from (but it wouldn't be a real-world clock time). There was a bunch of discussion on this in https://github.com/rethinkdb/rethinkdb/issues/3471, though the proposal we eventually settled on as a first step is simpler and not sufficient for this.