phutchins / logstash-input-mongodb

MongoDB input plugin for Logstash

Losing exactly one document when importing into Elasticsearch #39

Open shi-yuan opened 8 years ago

shi-yuan commented 8 years ago

My Logstash config:

input {
    mongodb {
        uri => 'mongodb://127.0.0.1:27017/wiki'
        collection => 'wiki'
        placeholder_db_dir => "E:/mongo2es/data"
        placeholder_db_name => "datalogstash_sqlite_wiki.db"
        batch_size => 1000
    }
}

filter {
    mutate {
        remove_field => [ "host", "@version", "@timestamp", "logdate", "log_entry" ]
    }
}

output {
    stdout { codec => rubydebug }

    file {
        path => "E:/mongo2es/logs/mongo2es-wiki.log"
    }

    elasticsearch {
        index => "wiki"
        document_type => "wiki"
        document_id => "%{mongo_id}"
        hosts => ["127.0.0.1:9200"]
    }
}

There are 2767278 documents in MongoDB, but only 2767277 in Elasticsearch.

Any thoughts? Thanks in advance.

bogdangi commented 8 years ago

It could be that Elasticsearch cannot write one of the documents because its payload conflicts with the mapping already inferred for the index. For example, if the first document was stored with the payload {"body": null}, then when Logstash later tries to index a document like {"body": {"text": "some text"}}, the write will fail.

Elasticsearch usually writes errors like this to its logs.
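
A minimal sketch of that failure mode, using the elasticsearch Ruby gem (the index name mapping-demo, the ids, and the exact payload pair are illustrative assumptions; which order of payloads conflicts depends on the mapping Elasticsearch infers first):

require 'elasticsearch'

client = Elasticsearch::Client.new(host: '127.0.0.1:9200')

# The first write makes Elasticsearch infer a mapping for "body"
# (here: an object with a "text" subfield). Index name is hypothetical.
client.index(index: 'mapping-demo', type: 'doc', id: 1,
             body: { body: { text: 'some text' } })

# A later document whose "body" has an incompatible shape is rejected:
# the gem raises a BadRequest error wrapping a mapper_parsing_exception,
# and the document is silently missing unless the logs are checked.
client.index(index: 'mapping-demo', type: 'doc', id: 2,
             body: { body: 'plain string' })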

shi-yuan commented 8 years ago

I exported the MongoDB data to a JSON file, read it line by line, and stored each JSON object in Elasticsearch. Done that way, there is no problem.
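
For reference, a sketch of that workaround driven by Logstash itself, assuming the collection was dumped with mongoexport (which writes one JSON document per line) to a hypothetical path; note that mongoexport emits extended JSON, so fields like _id come out as objects and there is no mongo_id to use as document_id:

input {
    file {
        path => "E:/mongo2es/data/wiki-export.json"
        start_position => "beginning"
        codec => "json"
    }
}

output {
    elasticsearch {
        index => "wiki"
        document_type => "wiki"
        hosts => ["127.0.0.1:9200"]
    }
}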

bogdangi commented 8 years ago

So I tried to import 190968 documents, and exactly one was lost on the Elasticsearch side (190967 arrived).

bogdangi commented 8 years ago

Hi, I figured out why it loses one document; have a look at pull request #41: https://github.com/phutchins/logstash-input-mongodb/pull/41/commits/6998388caf20c53748dcdfc55e5798b8d90bc56e#diff-b50cbd06ed9aac325fc5552aa327afbbR138
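
In short, as I read that diff: the plugin seeds its sqlite placeholder with the _id of the first document it finds, and every fetch then asks only for _ids strictly greater than the placeholder, so the seeding document itself is never emitted. A simplified Ruby sketch of the pattern (not the plugin's literal code):

require 'mongo'

client     = Mongo::Client.new('mongodb://127.0.0.1:27017/wiki')
collection = client[:wiki]

# The placeholder is seeded with the first document's _id...
since = collection.find.limit(1).first['_id']

# ...and subsequent fetches only ask for _id > placeholder, so the
# document used for seeding is never returned to the pipeline.
cursor = collection.find('_id' => { '$gt' => since })
cursor.each { |doc| puts doc }  # the first document is missing here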

searchandanalytics commented 8 years ago

Hi, @bogdangi @shi-yuan

This is my configuration. I have only one document in the collection, but Logstash does not exit the loop once the documents are read. Any suggestions?

input {
    mongodb {
        uri => "mongodb://localhost:27017/logtry?ssl=false"
        placeholder_db_dir => "d:/elk"
        placeholder_db_name => "logstash_mo.db"
        collection => "sample"
        batch_size => 0
    }
}

output {
    stdout { codec => json }
}

D, [2016-07-03T17:50:46.408000 #8236] DEBUG -- : MONGODB | localhost:27017 | logtry.find | SUCCEEDED | 0.034s
D, [2016-07-03T17:50:46.487000 #8236] DEBUG -- : MONGODB | localhost:27017 | logtry.listCollections | STARTED | {"listCollections"=>1, "cursor"=>{}, "filter"=>{:name=>{"$not"=>/system.|\$/}}}
D, [2016-07-03T17:50:46.500000 #8236] DEBUG -- : MONGODB | localhost:27017 | logtry.listCollections | SUCCEEDED | 0.008s
D, [2016-07-03T17:50:47.810000 #8236] DEBUG -- : MONGODB | localhost:27017 | logtry.find | STARTED | {"find"=>"sample", "filter"=>{"_id"=>{"$gt"=>BSON::ObjectId('577902d7beb1f37c22e1f458')}}, "limit"=>0}
D, [2016-07-03T17:50:47.821000 #8236] DEBUG -- : MONGODB | localhost:27017 | logtry.find | SUCCEEDED | 0.004s
D, [2016-07-03T17:50:47.968000 #8236] DEBUG -- : MONGODB | localhost:27017 | logtry.listCollections | STARTED | {"listCollections"=>1, "cursor"=>{}, "filter"=>{:name=>{"$not"=>/system.|\$/}}}
D, [2016-07-03T17:50:47.983000 #8236] DEBUG -- : MONGODB | localhost:27017 | logtry.listCollections | SUCCEEDED | 0.009s
D, [2016-07-03T17:50:50.600000 #8236] DEBUG -- : MONGODB | localhost:27017 | logtry.find | STARTED | {"find"=>"sample", "filter"=>{"_id"=>{"$gt"=>BSON::ObjectId('577902d7beb1f37c22e1f458')}}, "limit"=>0}
D, [2016-07-03T17:50:50.612000 #8236] DEBUG -- : MONGODB | localhost:27017 | logtry.find | SUCCEEDED | 0.006s

bogdangi commented 8 years ago

I guess the answer is batch_size => 0.
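
If that is the culprit, a positive batch size is the thing to try; here is a sketch of the same input with an arbitrary value (50 is illustrative, not a documented default). Note also that the repeated find calls in the debug log run with "limit"=>0, and Logstash inputs generally keep polling rather than exiting once a collection is drained, so the pipeline staying alive is expected either way:

input {
    mongodb {
        uri => "mongodb://localhost:27017/logtry?ssl=false"
        placeholder_db_dir => "d:/elk"
        placeholder_db_name => "logstash_mo.db"
        collection => "sample"
        batch_size => 50    # illustrative positive value instead of 0
    }
}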

findemor commented 8 years ago

I am having the same problem.

I have 4 documents inside the "merge" collection in my MongoDB. When I run Logstash, Elasticsearch ends up with 3 documents. I ran some tests and noticed that the lost document is always the first document in the MongoDB collection. If I try with a collection that has only a single document, then nothing is loaded into Elasticsearch.

My Elasticsearch index is new and empty, and this is my configuration file:

input {
    mongodb {
        uri => 'mongodb://--------hidden------/ch-db'
        placeholder_db_dir => 'C:/temp/'
        placeholder_db_name => 'logstash_sqlite.db'
        collection => 'merge'
    }
}

output {
    elasticsearch {
        index => "poll"
        hosts => "localhost:9200"
    }
}