jprante / elasticsearch-jdbc

JDBC importer for Elasticsearch
Apache License 2.0

Rows are not updated/added #727

Open IgorN opened 8 years ago

IgorN commented 8 years ago

Hi! I have the following config for my test DB table. Elasticsearch and the river are both at v2.1.

echo '
{
    "type" : "jdbc",
    "jdbc" : {
        "schedule" : "0 0-59 0-23 ? * *",
        "url" : "jdbc:mysql://localhost:33060/discover",
        "user" : "homestead",
        "password" : "secret",
        "sql" : [
        {
            "statement" : "select id as _id, first_name as firstName, last_name as lastName, job_title as jobTitle from testElastic where datetime > (DATE_SUB(?, INTERVAL 5 MINUTE));",
            "parameter" : [ "$metrics.lastexecutionstart" ]
        }
        ],
        "autocommit" : true,
        "statefile" : "statefile.json",
        "elasticsearch" : {
            "cluster" : "elasticsearch_igne",
            "host" : "localhost",
            "port" : "9300"
        },
        "index" : "discover",
        "type" : "contacts",
        "fetchsize" : "min",
        "max_bulk_actions" : 20000,
        "max_concurrent_bulk_requests" : 10,
        "metrics" : {
            "enabled" : true,
            "logger" : {
                "json" : true
            },
            "interval" : "1m"
        }
    }
}

I expect that when I update some rows or add new rows, the river will update the index, but it doesn't. What is wrong? When I run it for the first time (without statefile.json), the index gets all rows and everything works correctly, but then the river updates all rows again and again (the _version field keeps increasing, which looks like wrong behavior to me). What am I doing incorrectly?

Thanks!

IgorN commented 8 years ago

I fixed the problem described above. But why are rows that have been updated or added indexed again and again (the _version field keeps increasing, which looks like wrong behavior to me)? Would it be better to run this script via supervisord? I think the Java process uses cached data or something similar, because when I run the query in an SQL manager I don't get the rows, but the script keeps fetching the old rows again and again.

IgorN commented 8 years ago

I think the problem is with the timezone. If I update some rows and set datetime to minus 2h, everything works correctly... I tried to use "timezone" : "TimeZone.getDefault()", but it didn't help me (((

Who has any ideas?
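One way to reason about the repeated re-indexing is to count how many scheduled runs would select a single changed row. The sketch below is only a model, not the importer's code. It assumes the schedule fires once per minute, that the `?` parameter is the start time of the previous run, and that the parameter's clock may lag the `datetime` column by a fixed offset (for example a UTC parameter compared with a column stored in a UTC+2 local time). The class name, method name, and dates are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;

class OverlapWindowSketch {

    /**
     * Counts how many scheduled runs select one row under the predicate
     *   datetime > DATE_SUB(?, INTERVAL lookback)
     * where ? is assumed to be the start of the previous run.
     * clockOffset is a hypothetical lag of the parameter's clock behind
     * the clock used for the datetime column.
     */
    static int runsSelectingRow(LocalDateTime rowModified, LocalDateTime firstRun,
                                Duration interval, Duration lookback,
                                Duration clockOffset, int totalRuns) {
        int hits = 0;
        // Run 0 has no previous execution, so the model starts at run 1.
        for (int i = 1; i < totalRuns; i++) {
            LocalDateTime runTime = firstRun.plus(interval.multipliedBy(i));
            if (rowModified.isAfter(runTime)) {
                continue; // the row has not been modified yet at this run
            }
            LocalDateTime previousStart = firstRun.plus(interval.multipliedBy(i - 1));
            // Value the SQL compares against: parameter minus the lookback window.
            LocalDateTime threshold = previousStart.minus(clockOffset).minus(lookback);
            if (rowModified.isAfter(threshold)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        LocalDateTime firstRun = LocalDateTime.of(2016, 1, 1, 12, 0, 0); // arbitrary date
        LocalDateTime rowModified = firstRun.plusSeconds(30);
        Duration everyMinute = Duration.ofMinutes(1);
        Duration fiveMinutes = Duration.ofMinutes(5);

        int sameClock = runsSelectingRow(rowModified, firstRun, everyMinute,
                fiveMinutes, Duration.ZERO, 600);
        int twoHourLag = runsSelectingRow(rowModified, firstRun, everyMinute,
                fiveMinutes, Duration.ofHours(2), 600);

        System.out.println("runs selecting the row, same clock: " + sameClock);
        System.out.println("runs selecting the row, 2h lag:     " + twoHourLag);
    }
}
```

Under these assumptions the row is selected by 6 consecutive runs even when both clocks agree, because the 5-minute lookback is wider than the 1-minute schedule, and by 126 runs with a two-hour lag. Either case would make `_version` climb without the row changing; narrowing the lookback or aligning the timezones shrinks the overlap.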

jprante commented 8 years ago

"TimeZone.getDefault()" does not work.

Use a Java timezone ID (see https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html), such as Europe/Berlin.
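To check a candidate value before putting it into the config, a small standalone test against `java.time.ZoneId` (the class linked above) can help. This is a minimal sketch with a hypothetical class and method name. Whether the importer parses the "timezone" setting with this exact class is an assumption, so treat the result as a sanity check only.

```java
import java.time.DateTimeException;
import java.time.ZoneId;

class ZoneIdCheck {

    /** Returns true if the string is accepted by ZoneId.of as a timezone ID. */
    static boolean isValidZoneId(String id) {
        try {
            ZoneId.of(id);
            return true;
        } catch (DateTimeException e) {
            // Thrown for malformed IDs and for well-formed but unknown regions.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"Europe/Berlin", "UTC", "TimeZone.getDefault()"};
        for (String id : candidates) {
            System.out.println(id + " -> " + isValidZoneId(id));
        }
    }
}
```

The string "TimeZone.getDefault()" is Java source text, not an ID, so it is rejected, while region names such as Europe/Berlin and the literal UTC are accepted.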

IgorN commented 8 years ago

@jprante Thanks! If I understood correctly, I should use "UTC".