jprante / elasticsearch-jdbc

JDBC importer for Elasticsearch
Apache License 2.0

Not Checking Statefile during schedule #617

Open altoic opened 9 years ago

altoic commented 9 years ago

I am not sure if this is the root of the issue, but I have a schedule that runs every minute at second 10. On the first run it tells me it loaded the state file; the subsequent runs do not, so it is not checking the date and the incremental job is not performing correctly. Basically it uses the same date on every run and pulls the same data every time.

[08:34:43,915][INFO ][importer ][main] loaded state from /root/ELK_MZ/jdbc_es/bin/statefile.json
[08:34:43,952][INFO ][importer.jdbc ][main] index name = 'logstash-'YYYY.MM.dd-'log', concrete index name = logstash-2015.07.31-log
[08:34:43,963][INFO ][importer ][main] schedule with cron expressions [10 * * ? * ]
[08:35:10,005][INFO ][importer.jdbc ][pool-2-thread-2] index name = 'logstash-'YYYY.MM.dd-'log', concrete index name = logstash-2015.07.31-log
[08:35:10,031][INFO ][importer.jdbc ][pool-3-thread-1] strategy standard: settings = {index_settings.index.number_of_replica=0, connection_properties.oracle.jdbc.ReadTimeout=50000, metrics.interval=1m, elasticsearch.cluster=NJ_datacenter, password=, url=jdbc:oracle:thin:@/, index_timewindow=true, metrics.logger.json=true, index='logstash-'YYYY.MM.dd-'log', max_concurrent_bulk_requests=10, elasticsearch.port=9308, index_settings.index.number_of_shards=1, user=cw14, metrics.enabled=true, connection_properties.oracle.jdbc.TcpNoDelay=false, connection_properties.oracle.net.CONNECT_TIMEOUT=10000, type=log, statefile=/bin/statefile.json, elasticsearch.host=localhost, connection_properties.useFetchSizeWithLongColumn=false, max_bulk_actions=20000, schedule=10 * * ? * , metrics.logger.plain=false, sql.0.statement=SELECT * FROM log where CREATION_TIME >= ?, metrics.lastexecutionend=2015-05-31T12:33:10.277Z, sql.0.parameter.0=$metrics.lastexecutionstart, metrics.lastexecutionstart=2015-05-31T12:33:10.228Z, metrics.counter=2}, context = org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext@43f52b8
[08:35:10,141][INFO ][importer.jdbc.context.standard][pool-3-thread-1] metrics thread started
[08:35:10,144][INFO ][importer.jdbc.context.standard][pool-3-thread-1] found sink class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSink@20d89c2
[08:35:10,155][INFO ][importer.jdbc.context.standard][pool-3-thread-1] found source class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource@2a1f2b5c
[08:35:10,167][INFO ][BaseTransportClient ][pool-3-thread-1] creating transport client, java version 1.7.0_75, effective settings {cluster.name=NJ_datacenter, host.0=localhost, port=9308, sniff=false, autodiscover=false, name=importer, client.transport.ignore_cluster_name=false, client.transport.ping_timeout=5s, client.transport.nodes_sampler_interval=5s}
[08:35:10,233][INFO ][org.elasticsearch.plugins][pool-3-thread-1] [importer] loaded [support-1.6.0.0-d7bb0e9], sites []
[08:35:11,141][INFO ][BaseTransportClient ][pool-3-thread-1] trying to connect to [inet[localhost/127.0.0.1:9308]]
[08:35:11,255][INFO ][BaseTransportClient ][pool-3-thread-1] connected to [[MASTER][imbeHOcvSJC8VbnnSH10bQ][inet[localhost/127.0.0.1:9308]]{master=true}]
[08:35:18,950][INFO ][importer.jdbc.context.standard][pool-3-thread-1] state persisted to /root/ELK_MZ/jdbc_es/bin/statefile.json
[08:36:10,002][INFO ][importer.jdbc ][pool-2-thread-2] index name = 'logstash-'YYYY.MM.dd-'log', concrete index name = logstash-2015.07.31-log
[08:36:10,005][INFO ][importer.jdbc.context.standard][pool-6-thread-1] found sink class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSink@7944e589
[08:36:10,006][INFO ][importer.jdbc.context.standard][pool-6-thread-1] found source class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource@662f4d8a
[08:36:10,007][INFO ][BaseTransportClient ][pool-6-thread-1] creating transport client, java version 1.7.0_75, effective settings {cluster.name=NJ_datacenter, host.0=localhost, port=9308, sniff=false, autodiscover=false, name=importer, client.transport.ignore_cluster_name=false, client.transport.ping_timeout=5s, client.transport.nodes_sampler_interval=5s}
[08:36:10,011][INFO ][org.elasticsearch.plugins][pool-6-thread-1] [importer] loaded [support-1.6.0.0-d7bb0e9], sites []
[08:36:10,118][INFO ][BaseTransportClient ][pool-6-thread-1] trying to connect to [inet[localhost/127.0.0.1:9308]]
[08:36:10,129][INFO ][BaseTransportClient ][pool-6-thread-1] connected to [[MASTER][imbeHOcvSJC8VbnnSH10bQ][][inet[localhost/127.0.0.1:9308]]{master=true}]
[08:36:10,145][INFO ][metrics.source.json ][pool-5-thread-1] {"totalrows":14219,"elapsed":60117,"bytes":2591850,"avg":182.0,"dps":236.5221152086764,"mbps":0.04210295782598932}
[08:36:10,146][INFO ][metrics.sink.json ][pool-5-thread-1] {"elapsed":60117,"submitted":14219,"succeeded":14219,"failed":0,"bytes":10344600,"avg":727.0,"dps":236.5221152086764,"mbps":0.16804145977843205}
[08:36:14,000][INFO ][importer.jdbc.context.standard][pool-6-thread-1] state persisted to /***/bin/statefile.json
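
For context, the flat "effective settings" in the log correspond roughly to an importer definition like the sketch below. This is a reconstruction for illustration only, not altoic's actual script: the Oracle connect string, password, and lib/bin paths are placeholders, the launcher line (java org.xbib.tools.Runner org.xbib.tools.JDBCImporter reading the definition from stdin) is taken from the project's 2.x README and may differ for the 1.6.x plugin shown in the log, and the Quartz cron expression is an assumption for "second 10 of every minute" since the expression in the log looks truncated. The parts that matter for this issue are schedule, statefile, and the $metrics.lastexecutionstart parameter.

#!/bin/sh
# Sketch only -- placeholder paths, DB host/service name, and password.
lib=/path/to/elasticsearch-jdbc/lib
bin=/path/to/elasticsearch-jdbc/bin

# "schedule" is Quartz cron: fire at second 10 of every minute (assumed from the report).
# The quoted heredoc delimiter keeps the shell from expanding $metrics.lastexecutionstart.
java -cp "${lib}/*" \
     -Dlog4j.configurationFile=${bin}/log4j2.xml \
     org.xbib.tools.Runner org.xbib.tools.JDBCImporter <<'EOF'
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "user" : "cw14",
    "password" : "********",
    "sql" : [
      {
        "statement" : "select * from log where CREATION_TIME >= ?",
        "parameter" : [ "$metrics.lastexecutionstart" ]
      }
    ],
    "schedule" : "10 * * ? * *",
    "statefile" : "statefile.json",
    "index" : "'logstash-'YYYY.MM.dd-'log'",
    "type" : "log",
    "index_settings" : {
      "index" : {
        "number_of_shards" : 1,
        "number_of_replica" : 0
      }
    },
    "max_bulk_actions" : 20000,
    "max_concurrent_bulk_requests" : 10,
    "elasticsearch" : {
      "cluster" : "NJ_datacenter",
      "host" : "localhost",
      "port" : 9308
    }
  }
}
EOF

With a definition along these lines, each scheduled run would be expected to reload statefile.json, bind the persisted metrics.lastexecutionstart to the ? placeholder, and write back an updated timestamp afterwards; the log above suggests the reload only happens once at startup, which is why the same date is bound on every run.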

minotaursu commented 8 years ago

I have the same problem. It seems elasticsearch-jdbc loads the state file only on the first run, when it should load the state file on every scheduled run instead. The version of elasticsearch-jdbc is 2.1.0.0.