jprante / elasticsearch-jdbc

JDBC importer for Elasticsearch
Apache License 2.0

What are the exact parameters that need to be set for bulk indexing? #600

Open jsbonline2006 opened 9 years ago

jsbonline2006 commented 9 years ago

Hi, this question may be simple, but I cannot get it to work, so I am asking. I want to configure:
1) the maximum number of records fetched from the DB,
2) the maximum number of records pushed to bulk indexing at one time,
3) the number of concurrent requests.
I followed the link below, but somehow it is not working. https://github.com/jprante/elasticsearch-jdbc/wiki/How-bulk-indexing-is-used-by-the-JDBC-river

Could you please guide me to the exact settings for this? Regards, Jayesh Bhoyar

jsbonline2006 commented 9 years ago

As the logs below show, only about 10 records at a time are getting pushed to ES.

{
  "type" : "jdbc",
  "bulk_size" : 1000,
  "max_bulk_requests" : 50,
  "bulk_flush_interval" : "10s",
  "jdbc" : {
    "url" : "jdbc:oracle:thin:@host:port:SID",
    "driver" : "oracle.jdbc.OracleDriver",
    "user" : "user",
    "fetchsize" : 1000,
    "password" : "pass",
    "sql" : "select * from xyz",
    "index" : "test",
    "type" : "article",
    "max_bulk_actions" : 1000,
    "metrics" : true,
    "max_concurrent_bulk_requests" : 16,
    "connection_properties" : {
      "oracle.jdbc.TcpNoDelay" : false,
      "useFetchSizeWithLongColumn" : false,
      "oracle.net.CONNECT_TIMEOUT" : 10000,
      "oracle.jdbc.ReadTimeout" : 50000
    },
    "elasticsearch" : {
      "cluster" : "elasticsearch",
      "host" : "localhost",
      "port" : 9300
    }
  }
}

[22:26:47,519][DEBUG][BulkTransportClient ][elasticsearch[importer][bulk_processor][T#1]] before bulk [579] [actions=7] [bytes=27814] [concurrent requests=1]
[22:26:47,521][DEBUG][BulkTransportClient ][elasticsearch[importer][transport_client_worker][T#4]{New I/O worker #4}] after bulk [579] [succeeded=3744] [failed=0] [2ms] [concurrent requests=0]
[22:26:52,522][DEBUG][BulkTransportClient ][elasticsearch[importer][bulk_processor][T#1]] before bulk [580] [actions=8] [bytes=28898] [concurrent requests=1]
[22:26:52,526][DEBUG][BulkTransportClient ][elasticsearch[importer][transport_client_worker][T#5]{New I/O worker #5}] after bulk [580] [succeeded=3752] [failed=0] [2ms] [concurrent requests=0]
[22:26:57,526][DEBUG][BulkTransportClient ][elasticsearch[importer][bulk_processor][T#1]] before bulk [581] [actions=9] [bytes=23479] [concurrent requests=1]
[22:26:57,529][DEBUG][BulkTransportClient ][elasticsearch[importer][transport_client_worker][T#6]{New I/O worker #6}] after bulk [581] [succeeded=3761] [failed=0] [2ms] [concurrent requests=0]
[22:27:02,530][DEBUG][BulkTransportClient ][elasticsearch[importer][bulk_processor][T#1]] before bulk [582] [actions=4] [bytes=8435] [concurrent requests=1]
[22:27:02,532][DEBUG][BulkTransportClient ][elasticsearch[importer][transport_client_worker][T#4]{New I/O worker #4}] after bulk [582] [succeeded=3765] [failed=0] [1ms] [concurrent requests=0]
[22:27:07,534][DEBUG][BulkTransportClient ][elasticsearch[importer][bulk_processor][T#1]] before bulk [583] [actions=8] [bytes=55320] [concurrent requests=1]
[22:27:07,540][DEBUG][BulkTransportClient ][elasticsearch[importer][transport_client_worker][T#5]{New I/O worker #5}] after bulk [583] [succeeded=3773] [failed=0] [5ms] [concurrent requests=0]

jprante commented 9 years ago

Note that parameters outside the jdbc block are ignored.
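
One way to spot such settings before starting the importer is to list the top-level keys of the definition. The following is only a rough sketch: the helper name `ignored_top_level_keys` is hypothetical, the definition is a shortened copy of the one posted above, and it assumes that "type" and "jdbc" are the only top-level keys the importer reads.

```python
import json

# Assumption: only "type" and "jdbc" are meaningful at the top level;
# everything else outside the jdbc block is treated as ignored.
RECOGNIZED_TOP_LEVEL = ("type", "jdbc")


def ignored_top_level_keys(config_text):
    """Return the sorted top-level keys that sit outside the jdbc block."""
    config = json.loads(config_text)
    return sorted(key for key in config if key not in RECOGNIZED_TOP_LEVEL)


# Shortened copy of the definition from this thread.
config_text = """
{
  "type": "jdbc",
  "bulk_size": 1000,
  "max_bulk_requests": 50,
  "bulk_flush_interval": "10s",
  "jdbc": {
    "sql": "select * from xyz",
    "fetchsize": 1000,
    "max_bulk_actions": 1000,
    "max_concurrent_bulk_requests": 16
  }
}
"""

print(ignored_top_level_keys(config_text))
```

For the definition in this thread, the check flags bulk_size, max_bulk_requests, and bulk_flush_interval, the three settings placed outside the jdbc block.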

Your query is slow. What you see is the automatic bulk flush that runs every 5 seconds. Under these circumstances, concurrent bulk requests and bulk size have little effect.
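
The 5-second pattern can be read directly off the posted log. Below is a small sketch that extracts the action count and the gap between consecutive "before bulk" entries; the `summarize` helper and its regular expression are illustrative and assume the log format shown in this thread.

```python
import re
from datetime import datetime

# The "before bulk" lines copied from the log in this thread.
LOG = """\
[22:26:47,519][DEBUG][BulkTransportClient ][elasticsearch[importer][bulk_processor][T#1]] before bulk [579] [actions=7] [bytes=27814] [concurrent requests=1]
[22:26:52,522][DEBUG][BulkTransportClient ][elasticsearch[importer][bulk_processor][T#1]] before bulk [580] [actions=8] [bytes=28898] [concurrent requests=1]
[22:26:57,526][DEBUG][BulkTransportClient ][elasticsearch[importer][bulk_processor][T#1]] before bulk [581] [actions=9] [bytes=23479] [concurrent requests=1]
[22:27:02,530][DEBUG][BulkTransportClient ][elasticsearch[importer][bulk_processor][T#1]] before bulk [582] [actions=4] [bytes=8435] [concurrent requests=1]
[22:27:07,534][DEBUG][BulkTransportClient ][elasticsearch[importer][bulk_processor][T#1]] before bulk [583] [actions=8] [bytes=55320] [concurrent requests=1]
"""

# Captures the HH:MM:SS part of the timestamp and the action count.
BEFORE_BULK = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}),\d{3}\].*before bulk \[\d+\] \[actions=(\d+)\]"
)


def summarize(log_text):
    """Return (actions per bulk, whole-second gaps between bulks)."""
    times, actions = [], []
    for line in log_text.splitlines():
        match = BEFORE_BULK.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
            actions.append(int(match.group(2)))
    gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
    return actions, gaps


actions, gaps = summarize(LOG)
print("actions per bulk:", actions)
print("seconds between bulks:", gaps)
```

Each bulk carries fewer than 10 actions and is sent 5 seconds after the previous one, which suggests the requests are triggered by the timed flush rather than by reaching max_bulk_actions.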

You must change the SQL query to limit the maximum number of records fetched from the database, e.g. with a where clause.