jprante / elasticsearch-jdbc

JDBC importer for Elasticsearch
Apache License 2.0

java.lang.OutOfMemoryError when using parameters in query #333

Closed weph closed 10 years ago

weph commented 10 years ago

I'm trying to import a table with about 50 million rows into elasticsearch with the jdbc river. At first I did a full import which worked like a charm:

{
    "sql": [{
        "statement": "select * from mytable"
    }]
}

As I need the ability to do incremental imports, I started to experiment with the parameter feature and ran into an out-of-memory exception. The query below returns about 2.2 million rows and triggers the exception. When I change the parameter to 2014-09-01 (360,000 rows) it works fine. I changed the fetchsize, but that seemed to have no effect.

I tested on ES 1.3.2 with 1.3.0.4 and ES 1.2.4 with 1.2.2.0. Database is a MySQL 5.5.35, mysql-java-connector is 5.1.28.

{
    "sql": [{
        "statement": "select * from mytable where datefield > ?",
        "parameter": ["2014-08-01"]
    }]
}
java.lang.OutOfMemoryError: Java heap space
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.xbib.pipeline.simple.SimplePipelineExecutor.waitFor(SimplePipelineExecutor.java:118)
    at org.xbib.elasticsearch.plugin.feeder.AbstractFeeder.run(AbstractFeeder.java:241)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.Buffer.getBytes(Buffer.java:207)
    at com.mysql.jdbc.Buffer.readLenByteArray(Buffer.java:339)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:2082)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3554)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:491)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3245)
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2413)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2836)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2825)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2156)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2313)
    at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.executeQuery(SimpleRiverSource.java:572)
    at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.executeWithParameter(SimpleRiverSource.java:355)
    at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:284)
    at org.xbib.elasticsearch.plugin.feeder.jdbc.JDBCFeeder.fetch(JDBCFeeder.java:335)
    at org.xbib.elasticsearch.plugin.feeder.jdbc.JDBCFeeder.executeTask(JDBCFeeder.java:179)
    at org.xbib.elasticsearch.plugin.feeder.AbstractFeeder.newRequest(AbstractFeeder.java:362)
    at org.xbib.elasticsearch.plugin.feeder.AbstractFeeder.newRequest(AbstractFeeder.java:53)
    at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:87)
    at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:14)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    ... 1 more
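The trace above shows the MySQL driver buffering the whole result set in memory (readAllResults). As a side note — this is a minimal sketch, not the plugin's actual code, and the helper names are hypothetical — MySQL Connector/J by default ignores ordinary fetch size hints and only switches to row-by-row streaming when the statement is forward-only, read-only, and the fetch size is exactly Integer.MIN_VALUE, which would explain why changing fetchsize alone had no effect:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class StreamingFetch {

    // MySQL Connector/J only streams rows (instead of buffering the entire
    // result set in heap) when the fetch size is exactly Integer.MIN_VALUE.
    static int mysqlStreamingFetchSize() {
        return Integer.MIN_VALUE;
    }

    // Hypothetical helper: prepare a statement in MySQL streaming mode.
    // The statement must be TYPE_FORWARD_ONLY and CONCUR_READ_ONLY,
    // otherwise the driver falls back to buffering everything.
    static PreparedStatement prepareStreaming(Connection conn, String sql)
            throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(sql,
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(mysqlStreamingFetchSize());
        return stmt;
    }

    public static void main(String[] args) {
        // No database needed to see the magic value itself.
        System.out.println(mysqlStreamingFetchSize());
    }
}
```

With streaming enabled, the driver holds only one row at a time, so even a 50-million-row result set should not exhaust the heap.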
sawickil commented 10 years ago

Not sure if my case is similar to the above one, but we also noticed an out-of-memory issue six days after ES was started. In that period, only about 100 documents were indexed. We use ES 1.2.3 with jdbc-river 1.2.3 and have two jdbc-river instances running concurrently. Additionally, we use the Column strategy and Oracle 11. I don't remember how the Column strategy is implemented, but I guess it also uses some kind of parameters. My stack trace:

[2014-09-14 00:59:55,864][ERROR][Feeder ] java.lang.OutOfMemoryError: Java heap space
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
    at java.util.concurrent.FutureTask.get(FutureTask.java:111)
    at org.xbib.pipeline.simple.SimplePipelineExecutor.waitFor(SimplePipelineExecutor.java:118)
    at org.xbib.elasticsearch.plugin.feeder.AbstractFeeder.run(AbstractFeeder.java:241)
    at java.lang.Thread.run(Thread.java:722)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.reflect.Array.newArray(Native Method)
    at java.lang.reflect.Array.newInstance(Array.java:70)
    at oracle.jdbc.driver.BufferCache.get(BufferCache.java:226)
    at oracle.jdbc.driver.PhysicalConnection.getCharBuffer(PhysicalConnection.java:7600)
    at oracle.jdbc.driver.OracleStatement.prepareAccessors(OracleStatement.java:991)
    at oracle.jdbc.driver.T4CTTIdcb.receiveCommon(T4CTTIdcb.java:273)
    at oracle.jdbc.driver.T4CTTIdcb.receive(T4CTTIdcb.java:144)
    at oracle.jdbc.driver.T4C8Oall.readDCB(T4C8Oall.java:806)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:355)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
    at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:863)
    at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1153)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1275)
    at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3576)
    at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3620)
    at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
    at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.executeQuery(SimpleRiverSource.java:572)
    at com.payu.elasticsearch.ColumnRiverSource.fetch(ColumnRiverSource.java:107)
    at com.payu.elasticsearch.ColumnRiverSource.fetch(ColumnRiverSource.java:56)
    at org.xbib.elasticsearch.river.jdbc.strategy.column.ColumnRiverFeeder.fetch(ColumnRiverFeeder.java:68)
    at org.xbib.elasticsearch.plugin.feeder.jdbc.JDBCFeeder.executeTask(JDBCFeeder.java:179)
    at org.xbib.elasticsearch.plugin.feeder.AbstractFeeder.newRequest(AbstractFeeder.java:362)
    at org.xbib.elasticsearch.plugin.feeder.AbstractFeeder.newRequest(AbstractFeeder.java:53)
    at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:87)
    at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:14)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)

jprante commented 10 years ago

Is there a heap dump file available? I would like to look at the statistics to see whether the JDBC river (i.e. the Oracle driver) ate all the memory or not. I assume this is the latest Oracle JDBC driver, for 12c?

sawickil commented 10 years ago

Actually, there is no heap dump, but we will first try to upgrade the ojdbc driver to the latest version, as you suggested.

eliasah commented 10 years ago

When Elasticsearch fails like this, it usually leaves a java_pid[PID].hprof file behind. Try looking for it in your Elasticsearch home directory, or wherever you launched the river.

jprante commented 10 years ago

I have no doubt the Oracle driver detected low memory, but that does not mean it is the Oracle driver's fault. I just want to know which version this is, so I can reproduce it.

sawickil commented 10 years ago

Jörg, it looks like a mess in our dependencies. The driver being used was ojdbc14 (crap :/), so don't investigate my case for the moment. I'm going to change the driver, and we will see.

jprante commented 10 years ago

With MySQL, the JDBC plugin suffered from unfortunate default settings; in addition, the MySQL JDBC driver is touchy about setting up "streaming mode", which is the only way to prevent OOM. I hope the latest release, with "max_bulk_actions": 10000, fixes the most annoying issues.
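For illustration only — connection values are hypothetical, and parameter names are as I recall them from the importer's documentation, where "fetchsize": "min" maps to Integer.MIN_VALUE to enable the MySQL driver's streaming mode — a definition combining streaming with the bulk limit might look like this:

```json
{
    "jdbc": {
        "url": "jdbc:mysql://localhost:3306/mydb",
        "user": "importer",
        "password": "secret",
        "fetchsize": "min",
        "max_bulk_actions": 10000,
        "sql": [{
            "statement": "select * from mytable where datefield > ?",
            "parameter": ["2014-08-01"]
        }]
    }
}
```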

youchenlee commented 9 years ago

Sorry for replying to a closed issue. I hope this helps some people.

I ran into this issue. In the end, the only solution for me was to optimize the SQL statement (which was too simple a fix to think of): add proper database indexes to make sure the statement can finish as quickly as possible.

select * from mytable was fast; the major overhead is data transfer, and since we are doing a streaming import, data transfer is not a big deal. But select * from mytable where datefield > ? can take a long time and a lot of memory when you have millions of rows without proper indexes.
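As a sketch of that advice (index name hypothetical), a plain index on the filter column lets MySQL satisfy the range predicate without scanning and sorting the whole table:

```sql
CREATE INDEX idx_mytable_datefield ON mytable (datefield);
```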