logstash-plugins / logstash-input-jdbc

Logstash Plugin for JDBC Inputs
Apache License 2.0

Don't execute select count(*) in the beginning #305

Open iirekm opened 6 years ago

iirekm commented 6 years ago

For complex SQL queries like ours, this is a performance killer: everything takes twice as long.
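For context, the count comes from the plugin's paging support: when paging is in play, the input first wraps the user's statement in a SELECT count(*) to work out how many pages to fetch, so the full query effectively runs twice. A minimal sketch of a config that triggers this behavior (driver, connection string, and table name are placeholders, not taken from this thread):

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/driver.jar"          # placeholder
    jdbc_driver_class => "org.postgresql.Driver"          # example driver
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_user => "user"
    jdbc_password => "secret"
    jdbc_paging_enabled => true    # paging is what triggers the count(*) wrapper
    jdbc_page_size => 100000
    statement => "SELECT * FROM big_table"
  }
}
```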

Johnb21 commented 6 years ago

This functionality is making the jdbc input unusable for my data retrieval process: I keep getting I/O errors because of how long the count takes to run, and my connections appear to time out in the meantime.

EDIT: I took the count query and tried to run it manually; I stopped it after it had run nonstop for 30 minutes without finishing. Is there a reason this count step is needed? Is there no config setting we can use to disable it?
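If the count itself can't be avoided, the connection timeouts around it can sometimes be softened with the plugin's pool and retry settings. A hedged sketch: these option names exist in the plugin, but the values are illustrative, not recommendations:

```
input {
  jdbc {
    # ... driver and connection settings as usual ...
    jdbc_pool_timeout => 60                     # seconds to wait for a pooled connection
    connection_retry_attempts => 3              # retry connecting instead of failing outright
    connection_retry_attempts_wait_time => 10   # seconds to sleep between retries
    statement => "SELECT * FROM big_table"
  }
}
```

This does nothing about the long-running count query itself; it only gives the connection more room to survive it.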

jintianzhouyine commented 5 years ago

Even if I don't configure the jdbc_paging_enabled parameter, the count query still runs.

danhermann commented 5 years ago

This also prevents the JDBC input from working with a number of datastores, such as Apache Jena, that provide JDBC drivers but are not fully SQL-compliant.

sgtmacdiesel commented 5 years ago

Is there any update on this? I am having the same issue: the count(*) causes a heavy query to run twice on our database. Is there any way to disable it?

stayman commented 5 years ago

I'd like to ask the same question. 4.3.11 actually made the problem worse, since the count is now executing in all contexts, not just debug mode.

We're seeing consistent failures because of the runtime of the count query.

wouterbac1 commented 5 years ago

Hi, I am having a related (or perhaps the same) issue. I am trying to read from a Firebird database with Logstash using the following input:

```
input {
  jdbc {
    jdbc_driver_library => "c:/bug/jaybird-full-3.0.6.jar"
    jdbc_driver_class => "org.firebirdsql.jdbc.FBDriver"
    jdbc_connection_string => "jdbc:firebirdsql://127.0.0.1/D:/DBASE.FDB"
    jdbc_user => "SYSDBA"
    jdbc_password => "pwd12442"
    jdbc_paging_enabled => false
    tracking_column => "unix_ts_in_secs"
    use_column_value => true
    tracking_column_type => "numeric"
    schedule => "*/5 * * * * *"
    statement => "SELECT * FROM TABLE1"
  }
}
```

When I analyze the debug information in my Windows terminal, I notice that somehow SELECT count(*) AS "COUNT" FROM ( is added in front of my statement and ) AS "T1" LIMIT 1 is added behind it. For example:

```
SELECT count(*) AS "COUNT" FROM (select * from TABLE1) AS "T1" LIMIT 1
```

Obviously, the SQL error I get is that the token is unknown, presumably because Firebird does not support the LIMIT clause. Does anyone have a solution for this?

jaredwa commented 5 years ago

Not sure if anyone else had this issue; I did, and it is frustrating that Logstash can't suppress this count. It would be easy to let the user supply their own count query, preempting the generated one and causing no harm whatsoever. The workaround I found was to bound the query in the statement by ID, so that the count was inherently limited. So basically:

```
statement => "SELECT * FROM table WHERE id > :sql_last_value AND id < 10000000 ORDER BY id ASC"
```

Of course, this means you have to babysit it and raise the upper bound as the sync passes it, but it's better than having it all just fail.

filip-java commented 4 years ago

I worked around it by using a prepared statement with no parameters. Since prepared statements do not support paging, the plugin does not issue the extra "count" query.
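A sketch of that workaround, assuming a plugin version recent enough to have prepared-statement support; the statement name is arbitrary and the table is a placeholder:

```
input {
  jdbc {
    # ... driver and connection settings as usual ...
    use_prepared_statements => true
    prepared_statement_name => "fetch_all_rows"   # arbitrary, must be unique
    prepared_statement_bind_values => []          # no parameters, so no ? placeholders
    statement => "SELECT * FROM big_table"
  }
}
```

Because paging is not supported in prepared-statement mode, the plugin never builds the wrapping count query.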

sanosay commented 4 years ago

Any update on this? It seems to have been abandoned or ignored, even though so many people are complaining about it. On large tables this is indeed a show stopper.

Johnb21 commented 4 years ago

> I worked around it by using a prepared statement with no parameters. Since prepared statements do not support paging, the plugin does not issue the extra "count" query.

This only works when your query does not depend on data coming from another input.

My workaround was to write everything in Python instead. Not exactly a Logstash solution.

webmasterlv commented 4 years ago

I have the same issue right now with a table of 100M rows. The same table indexed with Sphinx was a breeze. Adding a separate statement_count option, where you can write your own SQL query that produces the count for the table, would indeed be the best solution.
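To make that suggestion concrete: a hypothetical statement_count option (not implemented in the plugin, purely a sketch of the proposal) might look like this, letting the user substitute a cheap estimate for the expensive exact count:

```
input {
  jdbc {
    # ... connection settings ...
    jdbc_paging_enabled => true
    statement => "SELECT * FROM big_table"
    # HYPOTHETICAL option proposed in this thread -- does not exist:
    # swap the exact count for PostgreSQL's planner estimate
    statement_count => "SELECT reltuples::bigint FROM pg_class WHERE relname = 'big_table'"
  }
}
```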

Noah1989 commented 8 months ago

This has been fixed here: https://github.com/logstash-plugins/logstash-integration-jdbc/pull/95
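For reference, that PR is against logstash-integration-jdbc (which now houses this input) and, as I understand it, adds a jdbc_paging_mode option: in "explicit" mode the plugin skips the automatic count(*) and leaves paging to the statement via the :size and :offset parameters. A sketch based on the current plugin docs:

```
input {
  jdbc {
    # ... connection settings ...
    jdbc_paging_enabled => true
    jdbc_paging_mode => "explicit"    # no automatic count(*); the statement pages itself
    jdbc_page_size => 100000
    statement => "SELECT * FROM big_table ORDER BY id LIMIT :size OFFSET :offset"
  }
}
```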