influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
28.59k stars 3.54k forks source link

Each extra field added to SELECT ... WHERE field1=value1, incurs a performance penalty #9310

Closed Bomper closed 6 years ago

Bomper commented 6 years ago

Bug report

In other words, SELECTing more fields WHERE field=value is way slower than SELECT field WHERE field=value. The more fields in the SELECT clause, the slower the query is. I find that surprising because once a point was found, all fields in it should have been found.

System info: Influx 1.4.2, Ubuntu 16.04 VPS with 1GB RAM.

Steps to reproduce:

  1. Series with 8 fields, 4 tags and ~20M points, out of which 9 have f1 = -1.
  2. SELECT f1 FROM m WHERE f1=-1 // returns in ~15 seconds
  3. SELECT f1, f2 FROM m WHERE f1=-1 // returns in ~20 seconds
  4. SELECT f1, f2, t1, t2, t3, t4 FROM m WHERE f1=-1 // returns in ~20 seconds. No penalty when adding tags
  5. SELECT * FROM m WHERE f1=-1 // times out after ~60 seconds, even though coordinator.query-timeout = "30s", not 60.
  6. SELECT f1, f2, f3, f4, f5, f6, t1, t2, t3, t4 FROM m WHERE f1=-1 // times out after ~60 seconds
  7. select * from ticks where bid=0 // never times out (waited ~2 minutes before ^C)

Expected behavior: Intuitively, I expect SELECT [whatever] WHERE field=value to be slow because fields aren't indexed, but once the point with that field=value was found, the whole point was found. Therefore, SELECT'ing just the field, or the field and tags, should be as fast.

Actual behavior: The more fields are in the SELECT clause, the slower the query is.

Additional info: profiles.tar.gz vars.txt iostat.txt

jsternberg commented 6 years ago

InfluxDB is a columnar database so additional fields result in a penalty. This may seem unintuitive coming from a typical SQL database that are stored as rows. In those databases, once you find an entry, the rest of the entries are with it.

I only skimmed this, but it seems to go over well why Postgres is row-oriented and you could see from some of the reasons why they chose to be row-oriented why we chose to be column-oriented: https://wiki.postgresql.org/wiki/ColumnOrientedSTorage

Either way, since the database is column-oriented, that means that comparisons between columns requires us to do a simple join operation based on their time and series key to compare the values.

This is unexpected to change so I am going to close this since there is no action we can take on this. It's just part of the database design.

Bomper commented 6 years ago

Thanks @jsternberg for the explanation.

Any ideas as to why the query doesn't time out after 30s, as specified by coordinator.query-timeout = "30s"?

jsternberg commented 6 years ago

It's a bit complicated and depends on the version of InfluxDB you are using. Until 1.3, it wasn't possible to kill a query during planning. If you are using 1.3 or later, then it should manage to kill it while planning, but it has to hit an interrupt point in the query before it can be interrupted.

Are you using the open source or enterprise version of the software? There was a bug in the enterprise version that prevented query interrupts during planning from working that was fixed a few months ago. I'm not aware of that same bug existing in the open source version though.

Bomper commented 6 years ago

I'm using the open source version, 1.4.2.

jsternberg commented 6 years ago

It's likely hitting an uninterruptible section of the code. Do you know the cardinality of your database? I can try testing out that cardinality along with the number of fields to see if I can reproduce the issue.

The influx_inspect report command may help so I can reproduce your database. If possible, a database dump would also be easier since then I would be running on the exact same data. Are you able to provide a database dump?