scVENUS / PeekabooAV

Peekaboo Extended Email Attachment Behavior Observation Owl
https://peekabooav.de
GNU General Public License v3.0
66 stars 20 forks source link

db: Tune DB indices once more #216

Closed michaelweiser closed 2 years ago

michaelweiser commented 2 years ago

We used to have an index on column analysis_time of the sample info table ever since its addition to the schema. The intent likely was to improve order_by performance. This was likely never the case since we use analysis_time as part of a much larger compound search operation.

It now turns out beyond that that MariaDB/MySQL indices have a peculiar property: Only the first occurence of a column in an index can be used to speed up inequality comparisons and no other column beyond that can be used at all.[1, 2]

So first of all we add analysis_time from above to our compound index to speed up our large analysis journal fetch compound search operation. Then we reorder the index columns so those matched for inequality come last. Those we order by selectiveness, assuming that a result != failed clause will discard many more candidates than an id != current_sample.id clause.

Note how we do not need to restructure the actual query since the RDBMS query planner/optimizer will do that for us.

postgres is not impacted by this change, neither positively nor negatively. It is also the reason why we keep id in the index even though it is unusable for MySQL/MariaDB.

[1] https://www.percona.com/blog/2015/04/27/indexing-101-optimizing-mysql-queries-on-a-single-table/#crayon-628ca3aa51208792701488 [2] https://dba.stackexchange.com/questions/249848/why-we-cant-have-more-than-one-inequality-condition-in-mysql-indexing