scVENUS / PeekabooAV

Peekaboo Extended Email Attachment Behavior Observation Owl
https://peekabooav.de
GNU General Public License v3.0
66 stars 20 forks source link

Knowntools efficiency improvements take two #219

Closed michaelweiser closed 1 year ago

michaelweiser commented 2 years ago

There is some room for improvement in the way Knowntools uses the database. In quick tests, analysis time of 1000 identical samples would go up from 14s with an empty database to 1m15s with 3000 analyses of this sample already in the database. This seemed to scale linearly, which makes some sense since the sample journal to be downloaded from the database grows linearly as well.

Also, we have different ways of referring to the current time between database, Knowntools and other analysers. I propose to haromise them to UTC and "aware" datetime objects at this opportunity.

Follow-up to #217 based on fully asyncio-converted Peekaboo.

michaelweiser commented 2 years ago

I'm kinda getting cold feet regarding the additional result_numeric column needed to make this work with sqlite. Do you see any way to solve this more efficiently/elegantly?

Jack28 commented 2 years ago

Without dropping support for sqlite?

michaelweiser commented 2 years ago

Without dropping support for sqlite?

As an immediate result yes, but more broadly not being able to support any RDBMS which does not support sorting of enums. This could hit us again in the future, e.g. if we were to support Oracle and discovered that it doesn't have that feature. I also ran into the exact same problem when playing with CouchDB (https://github.com/michaelweiser/PeekabooAV/blob/b612cdadf968e42d75130ce91f66ebc69a05a37c/peekaboo/db.py#L169).

Alternatively, we could drop the string-based result column and only save the numeric value. This would be perfectly fine for PeekabooAV itself but would make it less easy for the admin to work with the database directly.

Jack28 commented 2 years ago

So either we drop the old column and annoy admins, drop everything that doesn’t support enum, or somehow achieve a correct alphanumerical sort on the text representation ^^

I vote to add some documentation for keen admins who want to dig into the database

michaelweiser commented 2 years ago

So either we drop the old column and annoy admins, drop everything that doesn’t support enum, or somehow achieve a correct alphanumerical sort on the text representation ^^

Pretty much :/

I vote to add some documentation for keen admins who want to dig into the database

Okay, will dig into what that would look like and force push the result here. Bear with me.

Jack28 commented 1 year ago

I would like to test more on this code. Not because of the code but because a fallacy would likely go unnoticed for a long time with catastrophic effect. At the same time only thorough testing with the different databases reveals pitfalls

michaelweiser commented 1 year ago

Corrected the error with postgres by introducing an explict cast of the sorting column value instead of erronously converting the sorting column name to varchar.

Jack28 commented 1 year ago

LGTM 👍