rki-mf1 / covsonar

A database-driven system for handling genomic sequences of SARS-CoV-2 and screening genomic profiles.
GNU General Public License v3.0

Rewrite property handling to use a simple table rather than an entity-attribute-value table #110

Closed. matthuska closed this 1 year ago

matthuska commented 1 year ago

Some lineage querying is broken, and tests need to be updated.

Not ready yet, but I think it's enough that I can do some basic performance tests to make sure this is worth the effort.

matthuska commented 1 year ago

The basic implementation is finished. I've changed the database schema and adapted the code in a minimal way to get everything working, including tests.

There is still more work that should be done at some point: optimize the queries to take advantage of the new structure, and remove/simplify some of the existing property handling code.
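For readers unfamiliar with the two layouts, here is a minimal sketch of the difference, using Python's built-in `sqlite3` module. The table and column names (`sample_property_eav`, `sample_property_wide`, `lineage`, `country`) and the sample rows are made up for illustration and are not covsonar's actual schema. It only shows why filtering on several properties tends to be simpler against a plain table than against an entity-attribute-value table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical entity-attribute-value layout: one row per (sample, property).
CREATE TABLE sample_property_eav (
    sample_id INTEGER,
    name      TEXT,
    value     TEXT
);
-- Hypothetical simple layout: one row per sample, one column per property.
CREATE TABLE sample_property_wide (
    sample_id INTEGER PRIMARY KEY,
    lineage   TEXT,
    country   TEXT
);
""")

# Made-up example data, loaded into both layouts.
samples = [(1, "B.1.1.7", "DE"), (2, "BA.5", "DE"), (3, "BA.5", "FR")]
for sid, lineage, country in samples:
    con.execute(
        "INSERT INTO sample_property_wide VALUES (?, ?, ?)",
        (sid, lineage, country),
    )
    con.executemany(
        "INSERT INTO sample_property_eav VALUES (?, ?, ?)",
        [(sid, "lineage", lineage), (sid, "country", country)],
    )

# EAV: every additional property filter needs another self-join.
eav_ids = [row[0] for row in con.execute("""
    SELECT l.sample_id
    FROM sample_property_eav AS l
    JOIN sample_property_eav AS c ON c.sample_id = l.sample_id
    WHERE l.name = 'lineage' AND l.value = 'BA.5'
      AND c.name = 'country' AND c.value = 'DE'
    ORDER BY l.sample_id
""")]

# Simple table: the same filter is a plain WHERE clause on one table.
wide_ids = [row[0] for row in con.execute("""
    SELECT sample_id
    FROM sample_property_wide
    WHERE lineage = 'BA.5' AND country = 'DE'
    ORDER BY sample_id
""")]

print(eav_ids, wide_ids)
```

Both queries select the same samples; the second does so without a join per property, which is the kind of simplification the follow-up query optimization could build on.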

matthuska commented 1 year ago

TODOs:

matthuska commented 1 year ago

I already sent an email around about this, but it'd be good to keep this information in one place.

I rewrote the match SQL query by hand to avoid the large inner query that causes a huge amount of disk usage in covsonar 2, and the results are striking:

Empty match (returns the full set of sequences in the database):
Original query: Run Time: real 355.408 user 210.462462 sys 67.405746
My new query: Run Time: real 57.617 user 25.457726 sys 1.902713

Mutation and lineage match:
Original query: Run Time: real 388.524 user 208.856196 sys 132.349775
My new query: Run Time: real 36.595 user 17.045304 sys 1.255953

So, going by the real (wall-clock) times, rewriting the queries resulted in a 6.2x speedup for the first query and a 10.6x speedup for the second query.
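The speedup factors can be re-derived from the timings above by dividing the original `real` time by the new one. A small sketch of that arithmetic (the dictionary keys are just labels chosen here):

```python
# Wall-clock ("real") times in seconds, copied from the timings above:
# (original query, rewritten query)
timings = {
    "empty match": (355.408, 57.617),
    "mutation and lineage match": (388.524, 36.595),
}

# Speedup = original wall-clock time / new wall-clock time.
speedups = {name: old / new for name, (old, new) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

The same division applied to the `user` and `sys` columns would give the CPU-time and system-time ratios.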

The queries themselves are attached.

query-empty-match.sql.txt query-match-lineage-mut.sql.txt

matthuska commented 1 year ago

Closed because we do not plan to continue covsonar 2 development.