mountetna / magma

Data server with friendly data loaders
GNU General Public License v2.0
5 stars 2 forks source link

Queries that join to a large table are fatally slow. #69

Closed graft closed 3 years ago

graft commented 6 years ago

If a table is too large, joins to that table take forever to compute because they end up using sequential scan rather than an index scan. Apparently this is the proper behavior if the query must return > about 10% of the table - with this many rows, an index search is no longer efficient.

This can be easily simulated by trying to do a search in Timur on the ipi gene_exp table (which has 95M rows currently) for "rna_seq~CRC" - this joins gene_exp and rna_seq; subsetting the latter is efficient, but pulling the corresponding rows from gene_exp is simply not. This may be because the table is just too damn big, period.

Possibly this can be fixed with partitions...?

graft commented 3 years ago

Perhaps closed by the destruction of the enormous gene_exp table and its replacement with the more compact matrix type.