johnkerl opened this issue 11 months ago
Thanks @johnkerl
Just providing another concrete example, this one on the `var` axis:
(This class inherits `ExperimentAxisQuery`.) It's interested in approximately 20K of the 60K genes, specifically protein-coding and miRNA genes, selected by `soma_joinid`. We found the query is about 25% faster with the `coords=` in `var_query` commented out, that is, when retrieving the full expression vectors for all genes and then selecting from those. (That is to say: appreciably faster, but not by an order of magnitude or anything like that.)
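A minimal pure-Python sketch of the faster path described above: read every gene, then select the wanted ones on the client. It does not call tiledbsoma. `VAR_GENE_TYPE`, `select_var_joinids`, and `filter_columns_client_side` are hypothetical names, and the gene-type labels stand in for whatever annotation the real `var` dataframe carries.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical var table: soma_joinid -> gene type. The labels are
# illustrative assumptions, not the real dataset's schema.
VAR_GENE_TYPE: Dict[int, str] = {
    0: "protein_coding",
    1: "lncRNA",
    2: "miRNA",
    3: "protein_coding",
    4: "snRNA",
}

WANTED_TYPES = {"protein_coding", "miRNA"}


def select_var_joinids(var_gene_type: Dict[int, str], wanted: Set[str]) -> List[int]:
    """Return the sorted var soma_joinids whose gene type is wanted."""
    return sorted(j for j, t in var_gene_type.items() if t in wanted)


def filter_columns_client_side(
    coo: List[Tuple[int, int, float]], var_joinids: List[int]
) -> List[Tuple[int, int, float]]:
    """Keep only (obs, var, value) triples whose var id was selected.

    Models the 'coords= commented out' path: X is read for all genes
    and the gene selection happens afterwards, in client memory.
    """
    keep = set(var_joinids)  # O(1) membership test per entry
    return [entry for entry in coo if entry[1] in keep]


# Toy X: (obs soma_joinid, var soma_joinid, value) triples for all genes.
X_ALL = [(0, 0, 1.0), (0, 1, 2.0), (1, 2, 3.0), (1, 3, 4.0), (2, 4, 5.0)]

genes = select_var_joinids(VAR_GENE_TYPE, WANTED_TYPES)
subset = filter_columns_client_side(X_ALL, genes)
print(genes)   # [0, 2, 3]
print(subset)  # [(0, 0, 1.0), (1, 2, 3.0), (1, 3, 4.0)]
```

Using a `set` for the kept IDs keeps the client-side filter linear in the number of nonzeros, which is the cost this path pays in exchange for skipping the coordinate filter at read time.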
Re "adversarial stride": I'm not sure how we set up the `var` `soma_joinid`s and whether there's any ID locality in the set of genes selected. There's probably some locality, but OTOH I wouldn't be surprised if there's effectively at least one hit in each tile, so it might be adversarial with an innocent biological basis =) Anyway, we can easily understand why the query would not be faster with the `coords=`, but it's curious that it's appreciably slower.
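To see why a scattered selection can defeat coordinate pushdown, the sketch below counts how many tiles along the `var` dimension a selection touches. The tile extent of 2,048, the contiguous tiling from 0, and the every-third-gene pattern are assumptions chosen for illustration, not values from the real schema; `tiles_touched` is a hypothetical helper. When a roughly 20K-of-60K selection is spread out, every tile is hit, so storage still has to read all of them; a clustered selection of the same size touches only a third. This only counts tiles; it does not by itself explain why the `coords=` path is slower rather than merely not faster.

```python
from typing import Iterable


def tiles_touched(joinids: Iterable[int], tile_extent: int) -> int:
    """Count distinct tiles along one dimension that contain a selected id.

    Assumes tiles partition the dimension into contiguous ranges of
    `tile_extent` ids starting at 0 (a simplifying assumption).
    """
    return len({j // tile_extent for j in joinids})


N_VAR = 60_000
TILE_EXTENT = 2_048  # hypothetical; not the real schema's tile extent
n_tiles = -(-N_VAR // TILE_EXTENT)  # ceiling division

# Scattered selection: every 3rd gene, i.e. 20K of 60K.
scattered = range(0, N_VAR, 3)
# Clustered selection of the same size: the first 20K genes.
clustered = range(0, 20_000)

print(n_tiles)                                # 30
print(tiles_touched(scattered, TILE_EXTENT))  # 30
print(tiles_touched(clustered, TILE_EXTENT))  # 10
```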
cc @pablo-gar @bkmartinjr
Use-case:
- `obs` is maybe 30M cells, `var` maybe 60K genes; `X` sparse 30M x 60K
- Read `obs` to get all 30M cell IDs but decimate by 1/100 -- taking 1 out of every 100, skipping 99 each time -- to get 300K cell IDs
- Query the `X` SOMA SparseNDArray with those cell IDs
- Versus: read `X` (with no core-level decimation) and then decimate at the client

Tracks [sc-34843]
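The decimation in the use-case can be sketched as follows, assuming the `obs` `soma_joinid`s are contiguous from 0 to 30M - 1 (an assumption; if they are not, "every 100th ID" and "ID divisible by 100" differ). `decimate_client_side` is a hypothetical helper modeling the client-side comparison path, with `X` as toy COO triples rather than a real SparseNDArray read.

```python
from typing import List, Sequence, Tuple

N_OBS = 30_000_000
STRIDE = 100  # keep 1 of every 100 cells, skip 99

# obs soma_joinids after decimation; a range stays O(1) in memory.
decimated_obs = range(0, N_OBS, STRIDE)
print(len(decimated_obs))  # 300000


def decimate_client_side(
    coo: Sequence[Tuple[int, int, float]], stride: int
) -> List[Tuple[int, int, float]]:
    """Keep X entries whose obs soma_joinid is a multiple of `stride`.

    Models the comparison path: read X with no core-level decimation,
    then drop 99 of every 100 rows in client memory.
    """
    return [entry for entry in coo if entry[0] % stride == 0]


toy_x = [(0, 5, 1.0), (1, 5, 2.0), (100, 7, 3.0), (150, 7, 4.0), (200, 9, 5.0)]
print(decimate_client_side(toy_x, STRIDE))
# [(0, 5, 1.0), (100, 7, 3.0), (200, 9, 5.0)]
```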