Closed dariomel closed 4 years ago
Thanks for pointing that out. Fixed in a1f6e15
Note that this fix only skips the 'Calculating cell attributes' block if all latent_var
are present as column in cell_attr
. If that's not the case, there will still be time and RAM used for adding a bunch of columns, namely umi
, gene
, log_umi
, log_gene
, umi_per_gene
, log_umi_per_gene
. These might not always be needed, but were helpful during method development. I'll leave that optimization for some other time.
As of 812813e fewer unneeded columns in cell_attr
should be created - none if all latent_var
are given as part of the input cell_attr
.
I am guessing that the following if statement should use
any(...)
, instead ofall(...)
, because the current behavior does not make a lot of sense:https://github.com/ChristophH/sctransform/blob/81614def2851d690875089528fc17bd6be19d030/R/vst.R#L104
Specifically, if the argument
latent_var
contains'log_umi'
and the data.framecell_attr
does contain a column named'log_umi'
, thensetdiff(latent_var, colnames(cell_attr))
yieldscharacter(0)
, which causessetdiff(latent_var, colnames(cell_attr)) %in% known_attr
to yieldlogical(0)
, which causesall(setdiff(latent_var, colnames(cell_attr)) %in% known_attr)
to yieldTRUE
, which causes the if statement to be executed, which wastes time and RAM by recalculating column'log_umi'
and adding a bunch of unneeded columns to data.framecell_attr
.