pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

If it doesn't exist yet, is it possible to add an option that filters the genes by abundance #220

Closed: NikTuzov closed this issue 8 months ago

NikTuzov commented 8 months ago

Hello:

Presently, we can limit the barcodes to the most frequent ones by using the following arguments:

--filter bustools --filter-threshold threshold_value

Is there a similar option to restrict the list of genes/features to the most expressed ones, e.g. by thresholding the total count per gene across barcodes, or by keeping the top k% of genes ranked by total count? If not, could it be added to kb count?

Regards, Nik Tuzov

Yenaled commented 8 months ago

We are not adding such options to kb count because we believe all filtering operations should be done downstream. kb-python is designed to generate raw count matrices that can be read into downstream tools and/or loaded as an anndata object. I personally always work on unfiltered count matrices and filter downstream.
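For illustration, here is a minimal sketch of what such downstream gene filtering could look like, covering both criteria from the question (a minimum total count per gene, and the top k% of genes by total count). It is not part of kb-python: the function name `filter_genes` and its parameters are hypothetical, and the matrix is a plain list of rows (barcodes × genes) so the example stays self-contained. In practice the same logic would be applied to the count matrix loaded from the kb count output.

```python
import math


def filter_genes(counts, gene_names, min_total=None, top_percent=None):
    """Hypothetical helper: keep only sufficiently abundant genes.

    counts      -- list of rows, one row per barcode, one column per gene
    gene_names  -- list of gene identifiers, one per column
    min_total   -- keep genes whose total count across barcodes is >= this value
    top_percent -- keep the top k% of genes ranked by total count
    If both criteria are given, a gene must satisfy both.
    """
    n_genes = len(gene_names)
    # Total count per gene, summed over all barcodes.
    totals = [sum(row[j] for row in counts) for j in range(n_genes)]

    keep = set(range(n_genes))
    if min_total is not None:
        keep &= {j for j in range(n_genes) if totals[j] >= min_total}
    if top_percent is not None:
        n_keep = math.ceil(n_genes * top_percent / 100)
        # Stable sort: genes with equal totals keep their original order.
        ranked = sorted(range(n_genes), key=lambda j: totals[j], reverse=True)
        keep &= set(ranked[:n_keep])

    cols = sorted(keep)  # preserve the original gene order
    filtered = [[row[j] for j in cols] for row in counts]
    return filtered, [gene_names[j] for j in cols]


# Toy matrix: 3 barcodes x 4 genes (made-up numbers, for illustration only).
counts = [
    [5, 0, 1, 0],
    [3, 0, 0, 2],
    [4, 1, 0, 0],
]
genes = ["geneA", "geneB", "geneC", "geneD"]

by_threshold, kept_threshold = filter_genes(counts, genes, min_total=2)
by_percent, kept_percent = filter_genes(counts, genes, top_percent=25)
```

Because the filter runs on the raw matrix after kb count has finished, the threshold can be changed and the filter rerun without regenerating the counts.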

NikTuzov commented 8 months ago

Thank you very much for the prompt reply.

NikTuzov commented 8 months ago

Closed.