madsen-lab / valiDrops

Consider sanity check after rank_barcodes() #8

Closed rrydbirk closed 1 year ago

rrydbirk commented 1 year ago

At the moment, it's unrealistic to have samples with well over 10k cells/barcodes. Ending up with that many barcodes after rank_barcodes() could instead be a sign that the breakpoint was estimated incorrectly. The downstream calculations also take forever to finish with this many barcodes.

Perhaps a sanity check could be introduced, e.g. stop if more than 10-20k barcodes remain? This should be accompanied by a parameter, force or similar, that bypasses the check and issues a warning instead.
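To make the proposal concrete, here is a minimal sketch of such a check, written in Python purely for illustration. The function name `check_barcode_count`, the `max_barcodes` default of 20,000, and the exact messages are assumptions made for this example and are not part of valiDrops; only the idea of a `force`-style switch comes from the suggestion above.

```python
import warnings


def check_barcode_count(n_barcodes, max_barcodes=20_000, force=False):
    """Hypothetical sanity check on the number of barcodes kept after ranking.

    Raises an error when the count exceeds max_barcodes, unless force=True,
    in which case a warning is issued and processing may continue.
    """
    if n_barcodes <= max_barcodes:
        return n_barcodes

    message = (
        f"{n_barcodes} barcodes passed ranking (limit: {max_barcodes}); "
        "the breakpoint may have been estimated incorrectly."
    )
    if force:
        # The caller explicitly opted in: report the problem but continue.
        warnings.warn(message, RuntimeWarning)
        return n_barcodes

    # Default behaviour: stop before the slow downstream steps start.
    raise ValueError(message + " Set force=True to continue anyway.")
```

Called with 60,000 barcodes, this would stop by default and only proceed (with a warning) when `force=True` is passed. The threshold is left as a parameter because the 10-20k range above is a rule of thumb rather than a fixed limit.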

rrydbirk commented 1 year ago

Brief update on what worked for me, ranked from best to worst solution:

In the end, I went from 60k+ barcodes to ~10k barcodes in four problematic snRNA-seq samples with high ambient RNA load.