velocyto-team / velocyto.py

RNA velocity estimation in Python
http://velocyto.org/velocyto.py/
BSD 2-Clause "Simplified" License

Are cells filtered during the counting process? #266

Open hahia opened 4 years ago

hahia commented 4 years ago

Hello, I am using velocyto to obtain spliced / unspliced counts from Drop-seq data.

I ran the following command, without the -b parameter:

velocyto run -o ./loom/ -m /cluster/huanglab/hhuang/Database/RNA_velocity/rmsk/hg19/hg19_rmsk.gtf /cluster/huanglab/hhuang/project/jing/Work/2020.7.22/data_ziwei_new_alignment/CH4-LN_S1_L001_b37/star_gene_exon_tagged.bam /cluster/huanglab/hhuang/project/jing/Final_ziwei_data/20200701/Homo_sapiens.GRCh37.75.gtf

But when I opened the loom file and extracted the cells we are interested in, I found that 8 cells were missing.

I am a little confused. Are cells filtered during the counting process?
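For anyone trying to pin down exactly which cells went missing, here is a minimal sketch that compares a list of expected barcodes against the cell IDs stored in the loom file. It assumes the cell IDs follow a `sample:BARCODE` pattern and live in a `CellID` column attribute; both are assumptions to verify against your own file. The helper names and the second barcode in the example are hypothetical, and `read_loom_cell_ids` needs the third-party `loompy` package.

```python
def barcode_of(cell_id):
    """Return the part after the last ':' (assumed 'sample:BARCODE' layout)."""
    return cell_id.rsplit(":", 1)[-1]


def missing_barcodes(expected, loom_cell_ids):
    """List expected barcodes that do not appear among the loom cell IDs."""
    present = {barcode_of(c) for c in loom_cell_ids}
    return sorted(set(expected) - present)


def read_loom_cell_ids(path):
    """Read cell IDs from a loom file (assumes a 'CellID' column attribute)."""
    import loompy  # third-party; imported lazily so the helpers above work without it

    with loompy.connect(path, mode="r") as ds:
        return [str(c) for c in ds.ca["CellID"]]


# Toy example; the second barcode is made up for illustration.
expected = ["ACACGCGCACAAATCC", "TTTTGGGGCCCCAAAA"]
loom_ids = ["possorted_genome_bam_XT98I:ACACGCGCACAAATCC"]
print(missing_barcodes(expected, loom_ids))  # ['TTTTGGGGCCCCAAAA']
```

With a real file, `missing_barcodes(my_barcodes, read_loom_cell_ids("my.loom"))` would list the absent barcodes, which can then be checked for low read counts.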

Rohit-Satyam commented 1 year ago

Hi, were you able to figure this out? I am also getting a lot of cells filtered when using run10x rather than just run! run is actually removing more cells than run10x.

2023-08-20 10:28:59,038 - WARNING - 100 of the barcodes where without cell
2023-08-20 10:28:59,531 - DEBUG - Counting for batch 973, containing 100 cells and 39469 reads
2023-08-20 10:28:59,734 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-08-20 10:28:59,748 - WARNING - The barcode selection mode is off, no cell events will be identified by <80 counts
2023-08-20 10:28:59,749 - WARNING - 98 of the barcodes where without cell
2023-08-20 10:28:59,982 - DEBUG - Counting for batch 974, containing 100 cells and 22925 reads
2023-08-20 10:29:00,170 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-08-20 10:29:00,179 - WARNING - The barcode selection mode is off, no cell events will be identified by <80 counts
2023-08-20 10:29:00,180 - WARNING - 99 of the barcodes where without cell
2023-08-20 10:29:00,278 - DEBUG - Counting for batch 975, containing 100 cells and 7347 reads
2023-08-20 10:29:00,319 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-08-20 10:29:00,322 - WARNING - The barcode selection mode is off, no cell events will be identified by <80 counts
2023-08-20 10:29:00,324 - WARNING - 100 of the barcodes where without cell
2023-08-20 10:29:01,406 - DEBUG - Counting for batch 976, containing 59 cells and 14253 reads
2023-08-20 10:29:01,481 - DEBUG - 0 reads not considered because fully enclosed in repeat masked regions
2023-08-20 10:29:01,488 - WARNING - The barcode selection mode is off, no cell events will be identified by <80 counts
2023-08-20 10:29:01,488 - WARNING - 58 of the barcodes where without cell
2023-08-20 10:29:01,490 - DEBUG - 651160 reads were skipped because no apropiate cell or umi barcode was found
2023-08-20 10:29:01,490 - DEBUG - Counting done!
2023-08-20 10:29:01,519 - DEBUG - Example of barcode: ACACGCGCACAAATCC and cell_id: possorted_genome_bam_XT98I:ACACGCGCACAAATCC

I checked the Cell Ranger HTML report, and I can see that the number of cells in the .loom files produced using run10x is exactly the same as the number of cells detected by Cell Ranger, and that background cells are absent.
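To see how much is being dropped per batch, a rough sketch like the one below can tally the log. It assumes each "without cell" warning refers to the batch announced by the preceding "Counting for batch" line, which is my reading of the log order rather than something confirmed by the velocyto documentation. The function name is hypothetical; the sample lines are copied from the log above.

```python
import re

BATCH_RE = re.compile(r"Counting for batch (\d+), containing (\d+) cells")
# "where" is spelled exactly as it appears in the log output.
DROPPED_RE = re.compile(r"(\d+) of the barcodes where without cell")


def tally_batches(log_lines):
    """Map batch number -> (barcodes in batch, barcodes reported 'without cell')."""
    tally = {}
    current = None
    for line in log_lines:
        batch = BATCH_RE.search(line)
        if batch:
            current = int(batch.group(1))
            tally[current] = (int(batch.group(2)), 0)
            continue
        dropped = DROPPED_RE.search(line)
        # Warnings seen before any batch line cannot be attributed, so skip them.
        if dropped and current is not None:
            tally[current] = (tally[current][0], int(dropped.group(1)))
    return tally


sample = [
    "2023-08-20 10:28:59,531 - DEBUG - Counting for batch 973, containing 100 cells and 39469 reads",
    "2023-08-20 10:28:59,749 - WARNING - 98 of the barcodes where without cell",
    "2023-08-20 10:29:01,406 - DEBUG - Counting for batch 976, containing 59 cells and 14253 reads",
    "2023-08-20 10:29:01,488 - WARNING - 58 of the barcodes where without cell",
]
for batch, (size, dropped) in tally_batches(sample).items():
    print(batch, "kept", size - dropped, "of", size)
# 973 kept 2 of 100
# 976 kept 1 of 59
```

Under that assumption, almost every barcode in these batches is discarded, which would be consistent with the "<80 counts" threshold mentioned in the warning being applied when no barcode list is supplied.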