single-cell-genetics / vireo

Demultiplexing pooled scRNA-seq data with or without genotype reference
https://vireoSNP.readthedocs.io
Apache License 2.0

All cells unassigned for 10x single-cell RNA-seq dataset using 1000 Genomes VCF #37

Open songeric1107 opened 2 years ago

songeric1107 commented 2 years ago

Hi Huang, thanks for reading my previous issue. I have run into another problem. I am trying to apply vireo to scRNA-seq datasets using a 1000 Genomes VCF file. However, all the cells are classified as unassigned. Is that due to the resolution of the VCF?

cellsnp-lite -s unassigned_alignments.bam -b barcodes.tsv.gz -O sc1_all_small_min5 -R ../../ref/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.yang.vcf -p 20 --minMAF 0.1 --minCOUNT 5 --UMItag Auto --gzip

CELL_DIR=cellsnp-lit_analysis/new_analysis/sc1_all_small_min5/

OUT_DIR=vireo_analysis/sc1_output_min5/
vireo -c $CELL_DIR -N 4 -o $OUT_DIR

The log file:

Welcome to vireoSNP v0.5.6!

use -h or --help for help on argument.
[vireo] Loading cell folder ...
[vireo] Demultiplex 676216 cells to 4 donors with 37 variants.
[vireo] lower bound ranges [-907.9, -897.7, -892.9]
[vireo] allelic rate mean and concentrations:
[[0.138 0.529 0.98 ]]
[[ 264.8 1050.6 224.6]]
[vireo] donor size before removing doublets:
donor0  donor1  donor2  donor3
169054  169056  169050  169055
[vireo] final donor size:
unassigned
676216
[vireo] All done: 5 min 59.7 sec
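The log reports 676216 cells but only 37 variants, so it is possible that most cells have no reads over any of those variants, which would leave vireo nothing to assign them by. A minimal sketch to check this, assuming cellsnp-lite's depth matrix is `cellSNP.tag.DP.mtx` in Matrix Market coordinate format with variants as rows and cells as columns. `coverage_summary` is a hypothetical helper name, not part of cellsnp-lite or vireo:

```shell
# Hypothetical helper: summarise per-cell coverage from a Matrix Market file on stdin.
# Assumes variants are rows and cells are columns (as in cellsnp-lite's DP matrix).
# Cells absent from the sparse entries have zero coverage for every variant.
coverage_summary() {
  awk '
    /^%/ { next }                                        # skip Matrix Market header/comments
    !seen_dims { n_var = $1; n_cell = $2; seen_dims = 1; next }  # "rows cols nnz" line
    $3 > 0 { covered[$2]++ }                             # one more covered variant for this cell
    END {
      n_cov = 0; total = 0
      for (c in covered) { n_cov++; total += covered[c] }
      printf "variants\t%d\n", n_var
      printf "cells\t%d\n", n_cell
      printf "cells_covered\t%d\n", n_cov
      printf "mean_variants_per_cell\t%.3f\n", (n_cell > 0 ? total / n_cell : 0)
    }'
}

# Example usage (if --gzip compressed the matrix, pipe it through zcat instead):
# coverage_summary < "$CELL_DIR/cellSNP.tag.DP.mtx"
```

If `cells_covered` is a small fraction of `cells`, the unassigned calls would be expected regardless of how vireo is tuned.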

huangyh09 commented 2 years ago

Hi, from the log it seems that only 37 variants were used to demultiplex 676K cells. You may want to check whether the chromosome IDs follow different patterns, e.g., with or without "chr", in your BAM file and your *.yang.vcf.
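The chromosome-naming check suggested above can be sketched with two small filters: one lists contig names from the `@SQ` lines of a SAM header (as printed by `samtools view -H`), the other lists the distinct CHROM values of a VCF. The function names are hypothetical; an empty intersection from `comm -12` would point to a naming mismatch such as "chr1" versus "1".

```shell
# Hypothetical helpers for comparing contig names between a BAM and a VCF.

# Print the sorted, unique contig names (SN: tags) from a SAM header on stdin.
sam_contigs() {
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' |
    LC_ALL=C sort -u
}

# Print the sorted, unique CHROM values (column 1) of a VCF on stdin,
# skipping "#" header lines.
vcf_contigs() {
  awk -F'\t' '!/^#/ { print $1 }' | LC_ALL=C sort -u
}

# Example usage (file names taken from the post):
# samtools view -H unassigned_alignments.bam | sam_contigs > bam_contigs.txt
# vcf_contigs < ../../ref/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.yang.vcf > vcf_contigs.txt
# LC_ALL=C comm -12 bam_contigs.txt vcf_contigs.txt   # contigs present in both
```

Both lists are sorted under `LC_ALL=C` so that `comm` sees the same collation order on each side.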

Yuanhua

songeric1107 commented 2 years ago

No, the format is the same.

After switching to the larger database, I was able to get the donor list:

Var1        Freq
donor0      4238
donor1      4148
donor2      3904
donor3      4216
doublet      234
unassigned  659476
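The counts above sum to 676216, the same number of cells as in the first run, so roughly 97.5% of barcodes are still unassigned and about 2.4% are assigned to a donor. A minimal sketch for producing the same tabulation with percentages directly from vireo's output, assuming vireo writes a `donor_ids.tsv` file with a header row that includes a `donor_id` column. `donor_summary` is a hypothetical helper:

```shell
# Hypothetical helper: tabulate assignments from vireo's donor_ids.tsv on stdin.
# Assumes a tab-separated header row containing a column named "donor_id".
donor_summary() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "donor_id") col = i; next }
    col { n[$col]++; total++ }                  # count cells per assignment label
    END {
      if (!col) { print "no donor_id column found" > "/dev/stderr"; exit 1 }
      for (d in n) printf "%s\t%d\t%.2f%%\n", d, n[d], 100 * n[d] / total
    }' | LC_ALL=C sort
}

# Example usage:
# donor_summary < "$OUT_DIR/donor_ids.tsv"
```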

However, the barcodes assigned to each predicted donor could not be validated. I have the barcodes for 3 donors, and none of them overlap with any predicted donor's barcodes.
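Before reading zero overlap as a demultiplexing failure, it may be worth ruling out a formatting difference between the two barcode lists, such as one list carrying the 10x "-1" suffix and the other not. A minimal sketch under that assumption; `barcode_overlap` is a hypothetical helper, and the extraction example assumes `cell` and `donor_id` are the first two columns of `donor_ids.tsv`:

```shell
# Hypothetical helper: count barcodes shared by two one-per-line barcode files.
# An optional trailing "-<digits>" suffix (such as 10x's "-1") is stripped from
# both lists first so that formatting differences do not hide real overlap.
# Note: in aggregated data this also merges barcodes from different GEM wells.
barcode_overlap() {
  local known=$1 predicted=$2
  LC_ALL=C comm -12 \
    <(sed -E 's/-[0-9]+$//' "$known" | LC_ALL=C sort -u) \
    <(sed -E 's/-[0-9]+$//' "$predicted" | LC_ALL=C sort -u) |
    awk 'END { print NR }'
}

# Example usage:
# awk -F'\t' 'NR > 1 && $2 == "donor0" { print $1 }' "$OUT_DIR/donor_ids.tsv" > donor0.txt
# barcode_overlap known_donorA_barcodes.txt donor0.txt
```

Running this for every known donor against every predicted donor gives a small overlap matrix; a non-zero count concentrated in one predicted donor per known donor would suggest the assignments are consistent.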

huangyh09 commented 2 years ago

Have you resolved the issue with the low number of variants? I don't think 37 variants are likely to be informative enough to demultiplex such a large number of cells reliably.

Yuanhua