wheaton5 / souporcell

Clustering scRNAseq by genotypes
MIT License

cluster_genotypes.vcf misses one cluster #92

Open brianpenghe opened 4 years ago

brianpenghe commented 4 years ago

I ran the whole pipeline using:

singularity exec -B $PWD ~/tools/souporcell/souporcell.sif souporcell_pipeline.py \
    -i possorted_genome_bam.bam -b filtered_feature_bc_matrix/barcodes.tsv.gz \
    -f ~/refseq/refdata-cellranger-GRCh38-3.0.0/fasta/genome.fa -t 15 -o souporcell \
    --common_variants ~/tools/souporcell/jbID.noGenotypes.vcf --skip_remap True -k 6

I do see cluster 0 through cluster 5 in clusters.tsv.

However, cluster_genotypes.vcf only has cluster 0 through cluster 4!

I re-ran consensus.py and had the same result.

Do you know why?

wheaton5 commented 4 years ago

Interesting. Do any cells get assigned to cluster 5? I'll take a quick look at the code to see if there is anything obvious.

wheaton5 commented 4 years ago

Looking at the code, if there were no cells assigned to cluster 5, it would not show up in the cluster_genotypes.vcf. It decides how many clusters there are by looking at the maximum cluster number in the clusters.tsv's cluster assignment field.
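As a rough illustration of that behavior (a sketch under stated assumptions, not the actual consensus.py code), the snippet below infers the number of clusters from the largest cluster number in the assignment field. It assumes clusters.tsv is tab-separated with a header row and an `assignment` column holding values such as `4` for a singlet or `4/5` for a doublet. The helper name `infer_num_clusters` and the choice to skip doublet rows are hypothetical.

```python
import csv
import io


def infer_num_clusters(tsv_text):
    """Hypothetical helper: return (largest singlet cluster number) + 1.

    Assumes a header row with an 'assignment' column; rows whose
    assignment contains '/' are treated as doublets and skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    max_cluster = -1
    for row in reader:
        assignment = row["assignment"]
        if "/" in assignment:
            # Doublet-style assignment such as "4/5": ignored in this sketch.
            continue
        max_cluster = max(max_cluster, int(assignment))
    return max_cluster + 1


# Toy data (made up for illustration): cluster 5 appears only in a doublet.
example = (
    "barcode\tstatus\tassignment\n"
    "AAAC-1\tsinglet\t0\n"
    "AAAG-1\tsinglet\t4\n"
    "AAAT-1\tdoublet\t4/5\n"
)
print(infer_num_clusters(example))  # prints 5, so cluster 5 would be missed
```

With a count derived this way, a highest-numbered cluster that has no singlet cells never gets a column in the output, even when `-k 6` was requested.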

brianpenghe commented 4 years ago

Cluster 5 has only 1 cell assigned to it. There are more doublets that involve Cluster 5, though.
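One way to check counts like these is to tally, per cluster, how many singlets it has and how many doublets it takes part in. This is a minimal sketch that assumes clusters.tsv is tab-separated with a header row containing `status` (values such as `singlet` and `doublet`) and `assignment` (such as `5` or `4/5`) columns. The function name `tally_clusters` and the toy rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def tally_clusters(tsv_text):
    """Hypothetical helper: count singlets and doublet participation per cluster."""
    singlets = Counter()
    doublets = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        clusters = row["assignment"].split("/")
        if row["status"] == "singlet":
            singlets[clusters[0]] += 1
        elif row["status"] == "doublet":
            # Count each cluster once per doublet, even for "5/5"-style rows.
            for cluster in set(clusters):
                doublets[cluster] += 1
    return singlets, doublets


# Toy data (not from the issue): one singlet and two doublets touch cluster 5.
toy = (
    "barcode\tstatus\tassignment\n"
    "AAAC-1\tsinglet\t5\n"
    "AAAG-1\tdoublet\t4/5\n"
    "AAAT-1\tdoublet\t5/2\n"
    "AACC-1\tsinglet\t4\n"
)
singlets, doublets = tally_clusters(toy)
print("singlets in cluster 5:", singlets["5"])
print("doublets involving cluster 5:", doublets["5"])
```

For a real run, the same function could be fed the contents of the clusters.tsv file in the output directory, for example `tally_clusters(open("souporcell/clusters.tsv").read())`, assuming the file has the columns described above.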