single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

Generating Single VCF file for Each scRNA-Seq Sample #125

Open hkarakurt8742 opened 1 month ago

hkarakurt8742 commented 1 month ago

Hello, I am relatively new to variant calling with scRNA-Seq. I have 17 datasets from 17 patients, and I want to call the variants for each patient. I only need the list of variants in each sample. Can I use the cellranger output BAM file "possorted_genome_bam.bam" as pseudo-bulk input, as suggested in the manual:

    # 10x scRNA-seq sample in a pseudo-bulk manner
    cellsnp-lite -s $BAM -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag UB --gzip

Thank you in advance

hxj5 commented 1 month ago

Hi, thanks for the question.

Yes, you can call the variants in a pseudo-bulk manner on the cellranger BAM file. However, it is recommended to subset the BAM file first, to filter out the reads from invalid cells with poor sequencing quality. I quote the manual:

"To genotype 10x scRNA-seq data in a pseudo-bulk manner with cellsnp-lite mode 1b (or mode 2b), it is recommended to subset the BAM file first, by extracting the alignment records with valid cell barcodes only. Here the valid cell barcodes are typically the cell barcodes stored in the cellranger output folder filtered_gene_bc_matrices, which are the cells with high-quality sequencing data."
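As a rough sketch of that subsetting step (not the method prescribed by the manual): the function below keeps SAM header lines plus the alignment records whose `CB:Z:` tag appears in a barcode list. The function name `filter_sam_by_barcodes` and all file paths are hypothetical, and the barcode file is assumed to be the uncompressed cellranger barcode list with one barcode per line, written exactly as it appears in the BAM `CB` tag (e.g. with the `-1` suffix).

```shell
# Hypothetical helper: read SAM text on stdin, write the filtered SAM to stdout.
# $1: plain-text file with one valid cell barcode per line (assumed format).
filter_sam_by_barcodes() {
    awk -F '\t' -v bcfile="$1" '
        BEGIN {
            # Load the valid barcodes into a lookup table.
            while ((getline line < bcfile) > 0) valid[line] = 1
        }
        /^@/ { print; next }          # always keep SAM header lines
        {
            # Optional tags start at column 12; keep the record if its
            # CB:Z: value is one of the valid barcodes.
            for (i = 12; i <= NF; i++) {
                if (substr($i, 1, 5) == "CB:Z:" && (substr($i, 6) in valid)) {
                    print
                    next
                }
            }
        }
    '
}

# Example usage (paths are placeholders, requires samtools):
#   samtools view -h possorted_genome_bam.bam \
#     | filter_sam_by_barcodes barcodes.tsv \
#     | samtools view -b -o filtered.bam -
#   samtools index filtered.bam
```

Reads without a `CB` tag are dropped as well, since they cannot be assigned to a valid cell. The filtered BAM can then be passed to cellsnp-lite with `-s` as in the command from the original question.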

hkarakurt8742 commented 3 weeks ago

Thank you for your reply. I will filter the barcodes. I have another question: I want to use a reference FASTA (indexed with faidx) with cellsnp-lite. Is the FASTA file enough by itself, or is a specific version required? I will use the same FASTA that I used as the CellRanger reference, but because of the "--refseq" option I wanted to be sure.

Thank you

hxj5 commented 3 weeks ago

The same FASTA file that was used as the cellranger reference works for the --refseq option. In general, the genome build of the FASTA file should match that of the BAM file, e.g., both are hg38 or both are hg19.
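One way to sanity-check that match, sketched here under assumptions rather than taken from the cellsnp-lite documentation: compare the contig names and lengths in the BAM header (`@SQ` lines) against the FASTA index (`.fai`). The function name `check_contigs` and the file names are hypothetical. Matching names and lengths do not strictly prove the builds are identical, but a mismatch is a clear warning sign.

```shell
# Hypothetical helper: report BAM contigs that are absent from the FASTA
# index or have a different length. Returns non-zero if any problem is found.
# $1: SAM header text (e.g. from `samtools view -H`); $2: FASTA .fai index.
check_contigs() {
    awk -F '\t' '
        NR == FNR { fai_len[$1] = $2; next }   # first file: .fai (name, length, ...)
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if (substr($i, 1, 3) == "SN:") name = substr($i, 4)
                if (substr($i, 1, 3) == "LN:") len = substr($i, 4)
            }
            if (!(name in fai_len)) {
                print "missing in FASTA: " name; bad = 1
            } else if (fai_len[name] != len) {
                print "length mismatch: " name; bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$2" "$1"
}

# Example usage (paths are placeholders, requires samtools):
#   samtools faidx genome.fa                      # writes genome.fa.fai
#   samtools view -H possorted_genome_bam.bam > header.sam
#   check_contigs header.sam genome.fa.fai && echo "contigs match"
```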