kharchenkolab / numbat

Haplotype-aware CNV analysis from single-cell RNA-seq
https://kharchenkolab.github.io/numbat/
Other

Co-genotype with bulk WGS and scRNA-seq #106

Closed: DrFengchenzhao closed this issue 1 year ago

DrFengchenzhao commented 1 year ago

Hi Teng Gao,

I have applied your method to my scRNA-seq data, and it showed clear advantages over other methods. I noticed that in your suggestions for preparing spatial transcriptomics data, you mention: 'Therefore, a priori genotyping by DNA or co-genotyping with scRNA-seq (via multi-sample mode of pileup-and-phase) would be especially useful.'

We generated bulk WGS, scRNA-seq, and ST data from the same sample. I am wondering how to harmonize these modalities, in particular how to use SNPs genotyped from bulk WGS to further increase the power to detect CNVs in scRNA-seq.

Also, does 'via multi-sample mode of pileup-and-phase' mean that I should run pileup_and_phase.R on the scRNA-seq and ST data together?

Thank you for your patience. Looking forward to your reply.

Dr. Chenzhao Feng

teng-gao commented 1 year ago

Hi @DrFengchenzhao,

Thanks for the issue.

> We generated bulk WGS, scRNA-seq, and ST data from the same sample. I am wondering how to harmonize these modalities, in particular how to use SNPs genotyped from bulk WGS to further increase the power to detect CNVs in scRNA-seq.

Yes, this is possible. First, run the allele pileup with cellsnp-lite, using the heterozygous SNPs from the WGS VCF as the target sites. Then phase the WGS VCF with Eagle2 and merge the phased GT fields with the per-cell allele counts to produce an allele dataframe in the format Numbat takes as input. The underlying code in the pileup_and_phase.R script shows how to adapt the procedure for this.
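The merge step can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code in pileup_and_phase.R: the input shapes, the `CHROM_POS_REF_ALT` SNP ID convention, and the function names `parse_phased_vcf` and `merge_counts` are hypothetical, and Numbat's real allele dataframe carries additional columns (such as genetic-map position and gene annotation) that are left out here.

```python
# Hypothetical sketch, not Numbat's own code: join phased WGS genotypes with
# per-cell allele counts. Assumes biallelic SNPs, a single-sample VCF phased by
# Eagle2 (GT like "0|1"), and counts keyed by (snp_id, cell) -> (AD, DP), where
# AD = ALT-allele reads and DP = REF + ALT reads as reported by cellsnp-lite.


def parse_phased_vcf(lines):
    """Return {snp_id: site} for phased heterozygous sites only."""
    sites = {}
    for line in lines:
        if line.startswith("#"):  # skip VCF header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _, ref, alt = fields[:5]
        # FORMAT keys (column 9) paired with the sample's values (column 10)
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gt = sample.get("GT")
        if gt not in ("0|1", "1|0"):  # drop homozygous or unphased calls
            continue
        snp_id = f"{chrom}_{pos}_{ref}_{alt}"  # hypothetical ID convention
        sites[snp_id] = {"CHROM": chrom, "POS": int(pos),
                         "REF": ref, "ALT": alt, "GT": gt}
    return sites


def merge_counts(sites, counts):
    """Attach the phased GT to each (snp_id, cell) count.

    Sites without a phased heterozygous genotype, and zero-depth entries,
    are dropped."""
    rows = []
    for (snp_id, cell), (ad, dp) in sorted(counts.items()):
        site = sites.get(snp_id)
        if site is None or dp == 0:
            continue
        rows.append({"cell": cell, "snp_id": snp_id, **site, "AD": ad, "DP": dp})
    return rows
```

The phased GT orientation ("0|1" vs "1|0") records which allele sits on which haplotype, which is what allows allele counts from neighbouring SNPs to be combined per haplotype downstream.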

> Also, does 'via multi-sample mode of pileup-and-phase' mean that I should run pileup_and_phase.R on the scRNA-seq and ST data together?

Yes, you can do that. Running both samples together combines their allele counts, which increases allele coverage for co-genotyping.
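To see why combining samples helps, here is a toy Python illustration (not how pileup_and_phase.R implements multi-sample mode). It assumes per-sample counts summed over cells as `{snp_id: (ALT reads, total reads)}` and uses a deliberately simple, hypothetical heterozygosity rule: at least `min_reads` reads supporting each allele. The names `pool` and `call_het` are made up for this sketch.

```python
from collections import Counter


def pool(*samples):
    """Sum ALT and total read counts per SNP across samples of one individual."""
    alt, total = Counter(), Counter()
    for sample in samples:
        for snp_id, (a, d) in sample.items():
            alt[snp_id] += a
            total[snp_id] += d
    return {snp_id: (alt[snp_id], total[snp_id]) for snp_id in total}


def call_het(counts, min_reads=2):
    """Hypothetical rule: heterozygous if REF and ALT each have >= min_reads reads."""
    return sorted(s for s, (a, d) in counts.items()
                  if a >= min_reads and d - a >= min_reads)


scrna = {"snp1": (1, 3), "snp2": (0, 4)}
st = {"snp1": (2, 2), "snp3": (1, 1)}
print(call_het(scrna), call_het(st))  # [] []
print(call_het(pool(scrna, st)))      # ['snp1']
```

Neither sample alone shows both alleles of `snp1` often enough, but the pooled counts do. Summing is only meaningful because both samples come from the same individual and therefore share a germline genotype.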

teng-gao commented 1 year ago

Instructions have now been added to the getting-started vignette.

cnk113 commented 1 year ago

Hey Teng,

I'm a bit lost after running cellsnp-lite on the heterozygous sites. How do I proceed from there? Should I try to build the df_allele matrix from the cellsnp-lite output?

Thanks, Chang

teng-gao commented 1 year ago

Hi @cnk113,

You can probably use the preprocess_allele function directly to build df_allele from the cellsnp-lite counts and the phased WGS VCF:

https://github.com/kharchenkolab/numbat/blob/dc5c3fe3046b9314d71cfaadf1ba429ed1ba106e/R/genotyping.R#L141-L293
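For anyone who wants to inspect the cellsnp-lite output before handing it to preprocess_allele, here is a minimal, stdlib-only Python sketch that reads the sparse count matrices (cellSNP.tag.AD.mtx and cellSNP.tag.DP.mtx, in Matrix Market coordinate format with SNPs as rows and cells as columns) and labels each entry. It assumes the SNP IDs are taken from cellSNP.base.vcf in row order and the barcodes from cellSNP.samples.tsv; the helper names `read_mtx` and `to_counts` are hypothetical.

```python
def read_mtx(lines):
    """Parse Matrix Market coordinate text into ((n_rows, n_cols), {(row, col): value}).

    File indices are 1-based; returned indices are 0-based."""
    if not lines or not lines[0].startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    # drop the header, comment lines and blank lines
    body = [ln for ln in lines if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, nnz = map(int, body[0].split())
    entries = {}
    for ln in body[1:]:
        i, j, v = ln.split()
        entries[(int(i) - 1, int(j) - 1)] = int(v)
    if len(entries) != nnz:
        raise ValueError(f"expected {nnz} entries, found {len(entries)}")
    return (n_rows, n_cols), entries


def to_counts(ad, dp, snp_ids, barcodes):
    """Label entries as {(snp_id, barcode): (AD, DP)}; a missing AD entry means 0 ALT reads."""
    return {(snp_ids[i], barcodes[j]): (ad.get((i, j), 0), d)
            for (i, j), d in dp.items()}


ad_lines = ["%%MatrixMarket matrix coordinate integer general", "%", "2 2 1", "1 2 3"]
dp_lines = ["%%MatrixMarket matrix coordinate integer general", "%", "2 2 2", "1 2 4", "2 1 5"]
shape, ad = read_mtx(ad_lines)
_, dp = read_mtx(dp_lines)
print(shape)  # (2, 2)
print(to_counts(ad, dp, ["1_100_A_G", "1_200_C_T"], ["AAAC", "TTTG"]))
# {('1_100_A_G', 'TTTG'): (3, 4), ('1_200_C_T', 'AAAC'): (0, 5)}
```

With real files, pass `open(path).read().splitlines()`; if SciPy is available, `scipy.io.mmread` reads the same files into a sparse matrix.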