nghiavtr / SCmut

GNU General Public License v3.0

scRNA and bulk DNA #3

Open qian0001 opened 4 years ago

qian0001 commented 4 years ago

Hi,

The only data I have are scRNA-seq and bulk WGS. How can I obtain somatic SNVs with SCmut? I don't have tumor sequences.

This is not cancer data, by the way.

Thanks.

nghiavtr commented 4 years ago

Hi,

Thank you for your interest in SCmut, and sorry for the late reply.

Yes, you can run fdr2d on any data as long as you have the right input data type; it is not limited to cancer data.

Basically, to run the discovery step with the function scfdr(), we need only 1) the counts of reference and variant alleles in each single cell and 2) mut.sites, a list of candidate mutations. Please see the SCmut paper and the example at https://github.com/nghiavtr/SCmut#5-cell-level-mutation-detection
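A minimal Python sketch of what these two inputs might look like, assuming a hypothetical layout in which each count table maps a site ID ("chrom:pos") to per-cell read counts, and `candidate_sites` stands in for mut.sites. This is not the scfdr() interface, which is an R function; the names `ref_counts`, `alt_counts`, `candidate_sites` and `variant_allele_fraction` are illustrative only. The allele fraction computed at the end just shows how the two tables pair up per site and per cell; it is not what scfdr() computes.

```python
from typing import Dict, List

# Hypothetical layout (NOT the scfdr() R interface):
# site ID ("chrom:pos") -> cell name -> read count
CountMatrix = Dict[str, Dict[str, int]]

ref_counts: CountMatrix = {
    "chr1:1000": {"sc1": 12, "sc2": 0, "sc3": 7},
    "chr2:5000": {"sc1": 3, "sc2": 9, "sc3": 0},
}
alt_counts: CountMatrix = {
    "chr1:1000": {"sc1": 4, "sc2": 5, "sc3": 0},
    "chr2:5000": {"sc1": 0, "sc2": 0, "sc3": 0},
}
# Hypothetical stand-in for mut.sites: the candidate sites to test.
candidate_sites: List[str] = ["chr1:1000"]


def variant_allele_fraction(
    ref: CountMatrix, alt: CountMatrix, sites: List[str]
) -> Dict[str, Dict[str, float]]:
    """Per-cell alt / (ref + alt) at each candidate site; NaN if a cell has no reads."""
    result: Dict[str, Dict[str, float]] = {}
    for site in sites:
        per_cell: Dict[str, float] = {}
        for cell, r in ref[site].items():
            a = alt[site][cell]
            total = r + a
            per_cell[cell] = a / total if total > 0 else float("nan")
        result[site] = per_cell
    return result


vaf = variant_allele_fraction(ref_counts, alt_counts, candidate_sites)
print(vaf["chr1:1000"])  # {'sc1': 0.25, 'sc2': 1.0, 'sc3': 0.0}
```

Only sites listed in `candidate_sites` are evaluated, mirroring the idea that mut.sites restricts testing to candidate mutations.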

Best, Nghia

ysq1770368148 commented 2 years ago

Hi, how can I get "the count data of reference and variant alleles of single cells"? I got the file "output.snp.vcf", but it is not per single cell. The input data is a BAM file from scRNA-seq.

nghiavtr commented 2 years ago

Hi @ysq1770368148,

If you already have the BAM files of all single cells, have a look at this section for calling variants across all of them: https://github.com/nghiavtr/SCmut#4-variant-calling-of-multiple-files-from-both-rna-seq-and-dna-seq-data

That section shows an example of using mpileup to call variants jointly across all samples, both single cells and bulk data, with their BAM file names supplied in $fileList. It also provides R code to extract the counts of ref and alt alleles.

The variable $fileList contains the file names separated by spaces; for example, in Linux it is set as follows: fileList="singleCell1.bam singleCell2.bam singleCell3.bam BulkNormal.bam BulkTumor.bam"
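As a rough illustration of extracting ref and alt counts from a multi-sample VCF outside R, here is a minimal Python sketch. It assumes each sample column carries an AD (allelic depth) FORMAT field listing the reference depth followed by the alternate depths; the SCmut R code may read a different field, and `read_allele_counts` is a hypothetical helper, not part of SCmut. Each sample named in the #CHROM header line (one per BAM file in $fileList) becomes one column of the count tables. The sample and file names below are made up.

```python
import io
from typing import Dict, List, TextIO, Tuple

# site ID ("chrom:pos") -> sample name -> read count
CountTable = Dict[str, Dict[str, int]]


def read_allele_counts(vcf: TextIO) -> Tuple[List[str], CountTable, CountTable]:
    """Parse a multi-sample VCF into per-sample ref and alt count tables.

    Assumes a per-sample AD FORMAT field (ref depth, then alt depths) and
    counts only the first ALT allele. Samples with a missing AD value are
    omitted at that site.
    """
    samples: List[str] = []
    ref: CountTable = {}
    alt: CountTable = {}
    for line in vcf:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # every column after FORMAT is one sample
            continue
        fmt_keys = fields[8].split(":")
        if "AD" not in fmt_keys:
            continue  # no allelic depths recorded at this site
        ad_idx = fmt_keys.index("AD")
        site = f"{fields[0]}:{fields[1]}"
        ref[site], alt[site] = {}, {}
        for name, value in zip(samples, fields[9:]):
            parts = value.split(":")
            if ad_idx >= len(parts):
                continue  # trailing sample fields may be dropped in VCF
            depths = parts[ad_idx].split(",")
            if "." in depths:
                continue  # missing value
            ref[site][name] = int(depths[0])
            alt[site][name] = int(depths[1]) if len(depths) > 1 else 0
    return samples, ref, alt


example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsc1.bam\tsc2.bam\tBulkNormal.bam\n"
    "chr1\t1000\t.\tA\tG\t50\t.\t.\tGT:AD\t0/1:12,4\t1/1:0,5\t0/0:30,0\n"
)
samples, ref, alt = read_allele_counts(io.StringIO(example))
print(samples)           # ['sc1.bam', 'sc2.bam', 'BulkNormal.bam']
print(ref["chr1:1000"])  # {'sc1.bam': 12, 'sc2.bam': 0, 'BulkNormal.bam': 30}
print(alt["chr1:1000"])  # {'sc1.bam': 4, 'sc2.bam': 5, 'BulkNormal.bam': 0}
```

Because samples with a missing AD value are left out at a site, downstream code should treat an absent entry as "no data" rather than as zero reads.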

Best, Nghia

ysq1770368148 commented 2 years ago

Hi Nghia, thanks! I followed the steps you showed on GitHub. After Section 4, I got the file "output.snp.vcf". I used your R code to extract the variant allele counts (raFull) and the reference allele counts (rrFull) from output.snp.vcf, but I got only one column, whereas your example.RData has many columns, such as sc1, sc2, sc3. How do you get the variants for each single cell? Could what I got be the total variants? I used just one sample for testing.

nghiavtr commented 2 years ago

@ysq1770368148

If you input one sample (I assume the BAM file contains the data of one single cell), you get results in only one column, and those data are the variants of that cell.
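A small Python sketch of why one BAM gives one column, based on the VCF layout: the first nine columns (CHROM through FORMAT) are fixed, and every column after FORMAT is one sample. The helper `vcf_sample_columns` and the file names are hypothetical. With Smart-seq2 data, where each BAM holds one cell, the number of sample columns equals the number of cells.

```python
from typing import List


def vcf_sample_columns(vcf_text: str) -> List[str]:
    """Return the sample names from the #CHROM header line of a VCF.

    The first nine columns (CHROM ... FORMAT) are fixed; each column after
    FORMAT is one sample, so count tables built from this VCF have that
    many columns.
    """
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            return line.split("\t")[9:]
    raise ValueError("no #CHROM header line found")


fixed = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
one_bam = "##fileformat=VCFv4.2\n" + fixed + "\tsc1.bam\n"
three_bams = "##fileformat=VCFv4.2\n" + fixed + "\tsc1.bam\tsc2.bam\tsc3.bam\n"
print(len(vcf_sample_columns(one_bam)))     # 1
print(len(vcf_sample_columns(three_bams)))  # 3
```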

The example.RData file contains data for 33 single cells from a Smart-seq2 scRNA-seq dataset. In Smart-seq2, each sample (BAM file) contains the sequencing data of only one cell. A sample from a droplet-based protocol such as 10x Genomics, which contains many cells in one BAM file, cannot be used directly here.

ysq1770368148 commented 2 years ago

Thank you for your patient answer, it helped me a lot!

Best wishes.

nghiavtr commented 2 years ago

@ysq1770368148 Thank you for using SCmut. Good luck with your research!

Nghia