single-cell-genetics / vireo

Demultiplexing pooled scRNA-seq data with or without genotype reference
https://vireoSNP.readthedocs.io
Apache License 2.0

Vireo without Genotype (N=3) - high unassigned cells #39

Open Parul-Kudtarkar opened 2 years ago

Parul-Kudtarkar commented 2 years ago

Hi @huangyh09,

Thank you for creating vireo & cellsnp-lite - it is my go-to software for demultiplexing!!

We have shallow multiomic sequencing:

  1. for scRNA, 3320 mean raw reads per cell
  2. for scATAC, 980 raw read pairs per cell

I ran cellsnp-lite with a 7.4M-SNP reference list and vireo (mode 1)

  1. I get a high number of unassigned cells for scRNA:

vireo -c cellsnp-out-PM_002 -o /home/ubuntu/vireo-PM_002 -N 3 --randSeed 2

[vireo] Loading cell folder ...
[vireo] Demultiplex 3098 cells to 3 donors with 287 variants.
[vireo] lower bound ranges [-7783.8, -7625.6, -7001.1]
[vireo] allelic rate mean and concentrations:
[[0.009 0.423 0.994]]
[[4110. 6858.3 1473.6]]
[vireo] donor size before removing doublets:
donor0  donor1  donor2
1012    934     1152
[vireo] final donor size:
donor0  donor1  donor2  unassigned
32      16      62      2988
[vireo] All done: 0 min 4.7 sec
  2. For scATAC:
    [vireo] Loading cell folder ...
    [vireo] Demultiplex 3098 cells to 3 donors with 118 variants.
    [vireo] lower bound ranges [-1936.4, -1904.9, -1870.9]
    [vireo] allelic rate mean and concentrations:
    [[0.001 0.439 0.998]]
    [[ 571.2 1994.6  248.1]]
    [vireo] donor size before removing doublets:
    donor0  donor1  donor2
    1035    1029    1034
    [vireo] final donor size:
    unassigned
    3098
    [vireo] All done: 0 min 1.5 sec

    I used 36.6M SNPs with minor allele frequency (MAF) > 0.0005. Any other recommendations?

Best, Parul

huangyh09 commented 2 years ago

Hi Parul,

Thanks for the issue. The coverage indeed looks low for both the scRNA and scATAC modules. One quick thing you could try is combining the two modules so that more SNPs are available for each cell. Vireo doesn't support this directly, but you can use bcftools concat if you have *cells.vcf.gz (produced by passing --genotype to cellsnp-lite). Alternatively, you may try combining the sparse matrices directly.
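For the sparse-matrix route, a minimal sketch might look like the following. It is only an illustration under several assumptions: the AD and DP matrices are SNP-by-cell (as in cellSNP.tag.AD.mtx and cellSNP.tag.DP.mtx, which could be loaded with scipy.io.mmread), the barcode lists come from cellSNP.samples.tsv, and both runs use identical barcode strings for the same cell. The function name combine_modalities is hypothetical and not part of vireo or cellsnp-lite.

```python
import numpy as np
from scipy import sparse


def combine_modalities(ad_a, dp_a, barcodes_a, ad_b, dp_b, barcodes_b):
    """Stack SNP-by-cell count matrices from two runs over their shared cells.

    Hypothetical helper, not a vireo or cellsnp-lite function.
    Rows are SNPs and columns are cells; a SNP present in both runs
    ends up as two separate rows (no merging is attempted here).
    """
    # Map each barcode in run B to its column index.
    index_b = {bc: j for j, bc in enumerate(barcodes_b)}

    # Keep only cells seen in both runs, in the order of run A.
    shared = [bc for bc in barcodes_a if bc in index_b]
    cols_a = [i for i, bc in enumerate(barcodes_a) if bc in index_b]
    cols_b = [index_b[bc] for bc in shared]

    def stack(mat_a, mat_b):
        # Reorder columns so both blocks describe the same cells, then
        # append the SNP rows of run B below those of run A.
        top = sparse.csr_matrix(mat_a)[:, cols_a]
        bottom = sparse.csr_matrix(mat_b)[:, cols_b]
        return sparse.vstack([top, bottom]).tocsr()

    return stack(ad_a, ad_b), stack(dp_a, dp_b), shared
```

The variant list would need to be concatenated in the same row order (run A first, then run B) before the combined matrices are given to vireo, and barcodes that differ between the two assays would have to be translated beforehand.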

The parameters already seem lenient, especially for the scATAC module. You could also consider adding mitochondrial SNPs; we haven't tried them for demultiplexing donors, but we did observe that they are informative even for clustering somatic clones (see MQuad).

If none of these works, you will probably need to increase the sequencing coverage, which may help the downstream analysis too.

Best, Yuanhua

racng commented 2 years ago

I have tried combining the BAM files from scATAC and scRNA before calling the SNPs using vartrix.