wheaton5 / souporcell

Clustering scRNAseq by genotypes
MIT License

Can souporcell be used for single-cell ATAC files? #83

Open RyanYip-Kat opened 3 years ago

eandresleon commented 3 years ago

I was wondering the same thing.

wheaton5 commented 3 years ago

With 2-4 donors, some people have had success. The problem is generally that there are far fewer UMIs per cell, and therefore less data to cluster on. So if you are designing an experiment, pool a few individuals and sequence deeply.

eandresleon commented 3 years ago

Haynes, thanks for the quick response. We are thinking of doing scRNA + ATAC on the same individuals, so any advice is welcome.

wheaton5 commented 3 years ago

I'm not sure how helpful this is. You can use the final VCF from the scRNA-seq experiment as known genotypes for the scATAC-seq datasets. Beyond that, I'm not sure. Best of luck, Haynes

--

-Haynes Heaton
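A minimal sketch of wiring that up. It assumes the RNA run's final VCF is its `cluster_genotypes.vcf` output and that `souporcell_pipeline.py` accepts `--known_genotypes` and `--known_genotypes_sample_names`; check `souporcell_pipeline.py --help` for your version. The helper pulls the sample names out of the VCF header so they can be passed along with the VCF path. Both function names are hypothetical.

```python
def vcf_sample_names(vcf_lines):
    """Return the sample column names from a VCF's #CHROM header line.

    vcf_lines is any iterable of text lines, e.g. an open file handle.
    """
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # the first 9 columns are fixed (CHROM ... FORMAT); samples follow
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the data lines without seeing the column header
    raise ValueError("no #CHROM header line found")


def known_genotype_args(vcf_path, sample_names):
    """Extra souporcell_pipeline.py options that reuse genotypes from an earlier run.

    Assumes the --known_genotypes / --known_genotypes_sample_names options;
    confirm them with `souporcell_pipeline.py --help` for your version.
    """
    return ["--known_genotypes", vcf_path,
            "--known_genotypes_sample_names", *sample_names]
```

For example, open the RNA run's VCF (say `rna_out/cluster_genotypes.vcf`, a hypothetical path), call `vcf_sample_names` on the file handle, and append the list returned by `known_genotype_args` to the ATAC run's command line.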

rdf1993 commented 1 year ago

I did scRNA + ATAC on the same individuals. Is there any way to combine the RNA and ATAC data to split the cells from different samples? So far I have run souporcell on the RNA and ATAC data separately, but some cells are assigned inconsistently between the RNA and ATAC runs, as shown below:

[image not preserved]

wheaton5 commented 1 year ago

Not sure exactly what I'm looking at here. Can you fully describe the experiment(s) and why the results are inconsistent? Cluster labels are arbitrary, so it looks to me like cluster 0 in one experiment corresponds to cluster 1 in the other. But again, I don't know what I'm looking at, so maybe I'm off.
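Because cluster labels are arbitrary, comparing two runs first needs a relabeling. A minimal sketch, assuming souporcell's `clusters.tsv` is tab-separated with `barcode`, `status` and `assignment` columns, and that both runs name the same cell with the same barcode string (translate barcodes first if they differ). It keeps singlets only, tries every one-to-one relabeling of the second run, and reports the best agreement plus the cells that still disagree. Function names are hypothetical.

```python
import csv
from collections import Counter
from itertools import permutations


def read_singlets(tsv_lines):
    """Map barcode -> cluster label for singlet rows of a souporcell clusters.tsv.

    Assumes a tab-separated header with 'barcode', 'status' and 'assignment'
    columns; doublets and unassigned cells are skipped.
    """
    rows = csv.DictReader(tsv_lines, delimiter="\t")
    return {r["barcode"]: r["assignment"] for r in rows if r["status"] == "singlet"}


def match_clusters(run_a, run_b):
    """Relabel run_b's clusters to agree as well as possible with run_a's.

    Both arguments map barcode -> label. Every one-to-one relabeling is tried,
    which stays cheap for the handful of clusters typically pooled.
    Returns (mapping b-label -> a-label, fraction of shared cells that agree,
    list of shared barcodes that still disagree).
    """
    shared = sorted(set(run_a) & set(run_b))
    if not shared:
        raise ValueError("the two runs share no singlet barcodes")
    # how many shared cells carry each (b-label, a-label) combination
    pairs = Counter((run_b[bc], run_a[bc]) for bc in shared)
    labels_b = sorted({run_b[bc] for bc in shared})
    candidates = sorted({run_a[bc] for bc in shared} | set(labels_b))
    best_map, best_hits = None, -1
    for perm in permutations(candidates, len(labels_b)):
        mapping = dict(zip(labels_b, perm))
        hits = sum(pairs[(b, a)] for b, a in mapping.items())
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    discordant = [bc for bc in shared if best_map[run_b[bc]] != run_a[bc]]
    return best_map, best_hits / len(shared), discordant
```

Read each run's `clusters.tsv` with `read_singlets` on an open file handle and pass the two dicts to `match_clusters`. Brute force over permutations keeps the sketch dependency-free; with many clusters, `scipy.optimize.linear_sum_assignment(..., maximize=True)` on the same contingency counts would do the matching more efficiently.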

wheaton5 commented 1 year ago

But I think you could combine the data and run souporcell on it. There might be some details to work out on how to combine them, but it seems possible.

rdf1993 commented 1 year ago

RNA and ATAC were measured in the same cells, so in theory the results should be 100% consistent. Most cells were indeed assigned consistently (as you mentioned, cluster 0 in one run corresponds to cluster 1 in the other), but some cells are still inconsistent. I would also like to combine the data, but the RNA and ATAC data have different formats (with or without UMIs), so I don't know how to handle that. At which intermediate step of the souporcell pipeline could I combine the data?

wheaton5 commented 1 year ago

I still don't know exactly what the experiments were. You can combine the data and either use the no-UMI option (--no_umi True) or give the ATAC-seq reads unique UMIs.

rdf1993 commented 1 year ago

Thanks, I will try giving the ATAC-seq data unique UMIs.

ktpolanski commented 1 year ago

Some nice discussion in here; it's good to know that "shoehorn the two multiome BAMs together and run with --no_umi True" is an out-of-the-box option.

When mucking around with fake UMIs for ATAC, I focused on vartrix. Vartrix is the only thing in souporcell's pipeline that actually cares about the UMIs, right? Of note, at least in my test case, the ATAC seemed to be contributing a lot more counts than the GEX. The ref/alt matrices both had about 20x more total counts in the ATAC, and about twice as many lines. Interesting.

Summary of what follows: name your ATAC "UMIs" anything unique that doesn't overlap with the actual 10X UMIs in your GEX, and you should be fine.

I wasn't sure how to interpret "unique UMIs for ATAC", mostly because I was concerned about how vartrix would handle something that wasn't a stock 10X UMI. I know I've snuck modified CBs through the souporcell workflow before, but would vartrix somehow try to reinterpret the UB content based on what it expects a UMI to look like? To contribute to the discussion, I ran a test: I took a multiome sample, subset both the GEX and ATAC to chr22, and created fake unique UMIs in the ATAC that are guaranteed not to overlap with the real 10X UMIs in the GEX:

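import pysam

bampath = "atac_chr22.bam"
outbam = "atac_chr22_byeah.bam"

# fetch() with no region iterates over all mapped reads and needs the BAM index (.bai)
with pysam.AlignmentFile(bampath, "rb") as bamfile, \
        pysam.AlignmentFile(outbam, "wb", template=bamfile) as tweaked:
    for i, read in enumerate(bamfile.fetch()):
        # Give every barcoded read its own fake UMI. The "BYEAH" prefix can
        # never match a real 10X UMI, which only uses A/C/G/T.
        if read.has_tag("CB"):
            read.set_tag("UB", "BYEAH" + str(i))
        tweaked.write(read)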

I ran the resulting file through vartrix (1.1.14, the version in the souporcell Singularity container I have on hand), with the same arguments souporcell would use plus extra logging to check the internal parsing:

vartrix --mapq 30 -b atac_chr22_byeah.bam -c barcodes.tsv \
  --scoring-method coverage --threads 23 \
  --ref-matrix byeah_ref.mtx --out-matrix byeah_alt.mtx \
  -v chr22.vcf --fasta ~/cellranger/GRCh38-2020-A/fasta/genome.fa \
  --umi --log-level debug 1>byeah_stdout.txt 2>byeah_stderr.txt

This resulted in the following style of entry in the debug log:

10:51:23 [DEBUG] vartrix: cell_index 604 / UMI BYEAH18440790 saw counts ref: 1 alt: 0 unk: 0
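To follow individual cell/UMI combinations through such a log, the debug lines can be tallied. A minimal sketch, assuming every relevant line has exactly the layout shown above (all other lines are ignored). Since the line does not name the variant, counts are summed per cell index and UMI. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches debug lines in the layout shown above; anything else is skipped.
UMI_LINE = re.compile(
    r"cell_index (\d+) / UMI (\S+) saw counts ref: (\d+) alt: (\d+) unk: (\d+)"
)


def umi_counts(log_lines):
    """Sum ref/alt/unk counts per (cell_index, UMI) over vartrix debug log lines."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in log_lines:
        match = UMI_LINE.search(line)
        if match is None:
            continue
        cell, umi = int(match.group(1)), match.group(2)
        ref, alt, unk = (int(x) for x in match.groups()[2:])
        entry = totals[(cell, umi)]
        entry[0] += ref
        entry[1] += alt
        entry[2] += unk
    return dict(totals)
```

Feed it whichever of the redirected files the debug lines ended up in (for example an open handle on `byeah_stderr.txt`), then check that the ATAC UMIs all come through unchanged with their `BYEAH` prefix.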

The UMI is not parsed in any way; vartrix just takes it at face value. I then merged the GEX and ATAC BAMs, ran the merged BAM through vartrix, and the output ref/alt counts were exactly the sum of the two modalities' individual counts, so the fake ATAC UMIs seem to work without clashing with the GEX.

Of note, I ran into some weirdness when trying the newer vartrix 1.1.22: a small number of counts go missing between the individual runs and the merged-BAM run. While following up some cases, I found a cell-UMI combination (and its read name) that was parsed fine as a ref count in the GEX run but somehow was not parsed at all in the merged run. That variant turned out to have no counts whatsoever in the merged run, though, so the UMI isn't the problem; something else is going on.
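The additivity check described above (merged-BAM counts equal to GEX plus ATAC counts) can be scripted. A minimal sketch, assuming vartrix writes plain Matrix Market coordinate files (`%` comment lines, one size line, then `row col value` lines) and that all three runs used the same VCF and `barcodes.tsv`, so rows (variants) and columns (cells) line up. Any entries it returns are the missing or extra counts worth tracing. Function names are hypothetical.

```python
from collections import Counter


def read_mtx(mtx_lines):
    """Read Matrix Market coordinate lines into Counter{(row, col): value}."""
    counts = Counter()
    body = (ln for ln in mtx_lines if not ln.startswith("%"))
    next(body, None)  # skip the size line: rows, columns, number of entries
    for line in body:
        if line.strip():
            row, col, value = line.split()
            # int(float(...)) also tolerates values written as e.g. "1.0"
            counts[(int(row), int(col))] += int(float(value))
    return counts


def additivity_report(gex, atac, merged):
    """Entries where the merged-BAM matrix differs from GEX + ATAC.

    Returns {(row, col): (expected, observed)}; an empty dict means a perfect sum.
    """
    expected = gex + atac
    keys = set(expected) | set(merged)
    return {k: (expected[k], merged[k]) for k in sorted(keys)
            if expected[k] != merged[k]}
```

Run it once for the ref matrices and once for the alt matrices. Summing `read_mtx(...).values()` for each modality also gives the total counts compared earlier in the thread.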

yangchao4 commented 1 year ago

Hello! I have a sample with both ATAC-seq and snRNA-seq from the same nuclei. Do you have any solution for this?

wheaton5 commented 1 year ago

Not currently, but it looks like cluster 0 in one experiment is cluster 1 in the other.

XFWuCN commented 4 months ago

Hello! I am working with ATAC data, and about 200 cells are missing from the clusters.tsv file. Could you please tell me why? I added the parameter "--no_umi True" to the run; the command is as follows:

singularity exec souporcell.sif souporcell_pipeline.py \
--no_umi True \
-i merge_sorted.bam \
-b merge_barcodes.tsv \
-f genome.fa \
-t 16 \
-o ATAC-seq_out \
-k 3

The ATAC-seq data have three samples and 24354 cells, but I got only 24096 cells in the clusters.tsv file.

ktpolanski commented 4 months ago

If merging multiple 10X samples, there's a good chance of unrelated barcode overlap between them. Did you remember to prepend a sample ID to both the CB: tags of the BAM and the listed barcodes in the TSV?
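A minimal sketch of that fix, using hypothetical sample IDs such as `S1`: the same prefix goes on each sample's barcode list and on the `CB` tags of its BAM, so identical 10X barcodes from different libraries stay distinct after merging. It also lists barcodes shared between samples, which, left unprefixed, are one plausible way for cells to go missing. The BAM helper needs pysam; all function names are hypothetical.

```python
from collections import Counter


def prefix_barcode(barcode, sample_id):
    """Prepend a sample ID so identical barcodes from different libraries stay distinct."""
    return f"{sample_id}_{barcode}"


def barcode_collisions(barcode_lists):
    """Barcodes that appear in more than one sample's list ({sample_id: [barcodes]})."""
    seen = Counter(bc for barcodes in barcode_lists.values() for bc in set(barcodes))
    return {bc for bc, n in seen.items() if n > 1}


def merged_barcodes(barcode_lists):
    """Concatenate the per-sample barcode lists with sample-ID prefixes applied."""
    return [prefix_barcode(bc, sid)
            for sid, barcodes in barcode_lists.items() for bc in barcodes]


def prefix_bam_cb(in_bam, out_bam, sample_id):
    """Rewrite the CB tag of every barcoded read with the same sample prefix."""
    import pysam  # imported here so the barcode helpers above work without pysam

    with pysam.AlignmentFile(in_bam, "rb") as src, \
            pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        # until_eof=True reads every record in file order, no index required
        for read in src.fetch(until_eof=True):
            if read.has_tag("CB"):
                read.set_tag("CB", prefix_barcode(read.get_tag("CB"), sample_id))
            dst.write(read)
```

After rewriting each sample's BAM with `prefix_bam_cb`, merge them (for example with `samtools merge`), index the result, and write `merged_barcodes(...)` one barcode per line as the `-b` file. Changing the trailing `-1` to `-2`, `-3` and so on per library, as `cellranger aggr` does, would work equally well as long as the BAM and the TSV agree.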

XFWuCN commented 3 months ago

Yes, you were right! I fixed it, thank you.