loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

Questions about dealing with snATAC-seq data with multiple samples. #205

Closed by dmsalsgh97 1 year ago

dmsalsgh97 commented 1 year ago

Hi, thanks for developing this wonderful tool!

I want to ask about dealing with snATAC-seq data with multiple samples. I'm trying to follow the workflow described in this issue: https://github.com/loosolab/TOBIAS/issues/137

I have snATAC-seq data from 10 samples in two conditions (6 normal, 4 disease), and I've clustered the cells using a third-party program (ArchR). Now, if I want to do the TOBIAS analysis in a condition-specific manner, should I make separate .bam files per sample and merge them?

For example, if I have sample1 and sample2, and cell types A and B, should I make an input .bam file by merging sample1_celltypeA.bam and sample2_celltypeA.bam?

Thanks! Minho

msbentsen commented 1 year ago

Hi @dmsalsgh97

Thank you for your question - yes, I would do exactly as you suggest, and make:

sample1_celltypeA.bam + sample2_celltypeA.bam -> merged_celltypeA.bam
sample1_celltypeB.bam + sample2_celltypeB.bam -> merged_celltypeB.bam
(etc)

And use merged_celltypeA.bam and merged_celltypeB.bam as input for TOBIAS.
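The merge plan above can be sketched in a few lines of Python. This is only an illustration, not part of TOBIAS: the function names, the sample-to-condition mapping, and the `merged_<condition>_celltype<X>.bam` naming are hypothetical, and grouping by condition as well as by cell type is an assumption about what a condition-specific analysis would need. The command uses the positional form `samtools merge <out.bam> <in1.bam> <in2.bam> ...`; the sketch only prints the commands and does not run them.

```python
from collections import defaultdict


def plan_merges(sample_conditions, cell_types):
    """Group per-sample, per-cell-type BAMs by (condition, cell type).

    sample_conditions: dict mapping sample name -> condition label (hypothetical input).
    cell_types: list of cell-type labels, e.g. ["A", "B"].
    Returns {merged_bam_name: [input_bam_names]}.
    """
    groups = defaultdict(list)
    for sample, condition in sorted(sample_conditions.items()):
        for cell_type in cell_types:
            # File naming follows the pattern used in this thread, with the
            # condition added to the merged name (an assumption).
            merged = f"merged_{condition}_celltype{cell_type}.bam"
            groups[merged].append(f"{sample}_celltype{cell_type}.bam")
    return dict(groups)


def samtools_merge_command(merged, inputs):
    """Build the argument list for: samtools merge <out.bam> <in1.bam> ..."""
    return ["samtools", "merge", merged, *inputs]


if __name__ == "__main__":
    # Hypothetical sample sheet; replace with your own samples and conditions.
    sample_conditions = {"sample1": "normal", "sample2": "normal", "sample3": "disease"}
    for merged, inputs in plan_merges(sample_conditions, ["A", "B"]).items():
        print(" ".join(samtools_merge_command(merged, inputs)))
```

Each printed line can be run in a shell once the per-sample, per-cell-type BAM files exist; the merged files are then the inputs for TOBIAS.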

If the cells in each cluster have roughly the same number of reads, it should be fine to merge them. If a cluster is small (<100 cells) and one cell has a lot more reads than all the other cells (checking for this might be part of your QC in ArchR), just keep in mind that this cell might dominate the TOBIAS analysis. However, if you have enough cells per cluster, this effect should be negligible.
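One rough way to check for this before merging is to look at how large a share of each cluster's reads comes from its single deepest cell. The sketch below assumes you can export a per-cell table of (barcode, cluster, read count) from your clustering tool; the function name and the `max_top_fraction` cutoff are hypothetical and not a TOBIAS or ArchR recommendation. Only the 100-cell threshold comes from the comment above.

```python
from collections import defaultdict


def flag_dominated_clusters(cells, min_cells=100, max_top_fraction=0.05):
    """Flag small clusters in which one cell contributes a large share of reads.

    cells: iterable of (barcode, cluster, n_reads) tuples (hypothetical export).
    min_cells: clusters with fewer cells than this count as "small".
    max_top_fraction: illustrative cutoff for the top cell's share of reads.
    """
    per_cluster = defaultdict(list)
    for barcode, cluster, n_reads in cells:
        per_cluster[cluster].append((n_reads, barcode))

    report = {}
    for cluster, entries in per_cluster.items():
        total = sum(n for n, _ in entries)
        top_reads, top_barcode = max(entries)  # cell with the most reads
        top_fraction = top_reads / total if total else 0.0
        report[cluster] = {
            "n_cells": len(entries),
            "top_barcode": top_barcode,
            "top_fraction": top_fraction,
            # Flag only when the cluster is small AND one cell stands out.
            "flag": len(entries) < min_cells and top_fraction > max_top_fraction,
        }
    return report
```

A flagged cluster is not necessarily unusable; it only marks a case where the merged signal may largely reflect one cell, so the TOBIAS results for that cluster deserve a closer look.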

I can't say what the minimum number of cells needed is, so that requires a little bit of trial and error. I hope it works out!

dmsalsgh97 commented 1 year ago

Thanks for your comments!