wbaopaul / scATAC-pro

A comprehensive tool for processing, analyzing, and visualizing single-cell chromatin accessibility sequencing data
MIT License

Differential accessibility between cases and controls #31

Closed AnjaliC4 closed 3 years ago

AnjaliC4 commented 3 years ago

Hi, I have 5 cases and 5 controls. I would like to integrate cases and controls separately, normalize by library size, and perform differential accessibility (DA) analysis of peaks. I am not sure how to use the scATAC-pro workflow to do so, but I know it's possible because you provide the options for it. I know I could use the scATAC-pro integrate function to merge the peaks of cases and controls separately, but from there, how do I normalize them and perform DA of peaks between cases and controls? Thanks a lot!

wbaopaul commented 3 years ago

Currently, there is no module to conduct DA of peaks using an integrated object. I would suggest integrating all 10 samples together (not separately), which will give you an integrated Seurat object with the TF-IDF normalized data saved in it. You can then conduct DA analysis using Seurat's FindMarkers function. If you integrate cases and controls separately, I think the two sets of merged peaks will be different, and there is no easy way to do DA on them. Let me know if you have any further questions.

Hope it helps.
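
To illustrate why separately merged peak sets are hard to compare, here is a minimal Python sketch. It assumes peaks are merged by taking the union of overlapping (or abutting) intervals; the helper merge_peaks and all coordinates are hypothetical, and this is not scATAC-pro's actual code.

```python
def merge_peaks(peaks):
    """Union-merge (chrom, start, end) intervals that overlap or abut."""
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up peak calls, for illustration only.
case_peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 120)]
control_peaks = [("chr1", 180, 260), ("chr1", 500, 600)]

cases_only = merge_peaks(case_peaks)        # features if only cases are merged
controls_only = merge_peaks(control_peaks)  # features if only controls are merged
joint = merge_peaks(case_peaks + control_peaks)  # one shared feature set

print(cases_only)
print(controls_only)
print(joint)
```

In this toy data the case-only and control-only merges produce different intervals, so their matrices have no common rows to test against each other. Merging all samples together gives every cell the same peak coordinates.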

AnjaliC4 commented 3 years ago

OK, thanks so much! I will try that. Just to clarify: does the integrate function in scATAC-pro only find intersecting peaks (similar to reduce in Seurat), without correcting for differences in sequencing depth/library size when integrating these subjects?

wbaopaul commented 3 years ago

Actually, the integrate module in scATAC-pro first merges the peaks (union, not intersection) and reconstructs the matrix for each sample using the merged peaks. Then all the matrices are concatenated (cbind) and normalized by TF-IDF by default (which also corrects for sequencing depth/library size). The normalized data is saved in the Seurat object under @assays$ATAC@data. Finally, data from different samples are integrated by one of the following options: Integrate_By = pool, VFACS (the default), seurat (which uses CCA), or harmony. You can set this option in your configure_user.txt file.
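
The concatenate-then-normalize step can be sketched in Python as follows. This assumes one common TF-IDF variant (term frequency = count divided by the cell's total, inverse document frequency = log(1 + n_cells / peak total)); the exact formula scATAC-pro uses may differ. The helper names cbind and tf_idf and the toy counts are hypothetical.

```python
import math


def cbind(*matrices):
    """Concatenate peak-by-cell matrices column-wise (rows must be the same peaks)."""
    n_rows = len(matrices[0])
    assert all(len(m) == n_rows for m in matrices), "peak sets must match"
    return [sum((m[i] for m in matrices), []) for i in range(n_rows)]


def tf_idf(counts):
    """Assumed TF-IDF variant: (count / cell depth) * log(1 + n_cells / peak total)."""
    n_peaks, n_cells = len(counts), len(counts[0])
    depth = [sum(counts[i][j] for i in range(n_peaks)) for j in range(n_cells)]
    normalized = []
    for row in counts:
        total = sum(row)
        idf = math.log(1 + n_cells / total) if total else 0.0
        normalized.append(
            [(row[j] / depth[j] if depth[j] else 0.0) * idf for j in range(n_cells)]
        )
    return normalized


# Toy counts over the same 3 merged peaks; sample_b's first cell has the
# same accessibility profile as sample_a's first cell at twice the depth.
sample_a = [[2, 0], [2, 1], [0, 1]]
sample_b = [[4, 0], [4, 0], [0, 3]]

combined = cbind(sample_a, sample_b)
normalized = tf_idf(combined)

# Columns 0 and 2 should match after normalization despite the depth difference.
print([row[0] for row in normalized])
print([row[2] for row in normalized])
```

Because the term frequency divides each count by the cell's total, two cells with the same profile but different sequencing depth end up with the same normalized values. This is the sense in which TF-IDF accounts for library size.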

wbaopaul commented 3 years ago

I will close this issue since it has been inactive for more than two months.