stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
329 stars 88 forks source link

I'm running a scATAC-seq analysis. However, the peaks I've called are somehow weird #1804

Closed TideJunior closed 1 month ago

TideJunior commented 1 month ago

I'm running a scATAC-seq analysis. The data was obtained from the published work: Single-Cell Chromatin Accessibility Analysis Reveals the Epigenetic Basis and Signature Transcription Factors for the Molecular Subtypes of Colorectal Cancers; https://ngdc.cncb.ac.cn/omix/; OMIX005759; OMIX005759-04.sort.fragments.tsv.gz. However, when I run the following code, I call these peaks:

peaks2 <- CallPeaks(frags2,macs2.path="/home/anaconda3/envs/lowpython/bin/macs2",combine.peaks=F)
peaks2@ranges
#IRanges object with 24 ranges and 0 metadata columns:
#           start       end     width
#       <integer> <integer> <integer>
#   [1]      9968 248946051 248936084
#   [2]     10191 133787023 133776833
#   [3]    146452 135076001 134929550
#   [4]     15323 133236135 133220813
#   [5]  18172397 114353789  96181393
#   ...       ...       ...       ...
#  [20]     19503 159335730 159316228
#  [21]     82336 145075845 144993510
#  [22]     10409 138233182 138222774
#  [23]   2781347 156030667 153249321
#  [24]    268144  57217121  56948978

They are obviously too wide to be "peaks"! My complete code is as follows:

## PATH
fragpath2 <- "/data/Tide/rawdata/TangLab-scATAC-seq/OMIX005759-04.sort.fragments.tsv.gz"
## require essential informations
total_counts <- CountFragments(fragpath)
cutoff <- 1000 # Change this number depending on your dataset!
barcodes <- total_counts[total_counts$frequency_count > cutoff, ]$CB
## create .tbi index
gunzip rawdata/TangLab-scATAC-seq/OMIX005759-04.sort.fragments.tsv.gz
bgzip rawdata/TangLab-scATAC-seq/OMIX005759-04.sort.fragments.tsv
tabix -p bed rawdata/TangLab-scATAC-seq/OMIX005759-04.sort.fragments.tsv.gz
## Create fragment object
frags2 <- CreateFragmentObject(path = fragpath2, cells = barcodes)
## peak calling
peaks2 <- CallPeaks(frags2,macs2.path="/home/anaconda3/envs/lowpython/bin/macs2")

The version of RStudio: 4.3.1 The version of Signac: 1.14.0 The version of MACS2: 2.1.4 How could I fix this problem?

timoast commented 1 month ago

It seems like this could be an issue with the dataset. I'd suggest viewing the coverage in a genome browser and see how it looks, and you can try tweaking the MACS2 parameters