jianhong / ATACseqQC

ATAC-seq Quality Control
https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html

Question on the rationale for splitting the BAM files #62

Closed sunta3iouxos closed 4 weeks ago

sunta3iouxos commented 1 month ago

Hi there, the default bins used for splitting in splitGAlignmentsByCut() are defined by:

breaks = c(0, 100, 180, 247, 315, 473, 558, 615, Inf),
labels = c("NucleosomeFree", "inter1", "mononucleosome", "inter2", "dinucleosome",
    "inter3", "trinucleosome", "others"),

What is the rationale for having these "inter" regions, especially between the mono-, di-, and trinucleosome bins? Is there a biological significance to these cut-offs? I would split the BAM file based on the fragment length distribution: nucleosome free (0-150), mononucleosome (150-290, peak at 220), and dinucleosome (314-454, peak at 384). These ranges come from the identified peaks, taking a window of 70 nucleotides on either side of each peak, which corresponds to the nucleosome size. I accept that the nucleosome-free range is fairly fluid. The regions between the nucleosome bins are less clear to me, especially since we are talking about a succession of mono-, di-, tri-, etc. nucleosomes and we know that 147 bases are wrapped around each nucleosome. What to define as nucleosome free is also a bit arbitrary, and I am not sure what to expect there.

jianhong commented 1 month ago

It is entirely up to you. Those settings are just an attempt to reproduce the Greenleaf Lab plots, as described in the "Nucleosome positioning" section of their paper: https://www.nature.com/articles/nmeth.2688.
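
To make the difference between the two binning schemes concrete, here is a minimal base R sketch that assigns a few fragment lengths to bins under the default breaks quoted above and under the peak ± 70 breaks proposed in the question. The fragment lengths, the variable names, and the "gap" label for the 290-314 range are hypothetical and only for illustration. It uses base R `cut()` with its default right-closed intervals; whether splitGAlignmentsByCut() treats bin boundaries the same way is an assumption, not something confirmed in this thread.

```r
# Default bins quoted in the issue.
default_breaks <- c(0, 100, 180, 247, 315, 473, 558, 615, Inf)
default_labels <- c("NucleosomeFree", "inter1", "mononucleosome", "inter2",
                    "dinucleosome", "inter3", "trinucleosome", "others")

# Bins proposed in the question: each observed peak +/- 70 nt.
half_window <- 70
peaks <- c(mononucleosome = 220, dinucleosome = 384)
custom_breaks <- c(0,
                   peaks[["mononucleosome"]] - half_window,  # 150
                   peaks[["mononucleosome"]] + half_window,  # 290
                   peaks[["dinucleosome"]] - half_window,    # 314
                   peaks[["dinucleosome"]] + half_window,    # 454
                   Inf)
# "gap" is a hypothetical label for the unassigned 290-314 range.
custom_labels <- c("NucleosomeFree", "mononucleosome", "gap",
                   "dinucleosome", "others")

# Hypothetical fragment lengths (absolute insert sizes), for illustration only.
fragment_length <- c(60, 120, 200, 260, 300, 400, 500)

# cut() uses right-closed intervals (a, b] by default.
default_bin <- cut(fragment_length, breaks = default_breaks,
                   labels = default_labels)
custom_bin <- cut(fragment_length, breaks = custom_breaks,
                  labels = custom_labels)

print(data.frame(fragment_length, default_bin, custom_bin))
```

Since the defaults above are given as `breaks` and `labels`, a custom scheme like this one would presumably be supplied through those same two arguments of splitGAlignmentsByCut(), with one label per interval (one fewer label than breaks).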