jianhong / ATACseqQC

ATAC-seq Quality Control
https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html
22 stars 12 forks source link

Interpreting the outputs of the function TSSEscore() comparing to the reference from ENCODE #54

Open SergioRodLla opened 1 year ago

SergioRodLla commented 1 year ago

Hello,

First of all, thank you for developing ATACseqQC.

I am having troubles interpreting the outputs of the function TSSEscore(). I am using the TSSE score definition used in ENCODE standards and then compare the score to the reference values used in this table in the Current Standards section form the ENCODE ATAC-seq guideline.

Fist of all I noticed the way they compute TSSE score is different than the one that is explained in the TSSEscore() documentation:

I want to compare to the reference values indicated in the TSSE table from ENCODE. To do so I used the function as the following:

txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
tsse <- TSSEscore(gal1, txs, upstream = 2000, downstream = 2000)
tsse$TSSEscore

I got this output:

> tsse$TSSEscore
[1] 11.21616

Is this the right way to go? Also, are the values in the table from ENCODE referring to this number?

Thank you, Sergio

jianhong commented 1 year ago

Hi SergioRodLla, Even you changed the upstream and downstream value, I don't think current calculation results will meet the ENCODE results. You may also want to change the width parameter to 10 to follow the current setting of ENCODE. The continues change of the parameter make me also confused what to follow. In my opinion, the TSSEscore should not be a cutoff QC for all different species. This value is affected by multiple thing. As you can imaging, the TSS positions, the range of signals, the signal bin sizes, and the smoothy methods, etc. For different species with different annotation files, this value will be different. In a summary, this value can be treat as an index of enrichment in TSS region, but it should not be a cutoff value for QC in different species. Hope this will help.

SergioRodLla commented 1 year ago

Hi @jianhong , thank you for the answer.

I am having problems to interpret how they compute the TSSEscore according to the ENCODE standards. Because they don't mention the width of the window they use. I assume it is 100 bp, the same that TSSEscore() uses as default. In any case would this change the final value by much?

When you say: "For different species with different annotation files, this value will be different", I think this is why they have done this table I was telling you in my last post, where they have put reference values for the latest human and mouse assemblies. This one:

image

As you can see, they state that these values are used as cutoff for high quality data. I know this may not be the absolute truth to tell if the quality of my ATAC-seq is good enough but I was hoping it helps to get an idea if the data are good for downstream analyses.

Thank you for your help. Have a happy holidays!

Sergio

jianhong commented 1 year ago

The bin size mentioned here: https://github.com/ENCODE-DCC/atac-seq-pipeline/issues/50#issuecomment-490999002

SergioRodLla commented 1 year ago

Happy new year! Thank for the information.

I just want to mention the following:

I have been looking at the code of the function and although I do not understand it completely, I believe that when you take into account the endSize parameter the code does not take into account what they say the comment you mentioned in this part: We take 10 bins from each end (100bp from the left end, 100bp from the right end), and we average those values in the bins to get an average background noise value. I may be wrong but the code of TSSEscore() does not take 10 bins of 10 bp at each end and make the average and so if the endSize is kept to be 100 and the width to 10, due to this calculation for the average background noise value, this gives a very high enrichment score.

May the TSSEscore() implementation take into account for this?

jianhong commented 1 year ago

Yes, you are correct. when it set to 10bp, then both end will also set to 10bp. That is a bug. I may need to rewrote the function. I put it to my TODO list to fix this issue.