kokonech / InTAD

Correlation of epigenetic signals and genes in TADs/via loops

Integrating PROseq data for Enhancers instead of ChIPseq #10

Open MathewBerg9 opened 1 week ago

MathewBerg9 commented 1 week ago

Hi, I'm trying to use PROseq data, which is basically expression data, instead of ChIPseq data. The problem I'm running into is that I don't understand what kind of normalization you used for the ChIPseq data: the enhancer data in your table has both positive and negative values, so what normalization was applied to the enhancer signals? If I have PROseq expression data, can I use FPKM values for both genes and enhancers? Will the algorithm still work that way, or does it require the enhancer values to be on a different scale from the gene values?

kokonech commented 6 days ago

Hi, the test data were Z-score normalized, which is why there are negative values. In our other target test dataset (ependymoma tumors), general normalized values were used; more details here: https://link.springer.com/article/10.1186/s12859-019-2655-2

So FPKM fits well, and a log2 adjustment could be useful to avoid strong variance. The main focus is correlation, so the idea is simply to have a similar distribution across the signals being compared.
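A minimal sketch of the two transformations mentioned above (a log2 adjustment of FPKM values, and Z-scoring, which produces negative values). This is an illustration only, not InTAD code: the function names, the pseudocount of 1, the use of the sample standard deviation, and the example values are all assumptions made for the example.

```python
import math
from statistics import mean, stdev


def log2_transform(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each value.

    The pseudocount (assumed to be 1 here) keeps zero FPKM values finite.
    """
    return [math.log2(v + pseudocount) for v in values]


def z_score(values):
    """Center values on their mean and scale by the sample standard deviation.

    Values below the mean become negative, which is why Z-scored
    signal tables contain both positive and negative entries.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        # A constant signal carries no information for correlation.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical FPKM values for one gene and one enhancer across 4 samples.
gene_fpkm = [0.0, 3.0, 7.0, 15.0]
enhancer_fpkm = [1.0, 1.0, 3.0, 7.0]

# Apply the same transformation to both signals so that their
# distributions are comparable before correlating them.
gene_log = log2_transform(gene_fpkm)
enhancer_log = log2_transform(enhancer_fpkm)

gene_z = z_score(gene_log)
enhancer_z = z_score(enhancer_log)

print(gene_log, enhancer_log)
print(gene_z, enhancer_z)
```

Pearson correlation itself is unchanged by Z-scoring each signal, so the step that matters here is applying the same kind of transformation (for example log2) to both genes and enhancers.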

MathewBerg9 commented 4 days ago

Hi, thank you very much for your response. That answered my question, and the program seems to work as intended.

I have one more question regarding the p-values:

In the article you use q-values (adjusted p-values) to determine correlations, and you had numerous samples for the statistical analysis. I was wondering how the base p-values are generated in the program. Is it based on some sort of bootstrapping? I ask because I only have 8 replicates for my analysis, which leaves many correlations without very significant q-values. That may simply be the true result, but if there were inherent bootstrapping within the program, I could perhaps increase the iterations and use the p-value directly, instead of the q-value, to determine significant interactions.

kokonech commented 3 days ago

Sure, the base p-values are simply correlation p-values; check the cor() and cor.test() R functions for details. Adjusted p-values were included only as an additional filtering control; they are computed with the qvalue package (https://www.bioconductor.org/packages/release/bioc/html/qvalue.html). So in the case of a small dataset/low distribution, the standard p-value should be fully sufficient.
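To illustrate what a "correlation p-value" means when there is no bootstrapping involved, here is a minimal sketch of the standard two-sided test for a Pearson correlation, written in plain Python. It is not InTAD's implementation and assumes the Pearson method; the function names are hypothetical. The p-value is obtained analytically from the null distribution of r, whose density is proportional to (1 - r^2)^((n - 4) / 2), which is equivalent to the usual t-test with n - 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_p_value(r, n, steps=2000):
    """Two-sided p-value for H0: true correlation is zero, given n samples.

    Integrates the null density of r over [-|r|, |r|] with Simpson's rule
    and returns the remaining tail probability. Requires n >= 4 so that
    the integrand is bounded; steps must be even.
    """
    if n < 4:
        raise ValueError("this sketch requires at least 4 samples")
    a = abs(r)
    if a >= 1.0:
        return 0.0
    k = (n - 4) / 2.0

    def density_kernel(t):
        return (1.0 - t * t) ** k

    # Simpson's rule on [0, a]; the density is symmetric around zero.
    h = a / steps
    total = density_kernel(0.0) + density_kernel(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density_kernel(i * h)
    integral = total * h / 3.0

    # Normalizing constant: the Beta function B(1/2, (n - 2) / 2).
    beta = math.gamma(0.5) * math.gamma((n - 2) / 2.0) / math.gamma((n - 1) / 2.0)
    return max(0.0, 1.0 - 2.0 * integral / beta)


# Hypothetical signals for one gene/enhancer pair across 8 replicates.
gene = [1.2, 2.3, 2.9, 4.1, 4.8, 6.2, 6.9, 8.1]
enhancer = [0.8, 2.9, 2.1, 4.6, 4.0, 6.8, 6.1, 8.4]

r = pearson_r(gene, enhancer)
print(r, pearson_p_value(r, len(gene)))
```

Because the p-value depends only on r and the number of samples, there are no iterations to increase. With 8 samples, |r| has to exceed roughly 0.71 for the unadjusted p-value to fall below 0.05.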