vaquerizaslab / chess

Comparison of Hi-C Experiments using Structural Similarity.

Chess sim output .tsv file explained #61

Open aminakur opened 1 year ago

aminakur commented 1 year ago

Could you please provide information about the z_bg and p_bg columns in the chess sim output file?

liz-is commented 1 year ago

When you ran chess sim, you must have specified either --background-regions or --background-query. With either option, CHESS calculates a z-score (z_bg) and a corresponding p-value (p_bg) for the similarity of each pair of regions. Both are measured against the distribution of similarity scores obtained by comparing your reference region to the background regions. This is described in more detail in the CHESS paper:

An additional application of CHESS is to assess whether contact matrices originating from different genomic regions, or different genomes, are similar. An appropriate null model can be used to test whether the similarity measured by S is statistically significant. For example, a region R containing just a single TAD might obtain a high score when compared to a particular query Q, which also contains a single TAD. However, the same is true for any region with a single TAD, which is why the similarity of R and Q is not particularly informative in this instance. The score for the comparison of R versus Q should then be assigned a low significance. Conversely, when two highly complex regions with many structural features are assigned a strong similarity score S, it is unlikely to find an equally similar region in the genome by chance and the comparison is given high statistical significance. To compute a suitable null model, CHESS compares the reference matrix R to all other regions of the same size across the genome (referred to as Q_{B_i} in Fig. 1b). The distribution of scores from the null model is then used to calculate a z-score, corresponding to a normalized effect size, and a P value denoting the frequency of scores equal to or higher than S in the null model (Fig. 1c and Methods). Therefore, CHESS enables a quantitative comparison and assessment of statistical significance of contact matrix similarities.
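To make the arithmetic concrete, here is a minimal Python sketch of the null-model statistics described in the passage above, plus a small helper for filtering an output table on them. It assumes the textbook definitions: z_bg as (S minus the mean of the background scores) divided by their standard deviation, and p_bg as the fraction of background scores equal to or higher than S. The exact definitions CHESS uses (for example, sample vs. population standard deviation, or any pseudocount) are given in the paper's Methods and may differ. The function names, the alpha threshold, and the toy data are hypothetical; only the z_bg and p_bg column names come from the question.

```python
import csv
import io
import statistics


def background_z_and_p(score, background_scores):
    """Null-model z-score and empirical p-value for one similarity score.

    score: similarity S of the reference region vs. the query region.
    background_scores: S of the reference region vs. each background region.
    """
    mu = statistics.mean(background_scores)
    sigma = statistics.stdev(background_scores)  # sample SD (an assumption)
    z = (score - mu) / sigma
    # Fraction of background scores equal to or higher than the observed score
    p = sum(1 for b in background_scores if b >= score) / len(background_scores)
    return z, p


def significant_pairs(handle, alpha=0.05):
    """Rows of a chess sim TSV whose p_bg is below alpha, smallest p_bg first.

    Assumes a tab-separated file with a header row containing "z_bg" and
    "p_bg"; all other columns are passed through unchanged as strings.
    """
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["z_bg"] = float(row["z_bg"])
        row["p_bg"] = float(row["p_bg"])
        if row["p_bg"] < alpha:
            hits.append(row)
    hits.sort(key=lambda r: r["p_bg"])
    return hits


if __name__ == "__main__":
    z, p = background_z_and_p(4.5, [1, 2, 3, 4, 5])
    print(f"z = {z:.2f}, p = {p}")

    # Toy table for illustration only; "ID" is a hypothetical column name.
    toy = io.StringIO("ID\tz_bg\tp_bg\n1\t2.5\t0.01\n2\t0.3\t0.40\n3\t3.1\t0.002\n")
    for row in significant_pairs(toy):
        print(row["ID"], row["z_bg"], row["p_bg"])
```

With the toy values, background_z_and_p(4.5, [1, 2, 3, 4, 5]) gives z ≈ 0.95 and p = 0.2, because only one of the five background scores is at least 4.5. A high z_bg with a low p_bg therefore means the reference/query similarity is unusual compared with the background comparisons. To filter a real output file, open it and pass the file handle to significant_pairs.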