Conditions for conservation analysis of syntenic blocks

vaquerizaslab / chess

Comparison of Hi-C Experiments using Structural Similarity.

Conditions for conservation analysis of syntenic blocks #45

Open luciaalvarez95 opened 3 years ago

luciaalvarez95 commented 3 years ago

Hi!

I am analyzing the level of chromatin conformation conservation between different species, and chess looks like the perfect tool for me! I already have the synteny blocks between the species I am working with, but I am not sure what window size or step I should use. First, I would like to replicate your paper's analysis, but I don't know which conditions were used. Could you help me with this? Do you have any suggestions on which parameters I should take into account when running this type of analysis? Is chess sensitive enough to detect TAD variation in interspecies analysis?

Many thanks

Lucía

liz-is commented 3 years ago

Hi Lucía,

In the paper, the cross-species comparison was carried out by using the syntenic regions, in bedpe format, as the input pairs file for chess sim. If you have syntenic regions, you don't need to choose a window size or step to generate a pairs file; those settings are primarily used for comparing different conditions on the same genome.

Using the syntenic regions as the region pairs will compare the whole syntenic region in one species to the other species, so if your syntenic regions are very large I suppose you might want to create smaller regions that tile across them to perform higher-resolution analysis. I haven't used CHESS for cross-species comparison myself, but I believe @nickmachnik did that part of the analysis for the paper, so he may be able to help if you need more input.
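As a rough illustration of turning syntenic blocks into a pairs file, here is a minimal Python sketch. The function name `syntenic_blocks_to_bedpe` and the input tuple layout are hypothetical, and the ten-column layout (two regions, an integer ID, a placeholder, and two strands) is an assumption about what chess sim expects; check it against a file produced by CHESS itself before relying on it.

```python
def syntenic_blocks_to_bedpe(blocks):
    """Format syntenic block pairs as tab-separated BEDPE-style lines.

    blocks: iterable of (chrom_a, start_a, end_a, chrom_b, start_b, end_b).
    The column layout below is an assumption, not taken from this thread.
    """
    lines = []
    for pair_id, (chrom_a, start_a, end_a, chrom_b, start_b, end_b) in enumerate(blocks):
        if start_a >= end_a or start_b >= end_b:
            raise ValueError(f"empty or inverted region in pair {pair_id}")
        fields = [chrom_a, start_a, end_a, chrom_b, start_b, end_b,
                  pair_id, ".", "+", "+"]
        lines.append("\t".join(str(field) for field in fields))
    return lines


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example = [("chr1", 1_000_000, 3_000_000, "chr4", 5_000_000, 6_500_000)]
    print("\n".join(syntenic_blocks_to_bedpe(example)))
```

The lines can then be written to a file and passed as the pairs file, with each species' Hi-C matrix supplied separately.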

luciaalvarez95 commented 3 years ago

Hi,

Many thanks for your helpful reply. I wasn't aware that I could use the syntenic regions as a pairs file; I will definitely try that. However, as you pointed out, the syntenic regions are quite large and I would like to perform a higher-resolution analysis, so any further help would be welcome.

Thanks again,

Lucía

liz-is commented 3 years ago

Since the syntenic regions likely have different sizes in the different species, and the change in size may not be even across the region, I would probably start by taking the syntenic regions in one species as a reference, splitting these into sub-regions (by tiling across them), and then lifting over these sub-regions to the other genome to find the matching syntenic sub-region to use for constructing the pairs file. You would need to write custom code to create these sub-regions, as there isn't anything built into CHESS for this.
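The tiling step could be sketched roughly as follows. This is only one way to do it: `tile_region` is a hypothetical helper, not part of CHESS, and it assumes that a trailing remainder shorter than the tile size is simply dropped. Lifting the tiles over to the other genome has to be done separately with a liftover tool and is not shown.

```python
def tile_region(chrom, start, end, size, step=None):
    """Split one reference region into fixed-size sub-regions.

    step defaults to size (non-overlapping tiles); a smaller step gives
    overlapping tiles. A trailing piece shorter than size is dropped
    (an assumption made for this sketch).
    """
    if step is None:
        step = size
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    tiles = []
    pos = start
    while pos + size <= end:
        tiles.append((chrom, pos, pos + size))
        pos += step
    return tiles


if __name__ == "__main__":
    # Made-up 1.2 Mb reference block tiled into 500 kb sub-regions.
    for tile in tile_region("chr1", 0, 1_200_000, 500_000):
        print(tile)
```

Sub-regions that do not lift over cleanly, or that land on a different chromosome than the rest of the block, would probably need to be discarded before building the pairs file.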

Note that the size of the regions you use for CHESS analysis needs to be at least 20x the resolution of your Hi-C matrix, and personally I usually have better results using 100x the resolution. That is, for a 5 kb resolution Hi-C matrix, 20x would be 100 kb regions and 100x would be 500 kb regions. So bear this in mind when deciding what size to use for your sub-regions.
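The arithmetic above is easy to wrap in a small helper when trying several resolutions. `min_region_size` is a hypothetical name used only for this sketch; the factors 20 and 100 are the ones mentioned in the comment above.

```python
def min_region_size(resolution_bp, factor=20):
    """Region size in bp corresponding to `factor` bins at the given resolution."""
    if resolution_bp <= 0 or factor <= 0:
        raise ValueError("resolution and factor must be positive")
    return resolution_bp * factor


if __name__ == "__main__":
    for resolution in (5_000, 10_000, 25_000):
        print(resolution, min_region_size(resolution), min_region_size(resolution, 100))
```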

zy041225 commented 2 years ago

Hi both,

I'm also interested in performing a cross-species comparison with chess. I wonder whether whole syntenic blocks can be used in chess, or whether I should split each block into sliding windows, similar to those produced by chess pairs, with a specific window size and step size. Also, if the sizes of the syntenic blocks differ quite a lot between species (e.g. 5- to 10-fold), would that affect the chess result?

Many thanks

Yang

nickmachnik commented 2 years ago

Hi Yang,

Generally, you should be able to use whole syntenic blocks without splitting. We recommend matrix sizes of at least 100x100 pixels for meaningful comparisons, so if some of your smallest syntenic regions are below that size you might want to discard them or try a smaller bin size if your sequencing depth allows.
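One possible way to discard blocks that are too small is sketched below. The names `n_bins` and `filter_pairs` are hypothetical, and the sketch assumes that both regions of a pair must span at least 100 whole bins, which is one reading of the 100x100 pixel recommendation.

```python
def n_bins(start, end, bin_size):
    """Number of whole bins covered by a region."""
    return (end - start) // bin_size


def filter_pairs(pairs, bin_size, min_bins=100):
    """Keep pairs whose two regions both span at least min_bins bins.

    pairs: iterable of (chrom_a, start_a, end_a, chrom_b, start_b, end_b).
    """
    kept = []
    for pair in pairs:
        _, start_a, end_a, _, start_b, end_b = pair
        if (n_bins(start_a, end_a, bin_size) >= min_bins
                and n_bins(start_b, end_b, bin_size) >= min_bins):
            kept.append(pair)
    return kept


if __name__ == "__main__":
    # Made-up blocks at 10 kb bins: the second pair is too small on one side.
    pairs = [
        ("chr1", 0, 2_000_000, "chr4", 0, 1_500_000),
        ("chr2", 0, 2_000_000, "chr7", 0, 600_000),
    ]
    print(filter_pairs(pairs, bin_size=10_000))
```

Rerunning the filter with a smaller `bin_size` shows how many blocks would be recovered at a finer resolution, if the sequencing depth supports it.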