zhanyinx / CaTCH


Matched selection for pair of RIs for subTAD and TAD. #7

Open X-xiaoyeren opened 2 years ago

X-xiaoyeren commented 2 years ago

Hi, I have read your great manuscript in Genome Research. I'm very curious about hierarchical TADs, and I successfully obtained the domains at every RI. Next, I picked one RI for subTADs and one for TADs, assigned Level 1 to TADs and Level 2 to subTADs, and finally merged them together.

Then I found a big problem: the subTADs aren't always fully contained in TADs, probably because of the wide RI ranges, such as RI > 55% for subTADs and RI > 69% for TADs. So one possible solution may be to find the best-matched pair of RIs for subTADs and TADs.

Since it takes me a very long time to find matched RIs for both subTADs and TADs, doing this for a large number of samples would be infeasible. If this step can't be solved, it would be a great pity for such a comprehensible and user-friendly method!

Thus, could you please give me some suggestions or help me find a solution?

Any reply will be helpful. Thanks a lot!

zhanyinx commented 2 years ago

Hi,

Thanks for your appreciation of the method. Since the boundaries of domains are not always well defined, at each iteration CaTCH allows small adjustments of boundary positions. This is why "subTADs" (or, in general, domains at lower RI) are not always within "TADs" (or domains at higher RI).
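To make the effect of those boundary adjustments concrete, here is a minimal sketch (not part of CaTCH) that measures how many subTADs sit inside some TAD, first strictly and then with a few kb of slack at each boundary. The function names, the `tol` parameter, and the example coordinates are all hypothetical, and domains are assumed to be `(start, end)` pairs in bp on a single chromosome.

```python
def is_nested(sub, tad, tol=0):
    """True if `sub` lies inside `tad`, allowing `tol` bp of slack per boundary."""
    return sub[0] >= tad[0] - tol and sub[1] <= tad[1] + tol


def nested_fraction(subtads, tads, tol=0):
    """Fraction of subTADs contained in at least one TAD (within `tol` bp)."""
    if not subtads:
        return 0.0
    nested = sum(any(is_nested(s, t, tol) for t in tads) for s in subtads)
    return nested / len(subtads)


# Hypothetical domains: the second subTAD overshoots the TAD boundary by 20 kb.
tads = [(0, 1_000_000), (1_000_000, 2_000_000)]
subtads = [(0, 400_000), (400_000, 1_020_000), (1_020_000, 2_000_000)]

strict = nested_fraction(subtads, tads)               # strict containment
relaxed = nested_fraction(subtads, tads, tol=40_000)  # 40 kb of slack
print(f"strict: {strict:.2f}, relaxed: {relaxed:.2f}")
```

Computing this fraction for each candidate pair of RIs would be one way to rank pairs automatically; what tolerance is sensible depends on the bin size of the Hi-C map and is left as an assumption here.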

Regarding your trouble, can you please give me some clarifications?

1) Why do you merge Level 1 and Level 2 (TADs and subTADs)?

2) Did you use the Hi-C dataset from the manuscript, or do you have your own? If you use your own, it's probably better to identify TADs and subTADs using the methods provided in the manuscript: ~180kb size for subTADs, and optimal functional properties for TADs (enrichment of CTCF, for instance). If you cannot use the "optimal functional properties" criteria, you could find TADs based on size (800kb-1Mb for mouse).

3) If I understood correctly, you want to find which subTADs are within which TAD, right? If this is the goal, you can probably create GRanges objects from your lists of TADs and subTADs and use the findOverlaps function to find overlaps. This function will return multiple hits if your subTAD is not within a single TAD. In this case, you can use the pintersect function to find the hit with the maximum overlap.
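The overlap-based assignment suggested above (each subTAD goes to the TAD it overlaps most) can be sketched in plain Python. This is only an illustration of the logic, not the Bioconductor API: `overlap` and `assign_to_best_tad` are hypothetical helper names, and domains are assumed to be half-open `(start, end)` intervals in bp on the same chromosome.

```python
def overlap(a, b):
    """Length in bp of the overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def assign_to_best_tad(subtads, tads):
    """Map each subTAD to the TAD with the largest overlap (None if no overlap)."""
    assignment = {}
    for s in subtads:
        best = max(tads, key=lambda t: overlap(s, t), default=None)
        if best is not None and overlap(s, best) > 0:
            assignment[s] = best
        else:
            assignment[s] = None
    return assignment


# Hypothetical domains: the first subTAD straddles the boundary at 1 Mb.
tads = [(0, 1_000_000), (1_000_000, 2_000_000)]
subtads = [(800_000, 1_050_000), (1_050_000, 1_400_000), (2_500_000, 2_600_000)]

result = assign_to_best_tad(subtads, tads)
for sub, tad in result.items():
    print(sub, "->", tad)
```

In this example the straddling subTAD overlaps the first TAD by 200 kb and the second by 50 kb, so it is assigned to the first. For real data with several chromosomes, the same comparison would need to be done per chromosome.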

Let me know if this helps.

Best,
Yinxiu Zhan