ecremelie opened 4 months ago
Wow... it's been a while since I've looked at this code! The upshot is that these are cases where the final folded domain isn't made up of a contiguous stretch of protein (e.g. where there's a long unstructured loop, or where the chain leaves one domain and folds into another before returning to the first). The code ultimately aims to find groups of residues that AlphaFold believes move as near-rigid bodies. As far as the clustering code is concerned, the residue numbers are just unique labels on graph nodes, so they're returned as sets, and any residual ordering is more or less coincidental. If you want them ordered in the output .csv, you could change https://github.com/tristanic/pae_to_domains/blob/f407c6035c825f151a56f28bf803fcb44321b941/pae_to_domains.py#L135 to:
clusters = [sorted(c) + [''] * (max_len - len(c)) for c in clusters]
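Here is a minimal, self-contained sketch of what that change does. It assumes the clusters are plain Python sets of integer residue numbers and that each padded cluster ends up as one row of the .csv. The example clusters and the helper names `pad_sorted` and `contiguous_ranges` are hypothetical and not part of pae_to_domains. The second helper collapses a sorted cluster into runs of consecutive residues, which makes a non-contiguous domain like the ones described above easy to spot:

```python
import csv
import io

# Hypothetical example clusters: sets of residue numbers, as the clustering
# step returns them (set iteration order is not guaranteed to be ascending).
clusters = [{5, 3, 4, 1, 2, 40, 41, 42}, {10, 11, 12, 13}]


def pad_sorted(clusters, fill=''):
    """Sort each cluster and pad it with `fill` so every row has equal length."""
    max_len = max(len(c) for c in clusters)
    return [sorted(c) + [fill] * (max_len - len(c)) for c in clusters]


def contiguous_ranges(residues):
    """Collapse residue numbers into (start, end) runs of consecutive numbers."""
    runs = []
    for r in sorted(residues):
        if runs and r == runs[-1][1] + 1:
            runs[-1][1] = r          # extend the current run
        else:
            runs.append([r, r])      # gap in numbering: start a new run
    return [tuple(run) for run in runs]


# Assumption: one padded, sorted cluster per .csv row.
rows = pad_sorted(clusters)
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())

print(contiguous_ranges(clusters[0]))  # [(1, 5), (40, 42)]: one domain, two segments
print(contiguous_ranges(clusters[1]))  # [(10, 13)]
```

Sorting only changes the presentation; the cluster membership is the same either way.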
Hope that makes sense!
Hi there,
Thank you for sharing this code! I have tried it on several samples and am a bit confused by the output in the .csv:
Thank you in advance!