pylelab / USalign

Universal Structure Alignment of Monomeric and Complex Structure of Nucleic Acids and Proteins
https://zhanggroup.org/US-align/

execution time with large dataset #25

Closed DS-ribo closed 10 months ago

DS-ribo commented 10 months ago

Hello,

I am trying to run US-align on 1,261 monomeric structures using the following command:

path/to/USalign -dir /path/to/pdbs/ list_with_names.txt -mm 4 -fast > result.txt

This works with a small dataset, but when I run it on the full set of 1,261 structures it runs for more than 4 hours without producing any results. Is there a limit on how many structures can be used, or should I just give it more time?

Thank you

kad-ecoli commented 10 months ago

This is not surprising. You are trying to align 1,261 structures into one consensus Multiple Structure Alignment (MSTA). This is guaranteed to be a huge endeavor.
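To give a rough sense of scale, here is a minimal sketch that counts how many distinct structure pairs exist in a set of a given size. It is plain arithmetic, not US-align code: whether and how the MSTA mode compares every pair is an assumption here, and `pair_count` is a hypothetical helper name.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n structures: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Compare a set of about a hundred structures with the full 1,261-structure set.
# Assumption: the cost of building one alignment grows at least with the
# number of pairs; the actual US-align cost model is not stated in this thread.
for n in (100, 1261):
    print(f"{n} structures -> {pair_count(n)} pairs")
```

Under that assumption, going from 100 to 1,261 structures raises the number of pairs from 4,950 to 794,430, which is consistent with a run that does not finish in a few hours.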

In theory, US-align can handle as many as 32,767 protein structures, so long as you have enough memory. In practice, however, we rarely run into a case where we need to build an MSTA for more than a hundred structures. In fact, building a multiple sequence alignment (MSA) for 1,261 proteins is not easy even when only sequences are used.

DS-ribo commented 10 months ago

Got it, thank you for the answer. Yes, I ended up running it on a smaller dataset and it worked fine. I was just curious about the limit, which makes sense.
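For anyone who wants to script the "smaller dataset" approach, here is a minimal sketch that splits the name list into smaller list files and prints one command per batch, reusing only the flags from the original command. The batch size of 100, the `batches` output directory, and the helper names `split_list` and `write_batches` are assumptions for illustration. Note that each batch would produce its own alignment, not one consensus MSTA across all 1,261 structures.

```python
from pathlib import Path


def split_list(names, batch_size):
    """Split a list of names into consecutive batches of at most batch_size."""
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]


def write_batches(list_file, batch_size=100, out_dir="batches"):
    """Write one list file per batch and return the paths that were written."""
    lines = Path(list_file).read_text().splitlines()
    names = [ln.strip() for ln in lines if ln.strip()]  # skip blank lines
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for k, batch in enumerate(split_list(names, batch_size)):
        path = out / f"batch_{k:03d}.txt"
        path.write_text("\n".join(batch) + "\n")
        paths.append(path)
    return paths


# Only runs if the list file from the original command is present.
if Path("list_with_names.txt").exists():
    for path in write_batches("list_with_names.txt"):
        # Same flags as the command in the question; paths are placeholders.
        print(f"path/to/USalign -dir /path/to/pdbs/ {path} -mm 4 -fast > {path.stem}_result.txt")
```

With 1,261 names and a batch size of 100, this yields 13 list files, the last one holding 61 names.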