Not all fold switchers require two MSAs, because some pairs have sequences that are 100% identical, with no insertions, deletions, or mutations. This is why there are fewer than 186 MSAs. All MSAs generated for this work are in the repo.
Got it! Thanks for your reply! 😊
But some fold switchers do not have MSAs for both sequences. The fold switchers missing MSAs are: 2c1v_B, 2c1u_C; 2oug_C, 6c6s_D; 3g0h_A, 3ews_B; 3kds_G, 2ce7_C; 3uyi_A, 3v0t_A; 4aan_A, 4aal_A; 5jyt_A, 2qke_E.
Correct. The sequences of 2oug_C and 6c6s_D, for example, are 100% identical, so they share the same MSA. You run AF-cluster on that one MSA and see whether it produces both structures.
Sorry, what I mean is that in the directory AF2benchmark/AFcluster-MSAs/, the fold switchers mentioned above don't have even one set of MSAs. For example, no MSA has been provided for either 2oug or 6c6s.
All uploads are done. Here are the details:
2c1v_B, 2c1u_C; 3g0h_A, 3ews_B; 3kds_G, 2ce7_C; 3uyi_A, 3v0t_A; 4aan_A, 4aal_A
these were already in the repo
2oug_C, 6c6s_D; 5jyt_A, 2qke_E
For RfaH, only the 2oug_C sequence (not 6c6s_D) was used to generate MSAs, since the two sequences are identical.
I cannot decompress 3kds_G.tar.gz, and the file is only 2 bytes in size. Could you please check whether it was compressed correctly?
The issue should be resolved now. We replaced the previous file with an uncorrupted one of >9 MB.
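For anyone who wants to check the other archives the same way, here is a minimal sketch in Python. It assumes the MSAs are ordinary gzip-compressed tarballs, as the name 3kds_G.tar.gz suggests. The function name `check_tar_gz` and the 1 KB size threshold are illustrative choices, not part of the repo.

```python
import sys
import tarfile
import zlib
from pathlib import Path


def check_tar_gz(path, min_bytes=1024):
    """Return (ok, message) for a .tar.gz archive.

    min_bytes is an arbitrary threshold (assumption) used to flag
    files that are suspiciously small, such as a 2-byte upload.
    """
    p = Path(path)
    size = p.stat().st_size
    if size < min_bytes:
        return False, f"{p.name}: only {size} bytes, likely empty or truncated"
    try:
        with tarfile.open(p, "r:gz") as tar:
            # getmembers() walks the whole archive, so a truncated or
            # corrupted gzip stream raises an error here.
            members = tar.getmembers()
    except (tarfile.TarError, EOFError, OSError, zlib.error) as exc:
        return False, f"{p.name}: cannot be read ({exc})"
    if not members:
        return False, f"{p.name}: archive contains no files"
    return True, f"{p.name}: {len(members)} members, {size} bytes"


if __name__ == "__main__":
    # Usage: python check_archives.py AF2benchmark/AFcluster-MSAs/*.tar.gz
    for arg in sys.argv[1:]:
        ok, message = check_tar_gz(arg)
        print("OK  " if ok else "FAIL", message)
```

This only confirms that each archive can be read end to end. It does not check that the MSA files inside are complete.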
The dataset mentioned in the article contains a total of 93 pairs of proteins.
In theory, there are 93 * 2 = 186 sequences that each need an MSA search.
However, only 171 sets of MSAs are provided in the directory AF2benchmark/AFcluster-MSAs/.
May I ask if the remaining MSAs can be provided?
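The count above can be checked with a short script. This is a rough sketch that assumes each MSA set is stored as `<PDBID>_<chain>.tar.gz` directly under AF2benchmark/AFcluster-MSAs/ (inferred from the file name 3kds_G.tar.gz) and that the list of pairs is supplied by the caller. The function names are hypothetical.

```python
from pathlib import Path


def expected_msa_count(n_pairs, n_identical_pairs=0):
    """Two MSAs per pair, minus one for each pair whose sequences are identical."""
    return 2 * n_pairs - n_identical_pairs


def pairs_without_msa(pairs, msa_dir, suffix=".tar.gz"):
    """Return the pairs for which neither chain has an archive in msa_dir."""
    present = {
        p.name[: -len(suffix)]
        for p in Path(msa_dir).glob(f"*{suffix}")
    }
    return [
        pair for pair in pairs
        if not any(chain in present for chain in pair)
    ]


if __name__ == "__main__":
    # Upper bound when no pair shares a sequence.
    print(expected_msa_count(93))
    # Two example pairs taken from this thread; the full list has 93.
    example_pairs = [("2oug_C", "6c6s_D"), ("3kds_G", "2ce7_C")]
    print(pairs_without_msa(example_pairs, "AF2benchmark/AFcluster-MSAs"))
```

A pair reported by `pairs_without_msa` has no MSA at all, which is different from a pair that has only one MSA because its two sequences are identical.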