ncbi / AF2_benchmark

MSA for some sequences not found #1

Closed PepperLee-sm closed 1 month ago

PepperLee-sm commented 2 months ago

The dataset mentioned in the article contains a total of 93 pairs of proteins.

In theory, 93 × 2 = 186 sequences need to be searched for MSAs.

However, only 171 MSA sets are provided in the directory AF2benchmark/AFcluster-MSAs/.

May I ask if the remaining MSAs can be provided?

porterll commented 2 months ago

Not all fold switchers require two MSAs, because some pairs have sequences that are 100% identical, with no insertions, deletions, or mutations. This is why there are fewer than 186. All MSAs generated for this work are in the repo.

PepperLee-sm commented 2 months ago

Got it! Thanks for your reply! 😊

PepperLee-sm commented 2 months ago

But some fold switchers do not have MSAs for both sequences. The fold switchers missing MSAs are:

2c1v_B, 2c1u_C
2oug_C, 6c6s_D
3g0h_A, 3ews_B
3kds_G, 2ce7_C
3uyi_A, 3v0t_A
4aan_A, 4aal_A
5jyt_A, 2qke_E

porterll commented 2 months ago

Correct. The sequences of 2oug_C and 6c6s_D, for example, are 100% identical, so they have the same MSA. You run AF-cluster on that one MSA and see if it produces both structures.

PepperLee-sm commented 2 months ago

Sorry, what I mean is that in the directory AF2benchmark/AFcluster-MSAs/, the fold switchers listed above don't have even one set of MSAs. For example, no MSA is provided for either 2oug or 6c6s.
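A check like the one described here can be sketched in Python. This is only an illustration: it assumes the files in the MSA directory are named by chain ID (as with 3kds_G.tar.gz), and the function names `pairs_without_msa` and `list_msa_files` are hypothetical, not part of the repo.

```python
from pathlib import Path

# Pairs reported in this thread as lacking MSAs.
PAIRS = [
    ("2c1v_B", "2c1u_C"),
    ("2oug_C", "6c6s_D"),
    ("3g0h_A", "3ews_B"),
    ("3kds_G", "2ce7_C"),
    ("3uyi_A", "3v0t_A"),
    ("4aan_A", "4aal_A"),
    ("5jyt_A", "2qke_E"),
]


def pairs_without_msa(pairs, filenames):
    """Return the pairs for which no file name starts with either chain ID.

    Assumption: each MSA file or archive name begins with the chain ID,
    e.g. "3kds_G.tar.gz". Matching is case-insensitive.
    """
    names = [name.lower() for name in filenames]

    def has_msa(chain_id):
        prefix = chain_id.lower()
        return any(name.startswith(prefix) for name in names)

    # A pair counts as covered if at least one of its two chains has a file.
    return [pair for pair in pairs if not any(has_msa(c) for c in pair)]


def list_msa_files(msa_dir):
    """List the entry names in the MSA directory (path supplied by the caller)."""
    return [entry.name for entry in Path(msa_dir).iterdir()]
```

With a local clone, `pairs_without_msa(PAIRS, list_msa_files("AF2benchmark/AFcluster-MSAs"))` would return the pairs for which neither chain has a file under that naming assumption.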

porterll commented 2 months ago

All uploads are done, here are the details:

2c1v_B, 2c1u_C
3g0h_A, 3ews_B
3kds_G, 2ce7_C
3uyi_A, 3v0t_A
4aan_A, 4aal_A

These were already in the repo.

2oug_C, 6c6s_D
5jyt_A, 2qke_E

For RfaH, only the 2oug_C sequence (not 6c6s_D) was used to generate MSAs, since the two sequences are identical.

PepperLee-sm commented 2 months ago

I cannot uncompress 3kds_G.tar.gz, and the file is only 2 bytes in size. Could you please check whether it was compressed correctly?

porterll commented 2 months ago

The issue should be resolved now. We replaced the previous file with an uncorrupted one (>9 MB).
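For anyone who wants to confirm that a downloaded archive is intact before extracting it, a minimal sketch using Python's standard `tarfile` module is shown below. The helper name `check_archive` is hypothetical; it only reports the file size and whether the archive can be read as a gzip-compressed tar.

```python
import tarfile
import zlib
from pathlib import Path


def check_archive(path):
    """Return (size_in_bytes, member_count).

    member_count is None when the file cannot be read as a .tar.gz,
    which is what a truncated or corrupted upload would produce.
    """
    path = Path(path)
    size = path.stat().st_size
    try:
        with tarfile.open(path, "r:gz") as tar:
            count = len(tar.getmembers())
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        # Not a gzip file, truncated stream, or damaged tar data.
        return size, None
    return size, count
```

A result such as `(2, None)` for 3kds_G.tar.gz would match the corrupted file described above, while a readable archive returns its size together with a member count.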