patrickbryant1 / AFProfile

Improved protein complex prediction with AlphaFold-multimer by denoising the MSA profile
Creative Commons Attribution 4.0 International

slow MSA generation speed #2

Closed by cclark1e 1 year ago

cclark1e commented 1 year ago

Hi, thanks a lot for making this. I've got it running and it seems to work well, but I'm having some issues applying it on a larger scale (~500 models) on an HPC cluster.

When I run standard AFM, the MSA generation stage takes less than an hour using the full databases. Here, however, MSA generation against the full databases has been running for over 30 hours in batch jobs, one per model. Each job has a single core, but this puzzles me because I understood that MSA generation cannot be parallelized across GPUs or multiple CPUs.

Aside from switching to the full databases, I haven't changed any of the settings. The complexes I'm modelling are about 130-200 residues long (including both chains).

Any help or guidance would be much appreciated, thank you.

patrickbryant1 commented 1 year ago

Hi,

You are welcome!

There is no difference between making the MSAs here and with standard AFM (it is the same script). The MSA generation runs in parallel on CPU (more cores = faster), so the single core per job would explain the difference in runtime that you observe. If you want to make this step faster, request something like 8 cores per job. Another tip is to put the databases you search on a fast disk (like an SSD) rather than a slow one (like an HDD). This has a significant impact on speed because a lot of data has to be read during the search.
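
As a rough illustration of the core-count tip, the sketch below checks how many cores a batch job actually has and passes that number to a search tool. It assumes a Slurm-style scheduler that sets `SLURM_CPUS_PER_TASK` and uses jackhmmer with its `--cpu` option as the example search tool. The helper names `available_cores` and `jackhmmer_command` and the file paths are hypothetical; this is not the repository's own MSA script, which may set the thread count differently.

```python
import os
import shlex


def available_cores(env=None):
    """Return the number of CPU cores this job is allowed to use.

    Assumption: on a Slurm cluster, SLURM_CPUS_PER_TASK is set when the job
    requests cores with --cpus-per-task. Otherwise fall back to the CPU
    affinity mask (Linux) or the machine-wide core count.
    """
    env = os.environ if env is None else env
    slurm = env.get("SLURM_CPUS_PER_TASK", "")
    if slurm.isdigit() and int(slurm) > 0:
        return int(slurm)
    if hasattr(os, "sched_getaffinity"):
        # Counts only the cores this process may be scheduled on.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def jackhmmer_command(query_fasta, database, n_cpu):
    """Build a jackhmmer argument list that uses n_cpu worker threads.

    The paths are placeholders; only the --cpu option matters here.
    """
    return ["jackhmmer", "--cpu", str(n_cpu), query_fasta, database]


if __name__ == "__main__":
    n_cpu = available_cores()
    if n_cpu == 1:
        print("Only one core available: the search will run on a single thread.")
    # Print the command instead of running it, since paths are hypothetical.
    print(shlex.join(jackhmmer_command("query.fasta", "database.fasta", n_cpu)))
```

If this reports a single core inside the batch job, the search cannot use more than one thread regardless of the tool's settings, so the core request in the job script is the first thing to change.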

I hope this makes things clearer.

cclark1e commented 1 year ago

Ah - my misconception then - thanks for clearing that up!