swanss / FragFold

MIT License
18 stars, 7 forks

Modification for custom fragment tiling offset support (more than 1 aa) #12

Open: nt314p opened this issue 1 month ago

nt314p commented 1 month ago

Hi,

I am interested in modifying FragFold to support custom tiling offsets for fragment generation. For example, a 3 aa increment between fragment start positions would generate roughly a third as many fragments, which could speed up peptide binding prediction considerably at some cost in accuracy. A finer 1 aa pass could then be run around the peaks to refine the prediction.
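To make the proposed offset concrete, here is a minimal sketch of tiling a sequence with a configurable step. The helper `tile_fragments` is hypothetical and is not FragFold's actual code in colabfold_create_msa.py. It assumes the step must stay smaller than the fragment length so that consecutive fragments overlap, and it makes one further assumption: the C-terminal fragment is always kept, even when the step would otherwise skip it.

```python
from __future__ import annotations


def tile_fragments(sequence: str, fragment_length: int, step: int = 1) -> list[tuple[int, str]]:
    """Return (start, fragment) pairs tiling `sequence` with a fixed step.

    Hypothetical helper for illustration only; not FragFold's API.
    """
    if fragment_length <= 0 or step <= 0:
        raise ValueError("fragment_length and step must be positive")
    if step >= fragment_length:
        # Consecutive fragments would no longer overlap.
        raise ValueError("step must be smaller than fragment_length")
    if fragment_length > len(sequence):
        raise ValueError("fragment_length exceeds the sequence length")

    last_start = len(sequence) - fragment_length
    starts = list(range(0, last_start + 1, step))
    # Assumption: always cover the C-terminus, even if the step skips it.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, sequence[s:s + fragment_length]) for s in starts]
```

For a 100-residue sequence and 30 aa fragments, a 1 aa step yields 71 fragments, while a 3 aa step yields 25 (24 regular starts plus the appended C-terminal one). That is roughly the threefold reduction in predictions described above.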

I understand that fragfold/src/colabfold_create_msa.py generates the fragments, so I plan to modify that file. Are there other files that would also need changes, such as those involved in the analysis steps?

Thanks!

swanss commented 1 month ago

Hi! That would be a useful feature, and if you want to make a pull request I'm happy to review and test it. I believe that colabfold_process_output.py and predict_alphafold_peaks.py will work even with a larger tiling step size (within reason: the step size needs to be smaller than the fragment length, of course). The peak prediction might be less useful, since its parameters were tuned for a 1 aa step size, but you can look at the plots of weighted contacts by fragment to see whether there is a region worth scanning at higher resolution.
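One way to pick the region for the higher-resolution pass is sketched below. It is a hypothetical helper, `refinement_starts`, and not part of predict_alphafold_peaks.py. It assumes the coarse-pass results are available as a dict mapping each fragment start to a score such as weighted contacts. The helper takes the top-scoring coarse starts and returns every 1 aa start within one coarse step of them that was not already predicted.

```python
from __future__ import annotations


def refinement_starts(
    coarse_scores: dict[int, float],
    step: int,
    last_start: int,
    top_k: int = 1,
) -> list[int]:
    """Return 1 aa fragment starts to rescan around the top-k coarse peaks.

    Hypothetical helper: `coarse_scores` maps fragment start -> score
    (e.g. weighted contacts) from a coarse tiling pass with `step`.
    """
    if step <= 0 or top_k <= 0:
        raise ValueError("step and top_k must be positive")
    # Rank coarse starts by score, highest first.
    peaks = sorted(coarse_scores, key=coarse_scores.get, reverse=True)[:top_k]
    fine: set[int] = set()
    for peak in peaks:
        # Fill in every start between the neighbouring coarse starts.
        lo = max(0, peak - step + 1)
        hi = min(last_start, peak + step - 1)
        fine.update(range(lo, hi + 1))
    # Skip starts that were already predicted in the coarse pass.
    return sorted(fine - set(coarse_scores))
```

With a 3 aa coarse step and a peak at start 6, this would request only starts 4, 5, 7, and 8. It does not rerun the whole sequence. Choosing `top_k` or a score threshold by eye from the weighted-contacts plot is a judgment call this sketch leaves to the user.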

Another thing to note is that you could get a ~5x speedup by requesting only a single model from ColabFold for each fragment (by default it runs five). To do this, you would just edit the ColabFold Nextflow process definition by adding the argument --num-models 1.
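The snippet below shows where the flag sits in a `colabfold_batch` invocation. It is a minimal Python sketch that only builds the argument list and does not run anything. The actual change would go in the script block of FragFold's Nextflow process, whose exact contents are not shown here. The function name `colabfold_command` and the placeholder paths are hypothetical. The only flag used is `--num-models`, the argument suggested above.

```python
from __future__ import annotations

import shlex


def colabfold_command(
    input_path: str,
    output_dir: str,
    num_models: int = 5,
    extra_args: list[str] | None = None,
) -> list[str]:
    """Build a colabfold_batch argument list (hypothetical helper)."""
    if not 1 <= num_models <= 5:
        raise ValueError("ColabFold runs between 1 and 5 AlphaFold2 models")
    cmd = ["colabfold_batch", "--num-models", str(num_models)]
    if extra_args:
        # Any other flags the pipeline already passes would go here.
        cmd.extend(extra_args)
    # colabfold_batch takes the input and the results directory positionally.
    cmd.extend([input_path, output_dir])
    return cmd


print(shlex.join(colabfold_command("fragments.a3m", "results", num_models=1)))
```

Combined with a 3 aa tiling step, this would mean roughly 1/3 × 1/5, or about 1/15, of the original model runs. MSA generation and other fixed overheads would not shrink by the same factor.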

nt314p commented 1 month ago

I've created a pull request here: #14.

Regarding lowering the number of models, it would be interesting to see the variation between models and whether, on average, a single model is "close enough" to optimal. I'll try this approach too; thank you for the suggestion.
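One simple way to quantify "close enough" is to take a full five-model run and compare each fragment's best score with the score of the model you would keep. The sketch below assumes per-model scores have already been collected into a dict mapping fragment start to a list of scores, where higher is better. Both that layout and the name `single_model_gap` are hypothetical, and which model index corresponds to the one ColabFold keeps with `--num-models 1` should be checked against its output.

```python
from __future__ import annotations

from statistics import mean


def single_model_gap(scores_by_fragment: dict[int, list[float]], model_index: int = 0) -> float:
    """Mean shortfall of one model's score versus the best model, over fragments.

    Hypothetical helper; assumes higher scores are better.
    """
    gaps = []
    for start, scores in scores_by_fragment.items():
        if not scores:
            raise ValueError(f"no model scores for fragment starting at {start}")
        # How much is lost at this fragment by keeping only one model.
        gaps.append(max(scores) - scores[model_index])
    return mean(gaps)
```

A gap near zero relative to the differences between peak and non-peak fragments would suggest that a single model preserves the binding profile. Looking at the largest per-fragment gaps, not just the mean, would show whether the losses concentrate at the peaks.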