Open Yoshanuikabundi opened 1 year ago
Hi @Yoshanuikabundi , as suggested during the follow-up workshop, I loosened the convergence criteria for ForceBalance to check if I get any speed improvement without loss of accuracy.
By changing both convergence objective
and convergence step
from 0.01
to 1.00
, the speed improved from ~40 hrs to ~19 hrs without much loss in accuracy as from the K values and the resultant torsion plots.
I am sure even this can be further improved in terms of speed without issues and suspect something is taking much longer than it has to slowing down the entire FB fitting process. I see this kind of a behaviour with a few other unrelated ligand molecules too.
Could you please suggest if there is a better way to debug and narrow down to what's causing the issue, and also possibly what other parameters can be tuned to avoid this behaviour.
Below is my example input file passed to ForceBalance.
$options ffdir forcefield penalty_type L1 jobtype optimize forcefield force-field.offxml
maxstep 10 convergence_step 1.00 convergence_objective 1.00 convergence_gradient 0.01 criteria 2 eig_lowerbound 0.01 finite_difference_h 0.01 penalty_additive 1.0
trust0 -0.25 mintrust 0.05 error_tolerance 1.0 adaptive_factor 0.2 adaptive_damping 1.0 normalize_weights False constrain_charge false
priors ProperTorsions/Proper/k : 6.0 /priors
$end
$target name ligand_fragment_around_22_24 weight 1.0
type TorsionProfile_SMIRNOFF mol2 ligand_fragment_around_22_24.sdf pdb ligand_fragment_around_22_24.pdb coords scan.xyz writelevel 2 attenuate 1
energy_denom 1.0 energy_upper 10.0 $end
@rvkrishnan30 I'm getting to this today, but I'm unable to reproduce the slow runtime on this molecule - it just ran using an openeye-less env in a little over 1 hour (20 minutes fragmentation, 20 minutes QC generation, 20 minutes parameter optimization) on my macbook air.
Could you upload the complete forcebalance inputs (sdf, pdb, scan.xyz, force-field.offxml) so I can see if that reproduces?
My whole environment is:
Here are my inputs, for me this finishes in about 500 seconds using ForceBalance 1.9.2 and 1.9.3.
@rvkrishnan30 I'm getting to this today, but I'm unable to reproduce the slow runtime on this molecule - it just ran using an openeye-less env in a little over 1 hour (20 minutes fragmentation, 20 minutes QC generation, 20 minutes parameter optimization) on my macbook air.
Could you upload the complete forcebalance inputs (sdf, pdb, scan.xyz, force-field.offxml) so I can see if that reproduces?
My whole environment is:
Full env
@j-wags Thank you for testing this and sharing your inputs. This looks like a more reasonable timescale for ForceBalance runs. With your inputs, for me it took around 700 seconds and I am happy with it.
However, I wonder why it is taking 20 hrs in my actual case. Some differences I see between our inputs are:
Attached are my inputs (sdf, pdb, scan.xyz, qdata.txt, and forcefield) and I hope this will help reproduce my case.
S1A_venkat_Nov2022.tar.gz
Ahh, yeah. It looks like the structure at targets/3gid_D_S1A/input.sdf
in your inputs has 2 hydroxyls, 3 methyls, and 4 methoxys that mine doesn't have. Especially with the hydrogens on the methyl/methoxy groups, that would add a lot more particles to optimize in the minimizations during the torsion fitting.
So fundamentally, we will eventually find macrocycles large enough to ruin things as long as we keep requiring that the entire ring gets included in the fragment we use for fitting. But the macrocycle here seems to be manageable, it's just that the substituents are somehow hanging on in your input, even after fragmentation.
Could you share your bespokefit run command, the output of running conda list
, and input SDF/SMILES for the entire run? It looks like we're getting different results from the fragmentation step, so I can start debugging from there.
Could you share your bespokefit run command, the output of running conda list, and input SDF/SMILES for the entire run? It looks like we're getting different results from the fragmentation step, so I can start debugging from there.
I have a slightly different implementation of bespokefit without the use of qcsubmit as all data are locally generated. But the workflow and settings should predominantly be the same. Having said that, I am not sure why we end up with different fragments. With Ambertools WBO based fragmentation, I end up with a slightly smaller fragment than the parent but still have the -OH, -CH3 groups. (Attached are the original and fragment sdf files)
I wonder why it picks up these groups though they should not affect the WBO at the torsion we are talking about.
Ahh, yeah. It looks like the structure at
targets/3gid_D_S1A/input.sdf
in your inputs has 2 hydroxyls, 3 methyls, and 4 methoxys that mine doesn't have. Especially with the hydrogens on the methyl/methoxy groups, that would add a lot more particles to optimize in the minimizations during the torsion fitting.
I'm not convinced that just having an extra 13 heavy atoms in the system should be enough to make the runtime 60x longer: your optimisation stage took 20 mins, ours takes ~20 hours.
Description
During the BespokeFit followup workshop, @rvkrishnan30 pointed out that the S1A ligand takes >24 hours during ForceBalance. I haven't been able to reproduce this yet because my XTB calculations are slow for some reason but I will soon.
Reproduction
I'm attempting to reproduce in this gist
Output
@rvkrishnan30 any details you could provide on this would be great :)
Software versions
conda list
?