mostafa-razavi / TranSFF


Explaining the details of using Auxiliary Nested PSO Optimization method to develop force fields using ITIC and MBAR #7

Open mostafa-razavi opened 5 years ago

mostafa-razavi commented 5 years ago

The Auxiliary Nested PSO Optimization basics

The Auxiliary Nested PSO Optimization is used to find the global minimum of a pre-defined objective function. The routine consists of two nested PSO optimizations. The outer PSO explores the parameter space using direct simulations (e.g. GOMC, LAMMPS, GROMACS), whereas the inner PSO uses these direct simulations as MBAR reference points to explore the parameter space faster.

In Messerley2018, we concluded that MBAR can be used confidently only if the distance between the reference simulation parameter set and the MBAR-predicted parameter set is not too large. As a rule of thumb, N_eff_min=50 was recommended, meaning that with fewer than 50 effective snapshots the Z and U^res predictions are most likely unreliable.
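As an illustration of the N_eff check, here is a minimal sketch that assumes the Kish effective sample size, a definition commonly paired with MBAR reweighting. The thread does not state which estimator is actually used, and the names `kish_effective_samples` and `is_reliable` are hypothetical.

```python
import numpy as np

N_EFF_MIN = 50  # rule-of-thumb threshold quoted above


def kish_effective_samples(log_weights):
    """Kish effective sample size from unnormalized log-weights (assumed definition)."""
    lw = np.asarray(log_weights, dtype=float)
    lw = lw - lw.max()      # shift for numerical stability before exponentiating
    w = np.exp(lw)
    w /= w.sum()            # normalize so the weights sum to 1
    return 1.0 / np.sum(w ** 2)


def is_reliable(log_weights, n_eff_min=N_EFF_MIN):
    """True when enough snapshots contribute to the reweighted estimate."""
    return kish_effective_samples(log_weights) >= n_eff_min
```

With equal weights every snapshot counts, so N_eff equals the number of snapshots; when a single snapshot dominates, N_eff collapses toward 1 and the prediction would be flagged as unreliable.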

However, we often do not know the global minimum of the objective function a priori. As a result, choosing proper reference simulations is challenging, and perhaps infeasible for new potential models or new molecules. It is therefore essential to determine the reference simulation parameter sets in a systematic manner. The outer PSO of the nested particle swarm optimization method tackles this problem by exploring the parameter space to find the best reference simulations. An auxiliary term is introduced in the velocity update formula of the outer PSO algorithm to incorporate the best solution of the inner PSO into the outer PSO search.
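The exact velocity update is not written out in this thread. The sketch below assumes the standard PSO form (inertia, cognitive, and social terms) with one extra term that pulls the outer particle toward the best point found by its inner PSO. The function name, the coefficient names `w`, `phip`, `phig`, and all default values except `phia=0.5` (mentioned later in the thread) are assumptions.

```python
import numpy as np


def outer_velocity_update(v, x, pbest, gbest, inner_best, rng,
                          w=0.7, phip=1.5, phig=1.5, phia=0.5):
    """Standard PSO velocity update plus an assumed auxiliary term."""
    r_p, r_g, r_a = rng.random(3)  # independent random factors in [0, 1)
    return (w * v
            + phip * r_p * (pbest - x)        # cognitive: own best position
            + phig * r_g * (gbest - x)        # social: swarm best position
            + phia * r_a * (inner_best - x))  # auxiliary: inner PSO optimum


rng = np.random.default_rng(0)
x = np.array([3.75, 98.0])               # illustrative (sigma, epsilon) values
inner_best = np.array([3.80, 95.0])
v_new = outer_velocity_update(np.zeros(2), x, x, x, inner_best, rng)
```

In this usage the particle already sits at its personal and global best, so only the auxiliary term contributes and the new velocity points from `x` toward `inner_best`.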

mostafa-razavi commented 5 years ago

Determining NP and np

The number of particles (i.e. candidate solutions) in the outer and inner PSO routines plays a key role in the speed of convergence. However, it is limited by the available computational resources. Note that each inner particle (p) or outer particle (P) includes the result of a direct simulation or MBAR prediction at one or more ITIC points. One should choose a number of inner particles (np) that maximizes CPU utilization. If five ITIC points are being optimized on a 32-core machine, np=6 is an efficient configuration because 5x6=30, which is just under 32. The same machine can accommodate up to 6 outer particles (NP=6); however, a large NP might not be efficient because it significantly increases the time taken to finish a PSO iteration. The social learning contribution of the PSO algorithm requires a minimum NP of 2. NP=3 has been shown to give sufficient speed and accuracy.
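The core-count arithmetic can be checked with a couple of lines. This sketch assumes one core per ITIC point per particle, which is what the numbers above imply; the function names are hypothetical.

```python
def max_particles(n_cores, n_itic_points):
    """Largest particle count whose jobs fit on the machine at once."""
    return n_cores // n_itic_points


def cores_used(n_particles, n_itic_points):
    """Cores occupied when every particle runs all of its ITIC points in parallel."""
    return n_particles * n_itic_points


n_particles = max_particles(32, 5)   # 6
busy = cores_used(n_particles, 5)    # 30 of the 32 cores
```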

mostafa-razavi commented 5 years ago

ITIC points selection

It is important to choose the ITIC points used in the objective function carefully. Ideally, all ITIC points would be incorporated in the objective function. However, since our computational resources are limited, we select a few points that represent all ITIC points. ITIC points 2, 5, 8, 11, and 17 were selected because they cover wide ranges of temperature and density.

mostafa-razavi commented 5 years ago

Objective function

The absolute average deviation of (Z-1)/rho and U^res at the selected ITIC points from REFPROP data was selected as the objective function of the outer PSO.
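A minimal sketch of such an objective follows. It assumes the deviations are expressed as percentages of the REFPROP value and that the two properties are weighted equally; neither choice is stated in the thread, and the function names are hypothetical.

```python
import numpy as np


def aad_percent(predicted, reference):
    """Average absolute deviation, in percent of the reference value."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return 100.0 * np.mean(np.abs((predicted - reference) / reference))


def outer_objective(z_pred, u_pred, z_ref, u_ref, rho, w_z=0.5, w_u=0.5):
    """Weighted sum of the deviations in (Z-1)/rho and U^res (weights assumed)."""
    rho = np.asarray(rho, dtype=float)
    zm1_pred = (np.asarray(z_pred, dtype=float) - 1.0) / rho
    zm1_ref = (np.asarray(z_ref, dtype=float) - 1.0) / rho
    return w_z * aad_percent(zm1_pred, zm1_ref) + w_u * aad_percent(u_pred, u_ref)
```

Each array holds one value per selected ITIC point, so a perfect match at every point gives a score of zero.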

The objective function of the inner PSO includes an extra term: the deviation of the lowest N_eff among the ITIC points from a target minimum N_eff (usually N_eff=50). This term penalizes inner particles that are too far from the reference parameter set. The following correlation keeps these particles above the minimum N_eff:

[Image: N_eff penalty correlation]

mostafa-razavi commented 5 years ago

Discrete Optimization

It might make sense to discretize the PSO search at the level of the required accuracy.
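One way to read this idea is to snap every candidate parameter set onto a grid whose spacing equals the accuracy that is actually needed. The sketch below is only an assumption about what the discretization could look like; `snap_to_grid` and the resolutions are hypothetical.

```python
import numpy as np


def snap_to_grid(params, resolution):
    """Round each parameter to the nearest multiple of its resolution."""
    params = np.asarray(params, dtype=float)
    resolution = np.asarray(resolution, dtype=float)
    return np.round(params / resolution) * resolution


# Illustrative resolutions: 0.001 for sigma, 0.1 for epsilon
snapped = snap_to_grid([3.914792, 87.97925], [0.001, 0.1])
```

A side effect of snapping is that different particles land on identical parameter sets more often, which makes it easier to reuse earlier predictions.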

mostafa-razavi commented 5 years ago

Ways to speed up

1) Discrete Optimization
2) Keep a record of all valid MBAR predictions to avoid repeating calculations
3) Increase the tol and TOL values (maybe 1e-3 or 1e-2)
4) Limit the number of inner iterations (ni) to an arbitrary number, e.g. ni=10
5) Modify the outer PSO so that the reference parameters for outer iteration x of a given P are exactly the same as the optimum parameter set found during the inner PSO of P at iteration x - 1
6) Pick a larger phia (e.g. 0.75 instead of 0.5)
7) Use only U^res in the objective function, because GOMC's pressure calculation is far more expensive than its energy calculation
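Item 2 in the list above could be sketched as a small memoizing wrapper. The `predict` callable stands in for whatever function performs the MBAR prediction; its signature, the class name, and the rounding precision are assumptions. The key includes the reference simulation because an MBAR prediction depends on which reference it was reweighted from.

```python
class PredictionCache:
    """Remember predictions so identical requests are not recomputed."""

    def __init__(self, predict, decimals=6):
        self._predict = predict      # hypothetical callable: (reference_id, params) -> result
        self._decimals = decimals    # parameters are rounded before being used as a key
        self._store = {}
        self.hits = 0

    def __call__(self, reference_id, params):
        key = (reference_id,
               tuple(round(float(p), self._decimals) for p in params))
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = self._predict(reference_id, params)
        return self._store[key]
```

Wrapping the prediction function once, e.g. `cached = PredictionCache(my_predict)`, lets every inner particle call `cached(reference_id, params)` without knowing whether the value was computed or looked up.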

mostafa-razavi commented 5 years ago

PSO initialization

The outer PSO can be initialized with some of the best guesses available. The inner PSO is recommended to be initialized randomly to increase the probability of finding the global optimum. However, one of the inner particles must be initialized to be exactly the same as the reference parameter set. This might speed up the convergence of the inner PSO.
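The inner initialization described above can be sketched as follows, assuming uniform random sampling inside box bounds. The bounds, the function name, and the choice to place the reference in the first slot are assumptions.

```python
import numpy as np


def init_inner_swarm(reference, lower, upper, n_particles, rng):
    """Random inner swarm whose first particle is the reference parameter set."""
    reference = np.asarray(reference, dtype=float)
    swarm = rng.uniform(lower, upper, size=(n_particles, reference.size))
    swarm[0] = reference  # one particle starts exactly at the reference
    return swarm


rng = np.random.default_rng(1)
swarm = init_inner_swarm([3.75, 98.0], [3.5, 80.0], [4.0, 120.0], 6, rng)
```

Starting one particle at the reference guarantees that the swarm contains at least one point where N_eff is at its maximum, so the inner search always has a valid candidate.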

mostafa-razavi commented 5 years ago

RunNestedPSO_aux_C2_LJSF_test_chain

Modify the outer PSO so that the reference parameters for outer iteration x of a given P are exactly the same as the optimum parameter set found during the inner PSO of P at iteration x - 1.

I tried this for ethane and it seems like it found the solution.

[Image: I1-I7_scores_chain]

mostafa-razavi commented 5 years ago

I would like to speed up the process by using only U^res for inner PSO.

Steps:

1) Done! I need to modify GOMC so that it doesn't calculate pressure during rerun.
2) Done! Test if the run is affected
3) Done! Test how fast we can get using this approach. (Double speed)
4) Test the new process

Here is the result for IC4:

[Image: I-12_P-3_3.914792465361519-87.97925464274606_3.9295601617973204-70.73559066592952]

[Image: I-12_P-3_3 914792465361519-87 97925464274606_3 9295601617973204-70 73559066592952_uresonly]

[Image: I1-I16_scores_IB_uresonly]

Conclusions:

1) As you can see, the optimization converges well and both Z and U^res get optimized.
2) By recalculating only energies, we saved some time that can be spent on having more particles in the outer PSO (5 here).
3) Using 24 cores, it took 3 days to perform 12 outer PSO iterations and 4 days to perform 16, i.e. 4 iterations per day.
4) The sigma values are alarmingly close to each other. I wonder if it has anything to do with the fact that we didn't include pressure in the inner objective function. It's hard to believe that's the case though.

Todo:

1) Test the above result
2) Run the same and include Z in addition to U^res and see if the method finds another solution

mostafa-razavi commented 5 years ago

Change the shape of N_eff penalty function

The current penalty function is too steep. Maybe a linear function from N_eff=1 to the target N_eff would give the optimizer some sensitivity to how the parameters should change for the penalty to stay at zero.
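The linear shape proposed above could look like the sketch below. It is one possible reading, not the function used in the repository: the penalty is zero at or above the target, rises linearly as N_eff drops, and reaches `max_penalty` at N_eff=1. The name `linear_neff_penalty` and the `max_penalty` scale are assumptions.

```python
def linear_neff_penalty(n_eff, n_eff_target=50.0, max_penalty=1.0):
    """Linear ramp: 0 at or above the target, max_penalty at N_eff = 1."""
    if n_eff >= n_eff_target:
        return 0.0
    n_eff = max(n_eff, 1.0)  # N_eff cannot meaningfully drop below one snapshot
    return max_penalty * (n_eff_target - n_eff) / (n_eff_target - 1.0)
```

Because the slope is constant below the target, two particles with N_eff of 10 and 40 receive clearly different penalties, which gives the swarm a direction to move in rather than a flat wall.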

Result: I ran NP but it stopped at iteration 5