yunhuige / toy_binding

GitHub for the toy_binding project

TRAM jobs exceed walltime on cb2rr #2

Open yunhuige opened 5 years ago

yunhuige commented 5 years ago

Tried to run TRAM estimation on cb2rr using the real dataset (11 × 8 × 2 + 11 = 187 thermodynamic states) with different state decompositions (10, 30, 50, 100, 200 states) and different strides (10, 100, 1000). Some of the jobs failed because they exceeded the walltime limit; TRAM is much slower than dTRAM. We need to figure out a way in the near future to run these jobs on clusters, since cb2rr apparently does not allow enough walltime. Owlsnest2 may be an option, but since these are single-process jobs I need to talk to Axel about that. Maybe I can apply again for an account on the Compute clusters? They rejected my previous application, though. I will check the failed jobs and update them here later, along with the memory usage and walltime.
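
For reference, a minimal sketch of what a strided TRAM run looks like in PyEMMA (toy-sized synthetic arrays just to show the call; in the real jobs the `ttrajs`, `dtrajs`, and bias arrays come from the umbrella/λ ensembles, and argument/attribute names may vary slightly between PyEMMA versions):

```python
import numpy as np
import pyemma

# Toy-sized synthetic input, only to illustrate the call signature:
#   ttrajs[i]    : (T_i,)   thermodynamic-state index of each frame of trajectory i
#   dtrajs[i]    : (T_i,)   conformational (cluster) index of each frame
#   bias_list[i] : (T_i, K) reduced bias energy of each frame in all K ensembles
K, n_frames, n_clusters = 5, 2000, 10
rng = np.random.default_rng(0)
ttrajs = [np.full(n_frames, k, dtype=np.int32) for k in range(K)]
dtrajs = [rng.integers(0, n_clusters, size=n_frames).astype(np.int32) for _ in range(K)]
bias_list = [rng.random((n_frames, K)) for _ in range(K)]

stride = 10  # the strides tried above were 10 / 100 / 1000
ttrajs_s = [t[::stride] for t in ttrajs]
dtrajs_s = [d[::stride] for d in dtrajs]
bias_s = [b[::stride] for b in bias_list]

# lag is counted in saved (strided) frames, so the physical lag time changes
# with the stride unless lag is rescaled accordingly
tram = pyemma.thermo.tram(ttrajs_s, dtrajs_s, bias_s, lag=10,
                          maxiter=50000, maxerr=1e-8)
print(tram.f)  # reduced free energies of the discrete states (name may differ by version)
```

The dTRAM call (`pyemma.thermo.dtram`) is analogous but takes a (K, n) matrix of per-state bias energies instead of per-frame biases, which is part of why it runs so much faster.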

vvoelz commented 5 years ago

Yunhui -- discussion via issues on github is a great idea! I appreciate the next-level moxie you've been applying to working out the bugs here.

I think we're rushing ahead with doing the biggest calculation we can. From looking at the unbiased trajectory data, it's clear to me that we have WAAAAAY more data than we need to do a successful dTRAM or TRAM calculation. A successful calculation to me means that we have only a few or no binding/unbinding transitions in the unbiased state, but more in the biased states, such that the information from the biased ensembles informs the estimates of the unbiased rates.
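
One quick way to check this criterion would be to count bound↔unbound transitions in the COM-distance trace of each ensemble. A minimal sketch (the 0.6/1.2 nm cutoffs are placeholders, not values chosen for this system; two cutoffs avoid over-counting brief barrier recrossings):

```python
import numpy as np

def count_transitions(dist_nm, bound_cut=0.6, unbound_cut=1.2):
    """Count bound<->unbound transitions in a COM-distance trace (nm)."""
    state, n_trans = None, 0
    for d in dist_nm:
        if d < bound_cut:
            new = "bound"
        elif d > unbound_cut:
            new = "unbound"
        else:
            continue  # intermediate region: keep the last assigned state
        if state is not None and new != state:
            n_trans += 1
        state = new
    return n_trans

# Example with a synthetic trace; replace with the real per-run COM distances.
trace = np.concatenate([np.full(500, 0.4), np.full(500, 1.5), np.full(500, 0.4)])
print(count_transitions(trace))  # -> 2
```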

A good test set is probably not strided, but truncated (say, the first 500 ns or less: 100 ns? -- this would obviate the need for HUGE calculations on Owlsnest/cb2rr). If the snapshots are only taken every 100 ps, then I worry we might be losing valuable information from the biased ensembles, especially the λ-biased ones, where unbinding/binding may be really fast. Striding will make this worse!

Any way we could see a distance trace for all the ensembles of just the first 100 ns?
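
A minimal matplotlib sketch of that kind of plot, truncating each trace to its first 100 ns using the save intervals quoted in this thread (100 ps for the unbiased runs, 2 ps for the biased ones); the traces below are synthetic placeholders for the real COM-distance arrays:

```python
import numpy as np
import matplotlib.pyplot as plt

TRUNCATE_NS = 100.0

def first_ns(dist, save_interval_ps, ns=TRUNCATE_NS):
    """Keep only the frames from the first `ns` nanoseconds of a trace."""
    return dist[: int(ns * 1000.0 / save_interval_ps)]

# Placeholder traces {label: (COM distance in nm, save interval in ps)};
# replace with the real per-ensemble distance arrays.
rng = np.random.default_rng(1)
traces = {
    "unbiased (100 ps/frame)": (1.0 + 0.3 * np.abs(rng.standard_normal(60000)), 100.0),
    "lambda = 1.0, no umbrella (2 ps/frame)": (1.0 + 0.3 * np.abs(rng.standard_normal(200000)), 2.0),
}

fig, ax = plt.subplots(figsize=(8, 3))
for label, (dist, dt_ps) in traces.items():
    d = first_ns(dist, dt_ps)
    t_ns = np.arange(len(d)) * dt_ps / 1000.0
    ax.plot(t_ns, d, lw=0.5, label=label)
ax.set_xlabel("time (ns)")
ax.set_ylabel("COM distance (nm)")
ax.legend(fontsize="small")
fig.tight_layout()
fig.savefig("distance_traces_first100ns.png", dpi=200)
```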

yunhuige commented 5 years ago

@vvoelz There are several points I think need to be clear here:

1/ In the unbiased simulations we have 120 µs in total (20 runs × 6 µs each), with snapshots saved every 100 ps.

2/ In the biased simulations, we have 400 ns for each thermodynamic state: 8 harmonic restraints, 21 lambdas for each restraint, and 2 directions (pulling/pushing), plus 21 lambdas without any umbrella, so 8 × 21 × 2 + 21 ensembles in total. In my current calculation I only included 8 × 11 × 2 + 11 ensembles (the first half of the lambdas were not used). Snapshots were saved every 2 ps, much more frequently than in the unbiased simulations mentioned above. There is one unbiased ensemble among these simulations, with lambda = 1.0 and no harmonic restraint; it is the last ensemble of the final 21-lambda set and has the same length (400 ns) and snapshot frequency (every 2 ps) as the other biased simulations.

3/ I'm still not sure how much the results will be affected by the missing states in the unbiased ensemble. I may need to ask the PyEMMA developers again. If missing states are not a problem for the dTRAM/TRAM estimation, it would be great to test on the truncated dataset.

4/ In the biased simulations I used the COM distances generated by GROMACS during the simulation, and I should double-check whether they agree with my own calculation. Because of the high flexibility of this system, I need to be very careful with the PBC treatment.
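
On point 4/: the main pitfall when cross-checking the GROMACS pull-code COM distances is exactly the PBC treatment. A minimal numpy sketch of the minimum-image check (assuming a rectangular box and COMs computed from molecules that were already made whole, e.g. with `gmx trjconv -pbc mol`; a triclinic box would need the full lattice-vector treatment):

```python
import numpy as np

def min_image_distance(com_a, com_b, box):
    """Minimum-image distance between two COMs in a rectangular box.
    box = (Lx, Ly, Lz) in the same units as the coordinates (nm for GROMACS).
    Only valid for rectangular boxes; triclinic boxes need the full treatment."""
    dr = np.asarray(com_b, dtype=float) - np.asarray(com_a, dtype=float)
    dr -= np.round(dr / box) * box  # wrap each component into [-L/2, L/2)
    return np.linalg.norm(dr)

# Toy check: two COMs that appear ~4.5 nm apart in a 5 nm box are really 0.5 nm apart.
print(min_image_distance([0.2, 2.0, 2.0], [4.7, 2.0, 2.0], np.array([5.0, 5.0, 5.0])))
# -> 0.5
```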