Speeding up process_replica_exchange

cwalker7 commented 4 years ago

I've noticed that process_replica_exchange in rep_exch.py can be pretty slow, particularly this block of code, which writes the .dat file:

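import os

# Write one fixed-width row per step (step index, then one energy per replica)
# while also collecting the replica positions from the reporter.
with open(os.path.join(output_directory, "replica_energies.dat"), "w") as f:
    for step in range(total_steps):
        f.write(f"{step:10d}")
        # Read the sampler states saved for this iteration
        sampler_states = reporter.read_sampler_states(iteration=step)
        for replica_index in range(n_replicas):
            replica_positions[replica_index, step, :, :] = sampler_states[replica_index].positions
            # Write the diagonal element [replica_index, replica_index, step]
            f.write(f"{replica_energies[replica_index, replica_index, step]:12.6f}")
        f.write("\n")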

It takes about 10 minutes to write the .dat file for 1 million frames, which is ~75% of the total process_replica_exchange time (excluding writing PDB or DCD trajectory files). I suspect the cause is that we write each energy individually, one at a time, but I will have to check.
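A minimal sketch of writing the whole table in a single call, instead of one energy at a time, is below. It assumes, as in the loop above, that replica_energies is a NumPy array indexed as [replica, state, step] and that only the diagonal entries are written; the helper name write_replica_energies_dat is hypothetical. The loop above also calls reporter.read_sampler_states once per step, which may account for a large share of the runtime on its own, so that part would still need to be timed separately.

```python
import os

import numpy as np


def write_replica_energies_dat(replica_energies, total_steps, output_directory):
    """Hypothetical helper: write replica_energies.dat in one vectorized call.

    Assumes replica_energies has shape (n_replicas, n_states, n_steps) and,
    like the original loop, writes element [i, i, step] for each replica i.
    """
    # np.diagonal over axes 0 and 1 appends the diagonal as the last axis,
    # so diag[step, i] == replica_energies[i, i, step]
    diag = np.diagonal(replica_energies[:, :, :total_steps], axis1=0, axis2=1)
    steps = np.arange(total_steps)
    n_replicas = diag.shape[1]

    # Same fixed-width layout as the loop: a 10-wide integer step column,
    # then one 12.6f column per replica, with no extra delimiter between them
    fmt = ["%10d"] + ["%12.6f"] * n_replicas
    table = np.column_stack((steps, diag))
    np.savetxt(
        os.path.join(output_directory, "replica_energies.dat"),
        table,
        fmt=fmt,
        delimiter="",
    )
```

Because np.savetxt joins the per-column formats with the (empty) delimiter, each row should match what the f-string loop produces, so any existing reader of the .dat file would not need to change.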

mrshirts commented 4 years ago

Interesting. I don't have an immediate thought, other than that we may not actually need to write ASCII files much of the time. Could this part perhaps be made optional?

cwalker7 commented 4 years ago

OK, I think it makes sense to make the .dat file optional for now. Currently we only use that file for the physical validation, and the same data could just as easily be stored as a pickle, or written with numpy.save and read back with numpy.load.
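One way to make the ASCII output optional while keeping the energies available for the physical validation is sketched below, assuming they are held in a NumPy array. The write_dat keyword, the helper names and the replica_energies.npy filename are hypothetical, not existing cg_openmm options; a pickle would work equally well, but np.save stores the array as a binary .npy file that np.load reads back directly, without parsing text.

```python
import os

import numpy as np


def save_replica_energies(replica_energies, output_directory, write_dat=False):
    """Hypothetical helper: always save a binary .npy copy, and only write
    the fixed-width ASCII .dat file when write_dat is True."""
    npy_path = os.path.join(output_directory, "replica_energies.npy")
    # Binary dump of the full array; no per-value text formatting
    np.save(npy_path, replica_energies)

    if write_dat:
        # Optional ASCII table: step index, then the diagonal energy of each
        # replica (assumes [replica, state, step] indexing)
        diag = np.diagonal(replica_energies, axis1=0, axis2=1)
        steps = np.arange(diag.shape[0])
        fmt = ["%10d"] + ["%12.6f"] * diag.shape[1]
        np.savetxt(
            os.path.join(output_directory, "replica_energies.dat"),
            np.column_stack((steps, diag)),
            fmt=fmt,
            delimiter="",
        )
    return npy_path


def load_replica_energies(output_directory):
    """Read the saved energies back, e.g. for the physical validation."""
    return np.load(os.path.join(output_directory, "replica_energies.npy"))
```

With write_dat defaulting to False, the slow text output is skipped unless someone asks for it, and analysis code that currently parses the .dat file would call load_replica_energies (or np.load directly) instead.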

shirtsgroup / cg_openmm, issue #80