AndyMcAliley opened 2 months ago
One other approach is to train a neural network to be a surrogate forward modeler, and then use that neural network in place of the forward modeling algorithm (SimPEG in your case). The good aspect of this is that it's easier and faster to back-propagate gradients through the neural network than it is to obtain those gradients from SimPEG. The difficulty is that it's an extra step, and you need to be confident that the surrogate forward modeler is sufficiently accurate.
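To make that concrete, here is a minimal sketch, assuming a TensorFlow/Keras setup; the shapes, layer sizes, and variable names (`surrogate`, `models_train`, `data_train`, `d_obs`) are placeholders for illustration, not from your notebook. The idea is to train a small network on (model, data) pairs simulated with SimPEG ahead of time, then call the network in place of the simulation during training:

```python
import tensorflow as tf

# Hypothetical sizes: n_cells model parameters in, n_data observations out.
n_cells, n_data = 1024, 128

# A small fully connected surrogate of the forward simulation.
surrogate = tf.keras.Sequential([
    tf.keras.Input(shape=(n_cells,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(n_data),
])
surrogate.compile(optimizer="adam", loss="mse")

# models_train, data_train: (model, data) pairs simulated with SimPEG ahead
# of time. Uncomment once those arrays exist:
# surrogate.fit(models_train, data_train, epochs=100, validation_split=0.1)

# During VAE training, predicted data and its gradients come from the network:
# d_pre = surrogate(x_tanh)                    # instead of calling SimPEG
# data_misfit = tf.reduce_mean(tf.square(d_pre - d_obs))
```

The payoff is that `surrogate(x_tanh)` is differentiable end to end, so the data misfit gradient comes from back-propagation rather than from SimPEG.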
The slowest parts of training are probably computing data and data misfit gradients for every generated model, which is happening as you train. Unfortunately, computing data while training is the best way I know of to achieve low data misfits. Here are some thoughts about how to speed up training:
- If you set `use_data_misfit` to False in `vae.compute_apply_gradients()`, I bet you'll find that training goes MUCH faster. However, this removes data misfit from the loss function, so I bet you'll see even higher data misfit than before. Nevertheless, I recommend using this to debug and to try out changes quickly. You can even find a good balance between `beta` and `model_std` this way, then turn data misfit back on and rebalance. It's also a good way to determine roughly how many training epochs you need before training has basically converged.
- Change `vae.compute_loss` to `vae.compute_reconstruction_loss` in cell 27, and change the print statement below it because `val_term` will probably only have 2 elements.
- Write another version of `compute_loss()`. When you compute d_pre (predict data) here, only do it for a few models in x_tanh (see the sketches after this list). Notice that I already did this once: I wrote `compute_reconstruction_loss()` as a version of `compute_loss()` that ignores the data misfit term. That's what's being called when `use_data_misfit = False`. You'd write a third version of `compute_loss()` and then set up a new flag to decide whether to call it or one of the other loss-computing functions within `compute_apply_gradients()` here.
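To make that last suggestion concrete, here's a rough sketch of what a third version of `compute_loss()` could look like, predicting data for only a few randomly chosen generated models per batch. The names `forward`, `vae(x)`, and `n_subsample` are guesses for illustration, not the notebook's actual API:

```python
import tensorflow as tf

def compute_loss_subsampled(vae, x, d_obs, forward, n_subsample=4):
    """Hypothetical third version of compute_loss(): same loss as before,
    but d_pre is only computed for a few of the generated models.

    `forward` stands for whatever maps a model to predicted data (a SimPEG
    wrapper or a surrogate network); compute_reconstruction_loss is the
    existing data-misfit-free loss. Other names are placeholders.
    """
    # Reconstruction/KL terms exactly as in the existing loss.
    recon_loss = vae.compute_reconstruction_loss(x)

    # Generated models (hypothetical forward pass through the VAE).
    x_tanh = vae(x)

    # Keep only a small random subset of models for the data misfit term.
    idx = tf.random.shuffle(tf.range(tf.shape(x_tanh)[0]))[:n_subsample]
    d_pre = forward(tf.gather(x_tanh, idx))                  # predict data
    data_misfit = tf.reduce_mean(tf.square(d_pre - d_obs))   # d_obs broadcasts

    return recon_loss + data_misfit
```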
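And the new flag inside `compute_apply_gradients()` could dispatch between the three loss functions along these lines (again a sketch; `subsample_data` and the call signatures are assumptions):

```python
def compute_apply_gradients(vae, x, d_obs, forward, optimizer,
                            use_data_misfit=True, subsample_data=False):
    """Pick which loss function to call based on the flags, then take a step."""
    with tf.GradientTape() as tape:
        if not use_data_misfit:
            loss = vae.compute_reconstruction_loss(x)             # no data term
        elif subsample_data:
            loss = compute_loss_subsampled(vae, x, d_obs, forward)
        else:
            loss = vae.compute_loss(x, d_obs)                     # full data term
    grads = tape.gradient(loss, vae.trainable_variables)
    optimizer.apply_gradients(zip(grads, vae.trainable_variables))
    return loss
```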