AndyMcAliley opened 2 months ago
One other approach is to train a neural network to be a surrogate forward modeler, and then use that neural network in place of the forward modeling algorithm (SimPEG in your case). The good aspect of this is that it's easier and faster to back-propagate gradients through the neural network than it is to obtain those gradients from SimPEG. The difficulty is that it's an extra step, and you need to be confident that the surrogate forward modeler is sufficiently accurate.
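To make that concrete, here is a minimal sketch, assuming a TensorFlow/Keras setup; the shapes, layer sizes, and variable names (`surrogate`, `models_train`, `data_train`, `d_obs`) are placeholders for illustration, not from your notebook. The idea is to train a small network on (model, data) pairs simulated with SimPEG ahead of time, then call the network in place of the simulation during training:

```python
import tensorflow as tf

# Hypothetical sizes: n_cells model parameters in, n_data observations out.
n_cells, n_data = 1024, 128

# A small fully connected surrogate of the forward simulation.
surrogate = tf.keras.Sequential([
    tf.keras.Input(shape=(n_cells,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(n_data),
])
surrogate.compile(optimizer="adam", loss="mse")

# models_train, data_train: (model, data) pairs simulated with SimPEG ahead
# of time. Uncomment once those arrays exist:
# surrogate.fit(models_train, data_train, epochs=100, validation_split=0.1)

# During VAE training, predicted data and its gradients come from the network:
# d_pre = surrogate(x_tanh)                    # instead of calling SimPEG
# data_misfit = tf.reduce_mean(tf.square(d_pre - d_obs))
```

The payoff is that `surrogate(x_tanh)` is differentiable end to end, so the data misfit gradient comes from back-propagation rather than from SimPEG.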
The slowest parts of training are probably computing data and data misfit gradients for every generated model, which is happening as you train. Unfortunately, computing data while training is the best way I know of to achieve low data misfits. Here are some thoughts about how to speed up training:
- If you set `use_data_misfit` to False in `vae.compute_apply_gradients()`, I bet you'll find that training goes MUCH faster. However, this removes data misfit from the loss function, so I bet you'll see even higher data misfit than before. Nevertheless, I recommend using this to debug and to try out changes quickly. You can even find a good balance between `beta` and `model_std` this way, then turn data misfit back on and rebalance. It's also a good way to determine roughly how many training epochs you need before training has basically converged.
- Change `vae.compute_loss` to `vae.compute_reconstruction_loss` in cell 27, and change the print statement below it because `val_term` will probably only have 2 elements.
- Write another version of `compute_loss()`. When you compute d_pre (predict data) here, only do it for a few models in x_tanh (see the sketches after this list). Notice that I already did this once: I wrote `compute_reconstruction_loss()` as a version of `compute_loss()` that ignores the data misfit term. That's what's being called when `use_data_misfit = False`. You'd write a third version of `compute_loss()` and then set up a new flag to decide whether to call it or one of the other loss-computing functions within `compute_apply_gradients()` here.
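To make that last suggestion concrete, here's a rough sketch of what a third version of `compute_loss()` could look like, predicting data for only a few randomly chosen generated models per batch. The names `forward`, `vae(x)`, and `n_subsample` are guesses for illustration, not the notebook's actual API:

```python
import tensorflow as tf

def compute_loss_subsampled(vae, x, d_obs, forward, n_subsample=4):
    """Hypothetical third version of compute_loss(): same loss as before,
    but d_pre is only computed for a few of the generated models.

    `forward` stands for whatever maps a model to predicted data (a SimPEG
    wrapper or a surrogate network); compute_reconstruction_loss is the
    existing data-misfit-free loss. Other names are placeholders.
    """
    # Reconstruction/KL terms exactly as in the existing loss.
    recon_loss = vae.compute_reconstruction_loss(x)

    # Generated models (hypothetical forward pass through the VAE).
    x_tanh = vae(x)

    # Keep only a small random subset of models for the data misfit term.
    idx = tf.random.shuffle(tf.range(tf.shape(x_tanh)[0]))[:n_subsample]
    d_pre = forward(tf.gather(x_tanh, idx))                  # predict data
    data_misfit = tf.reduce_mean(tf.square(d_pre - d_obs))   # d_obs broadcasts

    return recon_loss + data_misfit
```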
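And the new flag inside `compute_apply_gradients()` could dispatch between the three loss functions along these lines (again a sketch; `subsample_data` and the call signatures are assumptions):

```python
def compute_apply_gradients(vae, x, d_obs, forward, optimizer,
                            use_data_misfit=True, subsample_data=False):
    """Pick which loss function to call based on the flags, then take a step."""
    with tf.GradientTape() as tape:
        if not use_data_misfit:
            loss = vae.compute_reconstruction_loss(x)             # no data term
        elif subsample_data:
            loss = compute_loss_subsampled(vae, x, d_obs, forward)
        else:
            loss = vae.compute_loss(x, d_obs)                     # full data term
    grads = tape.gradient(loss, vae.trainable_variables)
    optimizer.apply_gradients(zip(grads, vae.trainable_variables))
    return loss
```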