lululxvi / deepxde

A library for scientific machine learning and physics-informed learning
https://deepxde.readthedocs.io
GNU Lesser General Public License v2.1

Gradient-Based Adaptive Loss Weights #331

Open jacob-rains opened 3 years ago

jacob-rains commented 3 years ago

Hello,

I have been trying to get my PINN models to satisfy the BCs more accurately by adjusting the weight of each individual term in the loss function before summing them, using the method described in https://arxiv.org/abs/2001.04536. It is similar to loss_weights in model.compile, but the weights are adapted based on gradient information: the method requires the maximum of the absolute values of the gradients of the PDE loss w.r.t. the trainable variables, as well as the mean of the absolute values of the gradients of the BC loss terms w.r.t. the trainable variables. I have been able to compute these in _train_sgd using tf.gradients, but I would like to reuse the gradient information already computed by the optimizer before it applies the gradients, so that I don't calculate the gradients more than once per epoch.
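
For reference, here is a minimal sketch (not DeepXDE code) of one form of that weighting rule, written with tf.compat.v1 and tf.gradients. The tensors `loss_pde` and `loss_bc`, the list `trainable_vars`, the variable `lambda_bc`, and the moving-average rate `alpha` are assumed placeholders for whatever the model defines:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()


def adaptive_weight_update(loss_pde, loss_bc, trainable_vars, lambda_bc, alpha=0.9):
    """Build an op that updates the BC loss weight lambda_bc.

    One form of the rule from arXiv:2001.04536:
        lambda_hat = max|d loss_pde / d theta| / mean|d loss_bc / d theta|
        lambda_bc  <- (1 - alpha) * lambda_bc + alpha * lambda_hat
    """
    grads_pde = tf.gradients(loss_pde, trainable_vars)
    grads_bc = tf.gradients(loss_bc, trainable_vars)

    # Maximum absolute gradient of the PDE loss over all trainable variables.
    max_grad_pde = tf.reduce_max(
        tf.stack([tf.reduce_max(tf.abs(g)) for g in grads_pde if g is not None])
    )
    # Per-variable mean absolute gradient of the BC loss, averaged over variables.
    mean_grad_bc = tf.reduce_mean(
        tf.stack([tf.reduce_mean(tf.abs(g)) for g in grads_bc if g is not None])
    )

    lambda_hat = max_grad_pde / (mean_grad_bc + 1e-12)
    return tf.assign(lambda_bc, (1 - alpha) * lambda_bc + alpha * lambda_hat)
```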

Thank you very much for the awesome package!

harshil-patel-code commented 3 years ago

Hi @jrains98: Can you please share your code? I would also like to try this for my problem of coupled PDEs. I may be able to help you with your question after looking at your code.

lululxvi commented 3 years ago

@jrains98 Great. It is possible to avoid the extra computation of gradients. In TensorFlow optimizers, minimize() consists of two steps, compute_gradients() and apply_gradients(); see https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/Optimizer. So we can call these two steps explicitly.
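
For example, a minimal sketch of that split with a tf.compat.v1 optimizer (`total_loss` and `trainable_vars` are assumed to come from the model) could look like:

```python
import tensorflow.compat.v1 as tf

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)

# Step 1: compute the gradients as a list of (gradient, variable) pairs.
grads_and_vars = optimizer.compute_gradients(total_loss, var_list=trainable_vars)

# ... the gradients can be inspected or rescaled here, e.g. to compute
# gradient-based adaptive loss weights, before they are applied ...

# Step 2: apply the (possibly modified) gradients.
train_op = optimizer.apply_gradients(grads_and_vars)
```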

@jrains98 @harshil-patel-code If you are interested, we can add this to DeepXDE, probably implemented as a Callback. We can discuss.

By the way, DeepXDE now supports two TensorFlow backends: tensorflow and tensorflow.compat.v1.

jacob-rains commented 3 years ago

I apologize for the rather long delay in responding, but thank you for your prompt reply!

@harshil-patel-code what kind of code would you like me to post? I can post the specific training case I'm working on if that's what you want.

@lululxvi that would be very good, since compute_gradients() and apply_gradients() only work for the TensorFlow optimizers and not the SciPy optimizers. I'm also interested in what you think about parallelizing these gradient calls to reduce computation time, since they would have to be called for each loss term instead of once for the total loss.

lululxvi commented 3 years ago

@jrains98 Good idea. As a first step, we can make it work, and then think about the best way to parallelize it. By the way, DeepXDE now also supports PyTorch, and in PyTorch, L-BFGS has the same usage as Adam.
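
As an illustration of the same idea in the PyTorch backend, a rough sketch of computing per-term gradients before the optimizer step could look like the following (`net`, `loss_pde`, `loss_bc`, and `optimizer` are assumed to be defined elsewhere, and every parameter is assumed to affect both losses):

```python
import torch

params = [p for p in net.parameters() if p.requires_grad]

# Gradients of each loss term separately; retain_graph so the graph can be
# reused for the second call and the final backward pass.
grads_pde = torch.autograd.grad(loss_pde, params, retain_graph=True)
grads_bc = torch.autograd.grad(loss_bc, params, retain_graph=True)

max_grad_pde = torch.max(torch.stack([g.abs().max() for g in grads_pde]))
mean_grad_bc = torch.mean(torch.stack([g.abs().mean() for g in grads_bc]))
lambda_bc = max_grad_pde / (mean_grad_bc + 1e-12)

# Usual update with the re-weighted total loss.
optimizer.zero_grad()
(loss_pde + lambda_bc * loss_bc).backward()
optimizer.step()
```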

engsbk commented 2 years ago

This may be a naive question, but I've been using DeepXDE since it was based on TensorFlow, and I want to try running it with PyTorch. Do I have to change my code, or should I reinstall the package? I want to see whether there is any difference in runtime or accuracy between TensorFlow and PyTorch as backends, if that makes sense.

Thanks for taking the time to read this!

lululxvi commented 2 years ago

No. DeepXDE (the same code) supports PyTorch. You only need to install PyTorch and specify the backend.
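
For instance, one documented way is to set the DDE_BACKEND environment variable before DeepXDE is imported (shown here from Python as a sketch; it can also be exported in the shell or set with `python -m deepxde.backend.set_default_backend pytorch`):

```python
import os

# Select the backend before importing deepxde; "tensorflow",
# "tensorflow.compat.v1", and "pytorch" are among the supported values.
os.environ["DDE_BACKEND"] = "pytorch"

import deepxde as dde

print(dde.backend.backend_name)  # expected to report "pytorch"
```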

ZazaCro commented 2 years ago

Hi @lululxvi @jrains98 @harshil-patel-code, I was just wondering whether the work on the dynamic loss weights (Gradient-Based Adaptive Loss Weights), referenced in the first comment, is still ongoing, and what its current status is.

Thank you

ZazaCro commented 2 years ago

Dear @lululxvi, I have just a couple of questions.

Thank you for your help and a great tool, Robin

lululxvi commented 2 years ago

@ZazaCro I think using the TensorFlow 2.x or PyTorch backend would be easier, because in these two backends the gradients are computed explicitly first and then applied. For example, here is the training step for TensorFlow 2.x: https://github.com/lululxvi/deepxde/blob/af6190db34ebbcb00b88cbafd5ad9bc51de12de4/deepxde/model.py#L184 where you can more easily modify the code to compute the gradient of each loss term.
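
A rough sketch of that pattern with tf.GradientTape (not the actual DeepXDE train step; `model`, `pde_loss_fn`, `bc_loss_fn`, and `inputs` are assumed names):

```python
import tensorflow as tf


@tf.function
def train_step(inputs, optimizer):
    with tf.GradientTape(persistent=True) as tape:
        outputs = model(inputs, training=True)
        loss_pde = pde_loss_fn(outputs)
        loss_bc = bc_loss_fn(outputs)
        total_loss = loss_pde + loss_bc
    # Per-term gradients are available here, before the update is applied;
    # grads_pde and grads_bc could feed an adaptive loss-weight update.
    grads_pde = tape.gradient(loss_pde, model.trainable_variables)
    grads_bc = tape.gradient(loss_bc, model.trainable_variables)
    grads_total = tape.gradient(total_loss, model.trainable_variables)
    del tape  # a persistent tape should be released explicitly
    optimizer.apply_gradients(zip(grads_total, model.trainable_variables))
    return loss_pde, loss_bc
```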

riccardotomada commented 2 years ago

Hi @ZazaCro, did you come up with a working implementation of the adaptive loss weights strategy?