lululxvi / deepxde

A library for scientific machine learning and physics-informed learning
https://deepxde.readthedocs.io
GNU Lesser General Public License v2.1

L-BFGS iteration records #1605

Open AmirNoori68 opened 8 months ago

AmirNoori68 commented 8 months ago

Hi, I am using a local Windows system with a CPU. I have a question: the Adam optimizer has a `display_every` option that records the results at whatever interval we want, but what about L-BFGS? It records the results every 1000 iterations. How do we change that to something else, such as every 100 iterations? I tried `model.compile("L-BFGS")` followed by `model.train(display_every=100)`, and it is not working.
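As a possible workaround (just a rough sketch, not verified: it assumes repeated `model.train()` calls continue from the current weights and that `set_LBFGS_options` is applied before `compile`), I thought about capping `maxiter` and calling `train()` in a loop to get a record every 100 iterations:

```python
import deepxde as dde

# Rough workaround sketch (not verified). Assumes `model = dde.Model(data, net)`
# has already been built, as in the example below.
dde.optimizers.config.set_LBFGS_options(maxiter=100)  # 100 L-BFGS iterations per train() call
model.compile("L-BFGS")

for block in range(50):  # 50 * 100 = 5000 iterations in total
    losshistory, train_state = model.train()
    # train_state.loss_train holds the latest training loss values
    print("block", block, train_state.loss_train)
```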

When I run on Google Colab, after a few thousand steps the loss information starts being printed one step at a time and then stops changing (this also happened on the local CPU). It happens for some examples and only sometimes. Here is the Elastoplastic example under the following setting (the rest is the same as the main example):

```python
layers = [2] + [200] * 30 + [5]
activation = "tanh"
initializer = "Glorot uniform"
net = dde.nn.FNN(layers, activation, initializer)

model = dde.Model(data, net)

dde.optimizers.config.set_LBFGS_options(
    maxcor=100, ftol=0, gtol=1e-08, maxiter=5000, maxfun=None, maxls=50
)
model.compile("L-BFGS", metrics=["l2 relative error"])
losshistory, train_state = model.train()
```

And here are the results:

```
Using backend: pytorch
Other supported backends: tensorflow.compat.v1, tensorflow, jax, paddle.
paddle supports more examples now and is recommended.
Compiling model...
'compile' took 1.470951 s

Training model...

Step Train loss Test loss Test metric
0 [1.81e+03, 2.67e+02, 3.87e-02, 1.56e-01, 1.19e-02] [1.76e+03, 2.59e+02, 4.26e-02] [1.00e+00]
1000 [3.86e-03, 5.74e-03, 5.17e-03, 6.31e-03, 5.54e-03] [6.26e-03, 9.26e-03, 7.78e-03] [4.23e-02]
2000 [7.91e-04, 7.89e-04, 8.83e-04, 1.06e-03, 7.21e-04] [1.12e-03, 1.04e-03, 1.49e-03] [2.53e-02]
3000 [3.00e-04, 3.07e-04, 3.40e-04, 4.22e-04, 3.02e-04] [4.26e-04, 3.56e-04, 5.86e-04] [1.97e-02]
4000 [1.38e-04, 1.98e-04, 1.63e-04, 2.91e-04, 1.58e-04] [2.23e-04, 2.85e-04, 2.52e-04] [1.51e-02]
4732 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4733 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4734 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4735 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4736 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4737 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4738 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4739 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4740 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4741 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4742 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4743 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4744 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4745 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4746 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4747 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
4748 [9.51e-05, 1.59e-04, 1.44e-04, 1.75e-04, 1.03e-04] [1.65e-04, 2.46e-04, 2.26e-04] [1.26e-02]
```

As you can see, after Step=4732 the lines are printed one by one and the values do not change at all, even if I let it run for thousands more steps. Thanks for your time.

tsarikahin commented 8 months ago

Yes, I believe the PyTorch implementation has a bug where L-BFGS does not improve the loss, so switch to TensorFlow. You will also get NaN at the end.

lululxvi commented 8 months ago

Yes, this bug was introduced with PyTorch 2.x. If you use PyTorch 1.x or TensorFlow, it should be fine.
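
A minimal sketch of switching the backend (assuming the `DDE_BACKEND` environment variable is read when `deepxde` is imported, as described in the DeepXDE docs):

```python
import os

# Pick a backend where L-BFGS works as expected; DDE_BACKEND must be set
# before deepxde is imported. Alternatively, keep the pytorch backend and
# pin PyTorch 1.x (e.g. `pip install "torch<2"`).
os.environ["DDE_BACKEND"] = "tensorflow"

import deepxde as dde

print(dde.backend.backend_name)  # expected: "tensorflow"
```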