By the way, #68 does use batch_normalization, so its output might be influenced by the issue mentioned here.
The code of batch_normalization
should be correct. You can try a simple function approximation for testing. For PDEs, it is a little complicated, and I am not sure whether batch normalization would help or not. It becomes even more complicated when using L-BFGS, which is a quasi-Newton method. I usually don't use batch norm for PDEs. Let me know if you have any specific reason to use batch norm.
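As a concrete example of such a test, here is a minimal sketch (the target function, network size, and training settings are illustrative assumptions, and it assumes the dde.data.Function dataset for function approximation):

import numpy as np
import deepxde as dde


def func(x):
    # Simple 1D target for the sanity check.
    return x * np.sin(5 * x)


geom = dde.geometry.Interval(-1, 1)
data = dde.data.Function(geom, func, 64, 256)
net = dde.maps.FNN(
    [1] + [20] * 3 + [1],
    "tanh",
    "Glorot normal",
    batch_normalization="before",
)
model = dde.Model(data, net)
model.compile("adam", lr=1e-3, metrics=["l2 relative error"])
model.train(epochs=10000)

If batch normalization is implemented correctly, the test metric here should be small and the train/test losses should stay consistent.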
In Burgers.py, L-BFGS is not used too much. You can use apply_feature_transform to rescale the network inputs, e.g., if x is in [0, a] and t is in [0, b]:

net = ...
net.apply_feature_transform(lambda X: tf.concat([X[:, 0:1] / a, X[:, 1:2] / b], axis=1))
Hi Lu,
I see. Thank you for your reply. I will temporarily close this issue since I also did not spot anything wrong with the code of batch normalization.
The issue still exists when using batch_normalization="before" AND "L-BFGS-B", and I suspect this is because update_ops is not executed when is_scipy_opts(optimizer) is True.
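For reference, this is the standard TF1 pattern behind that suspicion; the placeholders, layer sizes, and optimizer below are illustrative assumptions, not DeepXDE's actual implementation:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 2])
y_true = tf.placeholder(tf.float32, [None, 1])
h = tf.layers.dense(x, 20, activation=tf.tanh)
# The moving mean/variance of this layer are only updated by the ops it registers
# in the tf.GraphKeys.UPDATE_OPS collection.
h = tf.layers.batch_normalization(h, training=True)
y = tf.layers.dense(h, 1)
loss = tf.reduce_mean(tf.square(y - y_true))

# TensorFlow optimizers are typically wired with an explicit dependency on
# UPDATE_OPS, so every training step also updates the moving statistics.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# A SciPy-based optimizer (e.g., ScipyOptimizerInterface running "L-BFGS-B")
# drives its own minimization loop and never runs update_ops, so the moving
# statistics would keep their initial values and the inference-mode output
# would be wrong.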
import numpy as np
import deepxde as dde
from deepxde.backend import tf
import sys


def gen_testdata():
    data = np.load("dataset/Burgers.npz")
    t, x, exact = data["t"], data["x"], data["usol"].T
    xx, tt = np.meshgrid(x, t)
    X = np.vstack((np.ravel(xx), np.ravel(tt))).T
    y = exact.flatten()[:, None]
    return X, y


def main(batch_normalization):
    def pde(x, y):
        dy_x = tf.gradients(y, x)[0]
        dy_x, dy_t = dy_x[:, 0:1], dy_x[:, 1:2]
        dy_xx = tf.gradients(dy_x, x)[0][:, 0:1]
        return dy_t + y * dy_x - 0.01 / np.pi * dy_xx

    geom = dde.geometry.Interval(-1, 1)
    timedomain = dde.geometry.TimeDomain(0, 0.99)
    geomtime = dde.geometry.GeometryXTime(geom, timedomain)

    bc = dde.DirichletBC(geomtime, lambda x: 0, lambda _, on_boundary: on_boundary)
    ic = dde.IC(
        geomtime, lambda x: -np.sin(np.pi * x[:, 0:1]), lambda _, on_initial: on_initial
    )

    data = dde.data.TimePDE(
        geomtime, pde, [bc, ic], num_domain=2540, num_boundary=80, num_initial=160
    )
    net = dde.maps.FNN(
        [2] + [20] * 3 + [1],
        "tanh",
        "Glorot normal",
        batch_normalization=batch_normalization,
    )
    model = dde.Model(data, net)

    # model.compile("adam", lr=1e-4)
    # model.train(epochs=1500)
    model.compile("L-BFGS-B")
    losshistory, train_state = model.train()

    dde.saveplot(losshistory, train_state, issave=True, isplot=True)

    X, y_true = gen_testdata()
    y_pred = model.predict(X)
    f = model.predict(X, operator=pde)
    print("Mean residual:", np.mean(np.absolute(f)))
    print("L2 relative error:", dde.metrics.l2_relative_error(y_true, y_pred))
    np.savetxt("test.dat", np.hstack((X, y_true, y_pred)))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else None)
Running with before as the command-line argument, the issue is that the test loss is much larger than the train loss.
Yes, "L-BFGS-B" does not work with "batch_normalization", because "L-BFGS-B" is from scipy. But the TensorFlow optimizers should work.
No, it does not (correct me if I am wrong). Applying batch_normalization="before" to examples/diffusion_1d.py gives a similar issue.
I am not sure whether it makes sense to use batch norm, because here we want to compute the derivatives dy/dx. My suggestion is that you may just remove batch norm. We have worked on many different cases, and we never use batch norm (the main purpose of batch norm is for deep networks). There are always other ways.
Hi Lu,
I see, and I agree with you that one should be careful when using batch normalization in such cases. Thank you for your reply!
Hi Lu,
I am trying the batch_normalization option of deepxde.maps.fnn.FNN at https://github.com/lululxvi/deepxde/blob/8e811adfca060766dcbbaeec59c30300be134d00/deepxde/maps/fnn.py#L27. I noticed that when https://github.com/lululxvi/deepxde/blob/8e811adfca060766dcbbaeec59c30300be134d00/deepxde/model.py#L253 is called, the loss increases significantly compared to the loss computed at https://github.com/lululxvi/deepxde/blob/8e811adfca060766dcbbaeec59c30300be134d00/deepxde/model.py#L243.
To reproduce, here are two scripts to compare with each other. Without batch_normalization, the training and testing losses are consistent; if I add batch_normalization, notice that at step = 2484 the loss increases by three orders of magnitude.
I am guessing that the mean and standard deviation from training are either not properly stored or not properly reused when testing. Any idea? Thanks!
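One way to check that guess (a hedged sketch; the use of model.sess and the variable-name filter below are assumptions about DeepXDE's TF1 graph, not a documented API): after training, print the batch-norm moving statistics and see whether they ever moved away from their initial values.

from deepxde.backend import tf

# Collect the moving statistics created by the batch-normalization layers;
# tf.layers.batch_normalization names them "moving_mean" and "moving_variance".
bn_stats = [
    v
    for v in tf.global_variables()
    if "moving_mean" in v.name or "moving_variance" in v.name
]
# model is the trained dde.Model; model.sess is assumed to be the underlying
# TF1 session. If the values are still 0 (mean) and 1 (variance), update_ops
# was never executed during training.
for v in bn_stats:
    print(v.name, model.sess.run(v))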