The large interval problem of the function or the function oscillates in the small interval 函数的大区间问题或者函数在小区间内震荡

Wulx2050 commented 2 years ago

在使用 deepxde 训练的时候，为了测试一些极端情况，我经常把输入数据设置的极小(接近0)或者超过几百，或者有些写了错误的边界条件、初始条件，这时候训练模型Train loss和Test loss会出现nan，并且立刻停止训练。出现这种情况一般需要截断误差，即如果一旦损失值超过某个界限[loss<min, loss>max]，就直接截断，取预先设定的值。规范化 PDE或分配损失权重( normalize the PDE, or assign a loss weight) 需要调整很多次，而且如果我有10000个形式不同的PDE问题，难道我要手动规范化10000个 PDE吗？

by Google Translate： When using deepxde for training, in order to test some extreme cases, I often set the input data to a very small (close to 0) or more than a few hundred, or some write wrong boundary conditions and initial conditions, then train the model Train loss and Test loss will show nan and stop training immediately. In this case, the truncation error is generally required, that is, if the loss value exceeds a certain limit [loss<min, loss>max], it will be directly truncated and the preset value will be taken. Normalizing the PDE, or assigning a loss weight requires many adjustments, and if I have 10,000 PDE problems with different forms, do I have to manually normalize 10,000 PDEs?

lululxvi commented 2 years ago

You can do this automatically. Everything you can do manually and systematically can be done automatically, right?

Wulx2050 commented 2 years ago

You can do this automatically. Everything you can do manually and systematically can be done automatically, right?

对我来说自动化不是那么容易，如果只是同种类型的、或者一二阶的pde，我可以自动化，更高阶的就变得麻烦了。麻烦您在deepxde的网络中加入归一化层。而且有些函数是如此复杂，即使规范化 PDE、归一化数据或重新分配了损失权重也难以训练，比如一个看起来简单的函数(可以定义一个PDE，此函数恰好是这个PDE的解) y=sin(x), x in [0,2pik] 归一化以后 y = sin(2pikx) x in [0,1]，当k增大到100左右，无论是缩放数据(其实我们的输入输出都属于[0,1])、加深网络层数、加宽网络、增加采样点、调整学习率、重新分配损失权重，都无法拟合这样一个函数，更不要说k继续增大到1e5，1e100了。这可以叫函数的大区间问题或者小区间内的震荡函数。目前唯一的解决办法就是只处理问题域的极小的一部分，这样来说因为计算能力的限制和神经网络优化算法的特点，PINN算法只能拟合小区间内的、不剧烈震荡的函数吗？

by Google Translate： It's not that easy for me to automate, if it's just the same type, or first or second order pde, I can automate it, higher order ones become troublesome. Please add a normalization layer to the deepxde network. And some functions are so complex that it is difficult to train even if the PDE is normalized, the data is normalized, or the loss weights are redistributed, such as a function that looks simple (a PDE can be defined, and this function happens to be the solution of this PDE) y=sin (x), x in [0,2pik] After normalization, y = sin(2pikx) x in [0,1], when k increases to about 100, no matter the scaled data ( In fact, our input and output belong to [0, 1]), deepening the number of network layers, widening the network, increasing sampling points, adjusting the learning rate, and redistributing the loss weight, are all unable to fit such a function, not to mention that k continues to increase. As big as 1e5, 1e100. This can be called a large interval problem of a function or an oscillatory function in this small interval. The only solution at present is to only deal with a very small part of the problem domain. In this way, due to the limitation of computing power and the characteristics of neural network optimization algorithms, the PINN algorithm can only fit functions within the small interval that do not oscillate violently?

Wulx2050 commented 2 years ago

另一个问题，model.compile 的 loss_weights= [1, 1e6, 1] 的数量级差距如此之大。[github公式显示我是使用的谷歌浏览器插件 MathJax 3 Plugin for Github]

这里解决的例子是一维阻尼谐振子: $$m \dfrac{d^2 x}{d t^2} + \mu \dfrac{d x}{d t} + kx = 0,$$ 或者 $$\dfrac{d^2 x}{d t^2} + \dfrac{\mu}{m} \dfrac{d x}{d t} + \dfrac{k}{m}x = 0,$$ 与初始条件 $$x(0) = 1,\left. \frac{d x}{d t}\right|_{t=0} = 0.$$ 将着重于解决欠阻尼状态的问题，即当 $$\delta < \omega_0,\mathrm{with}\ \delta = \dfrac{\mu}{2m}, \omega_0 = \sqrt{\dfrac{k}{m}}.\mu = 2m\delta, \dfrac{k}{m} = \omega_0^2.$$ 这里有以下精确解: $$x(t) = e^{-\delta t}(2 A \cos(\phi + \omega t)),\mathrm{with}\ \omega=\sqrt{\omega_0^2 - \delta^2}, \phi=\arctan(-\frac{\delta}{\omega}), A=\frac{1}{2\cos(\phi)}.$$

对于这个问题，使用 $m=1, \delta = 2 $，$\omega_0 = 20 $，并在 $t\in[0,1]$ 中学习求解问题。

下面是我的求解代码：

import deepxde as dde
from deepxde.backend import tf
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

dde.config.set_default_float(value="float32")

m = 1.
delta = 2.
omega_0 = 20.

mu = 2*m*delta
k = m*omega_0*omega_0
omega = np.sqrt(omega_0**2-delta**2)
phi = np.arctan(-delta/omega)
A = 1/(2*np.cos(phi))

def pde(t, x):
    dx_t = dde.grad.jacobian(x, t)
    dx_tt = dde.grad.hessian(x, t)
    # return m*dx_tt - mu*dx_t + k*x
    return dx_tt - mu/m*dx_t + k/m*x

def boundary(x, on_boundary):
    return on_boundary and np.isclose(x[0], 0)

def func(t):
    # omega = np.sqrt(omega_0**2-delta**2)
    # phi = np.arctan(-delta/omega)
    # A = 1/(2*np.cos(phi))
    return np.exp(-delta*t)*2*A*np.cos(omega*t+phi)

scale = 1  # [0, 1*scale]

geom = dde.geometry.TimeDomain(0., scale*1)
# geom = dde.geometry.TimeDomain(1e-8, scale*1)
ic1 = dde.icbc.initial_conditions.IC(geom, lambda x: 1., boundary, component=0) # 初值条件 x(0) = 1.
# ic_2 = dde.icbc.boundary_conditions.NeumannBC(geom, lambda x: 0.*x, boundary, component=0) # 导数条件 dx(0)/dt = 0

ic_2 = dde.icbc.OperatorBC(
    geom,
    lambda x, y, _: dde.grad.jacobian(y, x),
    boundary,
) # 导数条件 dx(0)/dt = 0

# observe_x = geom.uniform_points(50)
# # bc_init = dde.icbc.boundary_conditions.PointSetBC(points=np.array([[0.]]), values=np.array([[1.]]), component=0)
# observe_y = dde.icbc.boundary_conditions.PointSetBC(points=observe_x, values=func(observe_x), component=0)

data = dde.data.PDE(geom, pde, [ic1, ic_2], 300*scale, 1, solution=func, num_test=500*scale)

layer_size = [1] + [50] * 3 + [1]
activation = "tanh"
initializer = "Glorot uniform"
# initializer = "He uniform"
net = dde.nn.FNN(layer_size, activation, initializer)

model = dde.Model(data, net)
model.compile("adam", lr=1e-3, metrics=["l2 relative error"],loss_weights = [1, 1e6, 1])
# model.compile("adam", lr=1e-3, metrics=["l2 relative error"],loss_weights = [1, 100, 1])
# model.compile("adam", lr=1e-3, metrics=["l2 relative error"],loss_weights = [1e-3, 1e3, 1e-2])
# model.compile("adam", lr=1e-4, metrics=["l2 relative error"],loss_weights = [1e-4, 1e4, 1e-2])
# model.compile("adam", lr=1e-4, metrics=["l2 relative error"],loss_weights = [1e-3, 1e2, 1])
losshistory, train_state = model.train(epochs=20000) # 如果loss出现nan，重新运行。

t_list = data.test()[0].flatten()
t_list = sorted(t_list)
true_data = [func(t) for t in t_list]
pred_data = model.predict(data.test()[0])

plt.figure(figsize=(10, 6))
# plt.plot(t_list, true_data, "-k", label="True")
# plt.plot(t_list, pred_data[:,0], "-r", label="Pred")
plt.scatter(t_list, true_data, label="True")
plt.scatter(t_list, pred_data[:,0], label="Pred")
plt.legend()
plt.show()

这是一个很简单的问题，我使用pytorch完成过，但是使用deepxde结果好像有些出乎意料。我不知道哪里出了问题，请帮我检查一下。

pytorch 版参考： https://github.com/Wulx2050/DeepXDE-and-PINN/blob/main/99%E7%89%A9%E7%90%86%E4%BF%A1%E6%81%AF%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E7%AE%80%E4%BB%8B.ipynb

下面是核心代码：

for i in range(20000):
    optimizer.zero_grad()

    # compute the "data loss"
    yh = model(x_data)
    loss1 = torch.mean((yh-y_data)**2)# use mean squared error

    # compute the "physics loss"
    yhp = model(x_physics)
    dx  = torch.autograd.grad(yhp, x_physics, torch.ones_like(yhp), create_graph=True)[0]# computes dy/dx
    dx2 = torch.autograd.grad(dx,  x_physics, torch.ones_like(dx),  create_graph=True)[0]# computes d^2y/dx^2
    physics = dx2 + mu*dx + k*yhp# computes the residual of the 1D harmonic oscillator differential equation
    loss2 = (1e-4)*torch.mean(physics**2)

    # loss3 = dy(y=0)/dx = 0
    # backpropagate joint loss
    loss = loss1 + loss2# add two loss terms together
    loss.backward()
    optimizer.step()

deepxde是tf1.x后端。

Wulx2050 commented 2 years ago

我了解到一种叫做区域分解的技术，就是让不同的网络对不同的区域进行学习，从而降低单个神经网络需要的复杂度和学习难度。但是我认为这种方法还是能处理的区间还是非常有限。有人考虑过直接把整个问题作为字符串输入神经网络吗？以前深度学习处理NLP(自然语言处理)问题效果始终不好，但是这几年注意力机制改变了一切。也许我们应该考虑用NLP模型而不是MLP(多层感知机)处理数学问题？

I learned about a technique called region decomposition, which is to let different networks learn different regions, thereby reducing the complexity and learning difficulty required by a single neural network. But I think the range that this method can handle is still very limited. Has anyone considered feeding the whole question directly into the neural network as a string? In the past, deep learning has always been ineffective in dealing with NLP (natural language processing) problems, but in recent years, the attention mechanism has changed everything. Maybe we should consider NLP models instead of MLPs (Multilayer Perceptrons) for math problems?

lululxvi commented 2 years ago

麻烦您在deepxde的网络中加入归一化层。而且有些函数是如此复杂，即使规范化 PDE、归一化数据或重新分配了损失权重也难以训练，比如一个看起来简单的函数(可以定义一个PDE，此函数恰好是这个PDE的解) y=sin(x), x in [0,2_pi_k] 归一化以后 y = sin(2_pi_kx) x in [0,1]，当k增大到100左右，无论是缩放数据(其实我们的输入输出都属于[0,1])、加深网络层数、加宽网络、增加采样点、调整学习率、重新分配损失权重，都无法拟合这样一个函数，更不要说k继续增大到1e5，1e100了。这可以叫函数的大区间问题或者小区间内的震荡函数。目前唯一的解决办法就是只处理问题域的极小的一部分，这样来说因为计算能力的限制和神经网络优化算法的特点，PINN算法只能拟合小区间内的、不剧烈震荡的函数吗？

我了解到一种叫做区域分解的技术，就是让不同的网络对不同的区域进行学习，从而降低单个神经网络需要的复杂度和学习难度。但是我认为这种方法还是能处理的区间还是非常有限。

dde.nn.FNN already supports batch normalization, but batch-norm may not work for PINN. The problem you described is still an open problem. There are some papers on this. Domain decomposition you mentioned is one method.

有人考虑过直接把整个问题作为字符串输入神经网络吗？以前深度学习处理NLP(自然语言处理)问题效果始终不好，但是这几年注意力机制改变了一切。也许我们应该考虑用NLP模型而不是MLP(多层感知机)处理数学问题？

There are a few works using language model for PDE, but as far as I can see, whether it really works is still questionable.

lululxvi commented 2 years ago

这里解决的例子是一维阻尼谐振子

This problem should be easy to solve. What is the difficulty?

You may do

data = dde.data.PDE(geom, pde, [ic1, ic_2], 300*scale, 2, solution=func, num_test=500*scale)

Wulx2050 commented 2 years ago

函数 pde(t, x) 写错了，dx/dt的符号应该是为正。已经完成了，见： The function pde(t, x) is wrong, the sign of dx/dt should be positive. See:

https://github.com/Wulx2050/DeepXDE-and-PINN/blob/main/2%E4%BB%80%E4%B9%88%E6%98%AFPINN.ipynb

Wulx2050 commented 2 years ago

DeepXDE好像并不支持数据生成器(data generator)，我要如何修改DeepXDE源码使得batch_size 参数真正生效从而可以训练几百万条数据？

DeepXDE does not seem to support data generators. How do I modify the DeepXDE source code to make the batch_size parameter take effect so that I can train millions of data?

lululxvi commented 2 years ago

What "data generator" do you mean? The one in TensorFlow or PyTorch?

mini-batch is supported in DeepXDE via dde.callbacks.PDEResidualResampler https://deepxde.readthedocs.io/en/latest/modules/deepxde.html#deepxde.callbacks.PDEResidualResampler

Wulx2050 commented 2 years ago

What "data generator" do you mean? The one in TensorFlow or PyTorch?

data generator 是keras 的说法，现在统一叫DataLoader，因为CV和NLP中一次性加载所有数据集几乎不可能，默认data generator 。现在DeepXDE是这种一次性加载所有数据集的方式：

# Load entire dataset
X, y = np.load('data.npy')
# Train model on your dataset
model.fit(x=X, y=y)

然而，过大的数据集无法同时加载到内存中，然而模型的性能提升又不得不使用大规模数据集。data generator是一种数据集逐批生成方法，来解决由于数据集过大、内存不足的问题。Data generator 一般需要实现__len__和__getitem__，每次获得batch_size的数据。

所以我的意思是，在DeepXDE中怎么使用这种数据集逐批生成的方法：

# Generators
training_generator = DataGenerator(train_X, train_Y, **params)
validation_generator = DataGenerator(validation_X, validation_Y, **params)

# Train model on dataset
model.fit_generator(generator=training_generator,
                    validation_data=validation_generator,
                    use_multiprocessing=True,
                    workers=6)

当我取非常多的数据点的时候，Data generator是无法避免的。

data = dde.data.TimePDE(
    geomtime,
    pde,
    [bc, ic],
    num_domain=4000000,
    num_boundary=2000,
    num_initial=2000,
    train_distribution="pseudo",
    solution=func,
    num_test=1000000,
)
# Train model on dataset
model.fit_generator(generator=data.training_generator,
                    validation_data=data.validation_generator,
                    **params)

dde.callbacks.PDEResidualResampler 明显不完全等同于Data generator，而且这个函数对dde.icbc.boundary_conditions.PointSetBC 无效。

lululxvi commented 2 years ago

Then you have to modify the source code.

dde.callbacks.PDEResidualResampler 这个函数对dde.icbc.boundary_conditions.PointSetBC 无效。

This is the design (a feature, not a bug).

Steph-Yhf commented 1 year ago

How to add a normalization layer and a denormalization layer to the DeepXDE network？

lululxvi / deepxde

The large interval problem of the function or the function oscillates in the small interval 函数的大区间问题或者函数在小区间内震荡 #645