lululxvi / deepxde

A library for scientific machine learning and physics-informed learning
https://deepxde.readthedocs.io
GNU Lesser General Public License v2.1

Data sampling strategy and model save/restore #57

Closed jhssyb closed 4 years ago

jhssyb commented 4 years ago

Dear Lu, I have another puzzle when running the code. Let's say we sample 50 points along the boundary, where

bcs = [bc_rectX, bc_rectY, bc_circleX, bc_circleY]
data = dde.data.PDE(geom, pde, bcs, num_domain=300, num_boundary=50, num_test=20)

When the code runs, it always issues these warnings:

Warning: 50 points required, but 100 points sampled. Uniform random is not guaranteed.
Warning: CSGDifference.uniform_points not implemented. Use random_points instead.

If I set the number of boundary points to 100, it samples 200 points randomly rather than uniformly. What causes this?

lululxvi commented 4 years ago

What is your geometry? By default, train points are sampled randomly, and test points are sampled uniformly. But sometimes it is hard to sample uniformly. There are two cases:

  1. Sometimes it is impossible to sample uniformly, e.g., for some geometries there is no uniform sampling method implemented. In these cases, I will switch to random points.
  2. Sometimes it is impossible to sample exactly the required number of points. For example, we have a square [0, 1]^2, and want to sample 50 points inside the domain, but sqrt(50) is not an integer. Then I compute ceil(sqrt(50)) = 8, and sample 8^2 = 64 points.
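The rounding in the second case can be reproduced with a few lines of Python. This is only a sketch of the arithmetic described above, not DeepXDE's actual sampling code; the function name `uniform_grid_size` is made up for illustration.

```python
def uniform_grid_size(n_requested, dim):
    """Return (points per axis, total points) for a full uniform grid.

    Hypothetical helper: it finds the smallest k with k**dim >= n_requested,
    which equals ceil(n_requested ** (1 / dim)) but avoids float rounding.
    """
    per_axis = 1
    while per_axis ** dim < n_requested:
        per_axis += 1
    return per_axis, per_axis ** dim


per_axis, total = uniform_grid_size(50, dim=2)
print(per_axis, total)  # 8 64
```

Requesting a perfect square (or cube, in 3D) is the only way to get exactly the requested number of points from such a grid.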
jhssyb commented 4 years ago

Hi Lu, the geometry is a cylinder inside a rectangular domain, defined with geom = CSGDifference(rectangular, circle). Does this fall into the first category in your last answer?

By the way, I took your advice for saving the model with the code

model.train(epochs=1000, model_save_path=r"C:\Users\Dell\Documents\Flow over cylinder")

But it does not work. After training, I looked for the saved model in my working directory by sorting the files by modification date, and no new file had appeared.

lululxvi commented 4 years ago

Yes, rectangle minus cylinder is the first category.

I am not sure why the saving does not work, and there are a few possibilities:

  1. Try model_save_path=r"C:\Users\Dell\Documents\Flow_over_cylinder\model.ckpt". Note: avoid spaces in the path.
  2. If possible, try to run the code on a Linux machine. I don't have any experience with using TensorFlow on Windows.
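For the first suggestion, one way to avoid typing such paths by hand is to build the checkpoint prefix programmatically. The sketch below uses only the standard library; `checkpoint_prefix` and the directory names are hypothetical, and whether the space in the path is really what prevented saving is not established in this thread.

```python
from pathlib import Path


def checkpoint_prefix(base_dir, run_name, filename="model.ckpt"):
    """Build <base_dir>/<run_name without spaces>/<filename> (hypothetical helper)."""
    # Collapse any whitespace in the run name into single underscores.
    safe_name = "_".join(run_name.split())
    return Path(base_dir) / safe_name / filename


prefix = checkpoint_prefix("Documents", "Flow over cylinder")
print(prefix.as_posix())  # Documents/Flow_over_cylinder/model.ckpt
```

Before training, the directory can be created with `prefix.parent.mkdir(parents=True, exist_ok=True)`, and `str(prefix)` can then be passed as the save path.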
jhssyb commented 4 years ago

Thanks for your prompt reply, Lu. I tried your first suggestion and it works well. Then I tried to restore the trained model for future prediction with the following code:

model = dde.Model(data, net)
model.restore(r"C:\Users\Dell\Documents\Flow over cylinder\model.ckpt-1000.meta")

But it does not work, with the error:

File "C:\ProgramData\Anaconda3\lib\site-packages\deepxde\model.py", line 324, in restore
    self.saver.restore(self.sess, save_path)
AttributeError: 'NoneType' object has no attribute 'restore'

Should we only use restore and save through model.train(..., model_restore_path=None, model_save_path=None), or can we call save and restore outside model.train()?

lululxvi commented 4 years ago

Did you call Model.compile() before restore?
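The AttributeError above suggests that the model's saver object does not exist until compile() has run. The toy classes below imitate that ordering constraint; they are not DeepXDE's code, and `ToySaver` and `ToyModel` are invented names.

```python
class ToySaver:
    """Stand-in for a checkpoint saver (hypothetical)."""

    def restore(self, path):
        return "restored from " + path


class ToyModel:
    """Stand-in for a model whose saver is only created by compile()."""

    def __init__(self):
        self.saver = None  # nothing to restore with until compile() runs

    def compile(self):
        self.saver = ToySaver()

    def restore(self, path):
        # Fails with AttributeError while self.saver is still None.
        return self.saver.restore(path)


model = ToyModel()
try:
    model.restore("model.ckpt-1000")
except AttributeError as err:
    print("before compile:", err)

model.compile()
print(model.restore("model.ckpt-1000"))
```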

jhssyb commented 4 years ago

I put the compile call beforehand, as you suggested:

data = dde.data.PDE(geom, pde, bcs, num_domain=300, num_boundary=50, num_test=20)
net = dde.maps.FNN([2] + [10] * 2 + [2], "tanh", "Glorot uniform")
model = dde.Model(data, net)
model.compile("L-BFGS-B")
model.restore(r"C:\Users\Dell\Documents\Flow over cylinder\model.ckpt-1000.meta")

# Plot PDE residue
x = geom.uniform_points(1000, True)
print(x)
print(x.shape)
y = model.predict(x, operator=pde)
plt.figure()
plt.quiver(x[:, 0], x[:, 1], y[:, 0], y[:, 1])
plt.xlabel("x")
plt.ylabel("PDE residue")
plt.show()

However, it complains:

DataLossError (see above for traceback): Unable to open table file C:\Users\Dell\Documents\Flow over cylinder\model.ckpt-1000.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator? [[node save_1/RestoreV2 (defined at C:\ProgramData\Anaconda3\lib\site-packages\deepxde\model.py:197) ]]

Does this mean we need to pass both the .dat and .meta files to model.restore(), or do we only give the save path and let the program retrieve the needed files automatically?

lululxvi commented 4 years ago

Try

model.restore(r"C:\Users\Dell\Documents\Flow over cylinder\model.ckpt-1000")
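The idea behind this suggestion is that a TensorFlow 1.x checkpoint is a group of files (for example .meta, .index and .data-00000-of-00001) sharing one prefix, and restore expects that prefix rather than any single file. The helper below strips such a suffix; `to_checkpoint_prefix` is a hypothetical name, and the pattern covers only these three common suffixes.

```python
import re

# Suffixes of the individual files that make up one checkpoint.
_CKPT_SUFFIX = re.compile(r"\.(meta|index|data-\d+-of-\d+)$")


def to_checkpoint_prefix(path):
    """Turn the name of one checkpoint file into the shared prefix."""
    return _CKPT_SUFFIX.sub("", path)


print(to_checkpoint_prefix("model.ckpt-1000.meta"))  # model.ckpt-1000
```

A path that is already a prefix is returned unchanged, so the helper is safe to apply to either form.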
jhssyb commented 4 years ago

It still does not work. It now raises:

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [10] rhs shape= [50] [[node save_2/Assign_8 (defined at C:\ProgramData\Anaconda3\lib\site-packages\deepxde\model.py:197) ]]

lululxvi commented 4 years ago

Is the rest of the code the same, e.g., the network size?
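The shapes in the error message hint at what differs: a tensor of shape [10] in the current graph is being assigned a tensor of shape [50] from the checkpoint. The sketch below lists the kernel and bias shapes of a fully connected network from its layer sizes, so two configurations can be compared before restoring. The helper names and the 50-unit configuration are assumptions for illustration; the thread does not say which network produced the checkpoint.

```python
def fnn_param_shapes(layer_sizes):
    """List kernel and bias shapes, layer by layer, for a dense network."""
    shapes = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        shapes.append((n_in, n_out))  # kernel
        shapes.append((n_out,))       # bias
    return shapes


def shape_mismatches(current, saved):
    """Return (index, current_shape, saved_shape) wherever the two differ.

    Only compares as many parameters as the shallower network has.
    """
    pairs = zip(fnn_param_shapes(current), fnn_param_shapes(saved))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


current = [2] + [10] * 2 + [2]  # the network used in this thread
saved = [2] + [50] * 2 + [2]    # hypothetical network behind the checkpoint
for index, cur_shape, saved_shape in shape_mismatches(current, saved):
    print(index, cur_shape, saved_shape)
```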

jhssyb commented 4 years ago

Yes, I used the same network: I first trained and saved it, then restored it. Now there is another complaint:

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key beta1_power_1 not found in checkpoint [[node save_5/RestoreV2 (defined at C:\ProgramData\Anaconda3\lib\site-packages\deepxde\model.py:197) ]]

lululxvi commented 4 years ago

Could you first try the following:

model = dde.Model(data, net)
model.compile(...)
checker = dde.callbacks.ModelCheckpoint(
        "model/model.ckpt", save_better_only=True, period=1000
    )
model.train(epochs=epochs, callbacks=[checker])
model.restore("model/model.ckpt-" + str(train_state.best_step), verbose=1)
jhssyb commented 4 years ago

Should we assign something to 'train_state'? It produces an error:

UnboundLocalError: local variable 'train_state' referenced before assignment

lululxvi commented 4 years ago

Oh, yes.

losshistory, train_state = model.train(epochs=epochs, callbacks=[checker])
jhssyb commented 4 years ago

Yes, it works now, with the following code:

epochs = 1000
data = dde.data.PDE(geom, pde, bcs, num_domain=300, num_boundary=50, num_test=20)
net = dde.maps.FNN([2] + [10] * 2 + [2], "tanh", "Glorot uniform")
model = dde.Model(data, net)
model.compile("adam", lr=0.001)
checker = dde.callbacks.ModelCheckpoint("model/model.ckpt", save_better_only=True, period=1000)
losshistory, train_state = model.train(epochs=epochs, callbacks=[checker])

model.restore("model/model.ckpt-" + str(train_state.best_step), verbose=1)

It seems that we need to call model.train() and model.restore() together, rather than only calling model.restore() on a saved model? Another problem is that the code above raises the following error:

File "C:\ProgramData\Anaconda3\lib\site-packages\deepxde\model.py", line 152, in train
    raise ValueError("No epochs for {}.".format(self.optimizer))
ValueError: No epochs for adam.

lululxvi commented 4 years ago

I am not sure why the "No epochs for adam" error occurs. Could you provide a simple code example that reproduces the error?

jhssyb commented 4 years ago

I found the reason for this error: it is because I passed "adam" to Model.compile(). After changing to "L-BFGS-B", the error disappears. When I run the following code, another error emerges:

epochs = 1000
data = dde.data.PDE(geom, pde, bcs, num_domain=300, num_boundary=50, num_test=20)
net = dde.maps.FNN([2] + [10] * 2 + [2], "tanh", "Glorot uniform")
model = dde.Model(data, net)
model.compile("L-BFGS-B")
checker = dde.callbacks.ModelCheckpoint("model/model.ckpt", save_better_only=True, period=1000)
losshistory, train_state = model.train(epochs=epochs, callbacks=[checker])

model.restore("model/model.ckpt-" + str(train_state.best_step), verbose=1)

File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1268, in restore

ValueError: The passed save_path is not a valid checkpoint: model/model.ckpt-4791

It is very weird: I set epochs=1000, but the code runs for around 5000 steps, which is why the error complains about the checkpoint at iteration 4791. I checked the saved files, and only the results at iteration 1000 are saved. The training output and the saved files were shown in two screenshots. I hope to get your advice.

lululxvi commented 4 years ago

I am confused about why passing "adam" to Model.compile() causes the error, because in your code epochs=1000.

It is tricky to use L-BFGS for saving and restoring because of how L-BFGS is implemented. Could you try using only Adam for training and restoring?
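Whatever the cause of the extra iterations, the ValueError above arises because no checkpoint was written at step 4791: only step 1000 is on disk. One defensive option is to check which steps actually exist before calling restore. The sketch below assumes each checkpoint has a `<prefix>-<step>.index` file, as TensorFlow 1.x checkpoints normally do; the function names are hypothetical.

```python
import re
from pathlib import Path


def saved_steps(prefix):
    """Return the sorted step numbers for which '<prefix>-<step>.index' exists."""
    prefix = Path(prefix)
    pattern = re.compile(re.escape(prefix.name) + r"-(\d+)\.index$")
    steps = []
    if prefix.parent.is_dir():
        for entry in prefix.parent.iterdir():
            match = pattern.match(entry.name)
            if match:
                steps.append(int(match.group(1)))
    return sorted(steps)


def latest_step_at_or_before(prefix, wanted_step):
    """Return the largest saved step <= wanted_step, or None if there is none."""
    candidates = [s for s in saved_steps(prefix) if s <= wanted_step]
    return candidates[-1] if candidates else None
```

For example, `latest_step_at_or_before("model/model.ckpt", train_state.best_step)` would give a step that is safe to append to the prefix, although the weights at that step are not necessarily the best ones seen during training.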

lululxvi commented 4 years ago

In my tests, restoring only requires calling Model.compile(); calling Model.train() is not necessary. For example:

  • First, train
model = dde.Model(data, net)
model.compile("adam", lr=0.001, metrics=["l2 relative error"])
checkpointer = dde.callbacks.ModelCheckpoint("./model/model.ckpt", verbose=1, save_better_only=True)
model.train(epochs=10000, callbacks=[checkpointer])
  • Second, restore
model = dde.Model(data, net)
model.compile("adam", lr=0.001, metrics=["l2 relative error"])
model.restore("./model/model.ckpt-7000")
jhssyb commented 4 years ago

I cannot find the error either. But I copied your code, ran the training, and then restored the model, and it works. Great help! By the way, do you know how to find this Q&A page again later? I am worried that once I close this issue, the page will no longer be available and others will not be able to see the solution if similar problems arise.

lululxvi commented 4 years ago

This page remains available even after the issue is closed. The link to this page stays the same.

jhssyb commented 4 years ago

Thanks, I will close it.

sumantkrsoni commented 4 years ago

@jhssyb restoring part of the trained model can be done at any time without recompiling the entire code.

If possible, can you share the complete code for this restore process?

Hopefully, you will help me.

zxw4688 commented 4 years ago

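Hello Dr. @lululxvi! Thank you very much for answering my question so quickly; I really appreciate it. I read the question "How can I save a trained model and then load the model later?", but it is still not clear to me. What I want is to change the initial values of the weights and biases, setting them to weights and biases that have already been trained. Also, when running the DeepXDE example 'diffusion_1d_inverse', I found that the result is not correct on every run: sometimes the correct coefficient is not recovered, and I don't know why.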

zxw4688 commented 4 years ago

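The results of running diffusion_1d_inverse are as follows. First run: the correct coefficient was obtained (image). Second run: the correct coefficient was not obtained (image). Third run: the correct coefficient was not obtained (image).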

zxw4688 commented 4 years ago

The code is:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

import deepxde as dde
from deepxde.backend import tf


def main():
    C = tf.Variable(2.0)

    def pde(x, y):
        dy_x = tf.gradients(y, x)[0]
        dy_x, dy_t = dy_x[:, 0:1], dy_x[:, 1:]
        dy_xx = tf.gradients(dy_x, x)[0][:, 0:1]
        return (
            dy_t
            - C * dy_xx
            + tf.exp(-x[:, 1:])
            * (tf.sin(np.pi * x[:, 0:1]) - np.pi ** 2 * tf.sin(np.pi * x[:, 0:1]))
        )

    def func(x):
        return np.sin(np.pi * x[:, 0:1]) * np.exp(-x[:, 1:])

    def funcy(x):
        return np.sin(np.pi * x[:, 0:1]) * np.exp(-x[:, 1:]) + 1 * np.random.normal(0.0, 1.0, 1)

    geom = dde.geometry.Interval(-1, 1)
    timedomain = dde.geometry.TimeDomain(0, 1)
    geomtime = dde.geometry.GeometryXTime(geom, timedomain)

    bc = dde.DirichletBC(geomtime, func, lambda _, on_boundary: on_boundary)
    ic = dde.IC(geomtime, func, lambda _, on_initial: on_initial)

    observe_x = np.vstack((np.linspace(-1, 1, num=10), np.full((10), 1))).T
    # print(observe_x)
    ptset = dde.bc.PointSet(observe_x)
    observe_y = dde.DirichletBC(
        geomtime, ptset.values_to_func(funcy(observe_x)), lambda x, _: ptset.inside(x)
    )

    data = dde.data.TimePDE(
        geomtime,
        pde,
        [bc, ic, observe_y],
        num_domain=40,
        num_boundary=20,
        num_initial=10,
        anchors=observe_x,
        solution=func,
        num_test=10000,
    )

    layer_size = [2] + [32] * 3 + [1]
    activation = "tanh"
    initializer = "Glorot uniform"
    net = dde.maps.FNN(layer_size, activation, initializer)

    model = dde.Model(data, net)

    model.compile("adam", lr=0.001, metrics=["l2 relative error"])
    variable = dde.callbacks.VariableValue(C, period=1000)
    losshistory, train_state = model.train(epochs=60000, callbacks=[variable])

    dde.saveplot(losshistory, train_state, issave=True, isplot=True)


if __name__ == "__main__":
    main()

lululxvi commented 4 years ago
zxw4688 commented 4 years ago

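Hello @lululxvi! Thank you very much for your patient answers and help. Following your suggestion, I wrote the code below to initialize the weights and biases with already-trained values, mainly for diffusion_1d_inverse, but it fails when run. Dr. Lu, could you please help me correct it when you have time? Thank you very much! The code is as follows: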

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

import deepxde as dde
from deepxde.backend import tf


def main():
    CC = []
    INT_C = np.array([0.5, 0.8, 1.2, 1.4, 1.8, 2.0, 2.4])  # initial values for the coefficient of the second-order spatial derivative
    KN = np.size(INT_C)
    for k in range(0, KN):
        ck = INT_C[k]  # take the initial value
        C = tf.Variable(ck)  # define C as a variable

        initializer = "Glorot uniform"  # initial values of the weights and biases

        def pde(x, y):
            dy_x = tf.gradients(y, x)[0]
            dy_x, dy_t = dy_x[:, 0:1], dy_x[:, 1:]
            dy_xx = tf.gradients(dy_x, x)[0][:, 0:1]
            return (
                dy_t
                - C * dy_xx
                + tf.exp(-x[:, 1:])
                * (tf.sin(np.pi * x[:, 0:1]) - np.pi ** 2 * tf.sin(np.pi * x[:, 0:1]))
            )

        def func(x):
            return np.sin(np.pi * x[:, 0:1]) * np.exp(-x[:, 1:])

        def funcy(x):
            return np.sin(np.pi * x[:, 0:1]) * np.exp(-x[:, 1:]) + 1 * np.random.normal(0.0, 1.0, 1)

        geom = dde.geometry.Interval(-1, 1)
        timedomain = dde.geometry.TimeDomain(0, 1)
        geomtime = dde.geometry.GeometryXTime(geom, timedomain)

        bc = dde.DirichletBC(geomtime, func, lambda _, on_boundary: on_boundary)
        ic = dde.IC(geomtime, func, lambda _, on_initial: on_initial)

        observe_x = np.vstack((np.linspace(-1, 1, num=10), np.full((10), 1))).T
        # print(observe_x)
        ptset = dde.bc.PointSet(observe_x)
        observe_y = dde.DirichletBC(
            geomtime, ptset.values_to_func(funcy(observe_x)), lambda x, _: ptset.inside(x)
        )

        data = dde.data.TimePDE(
            geomtime,
            pde,
            [bc, ic, observe_y],
            num_domain=40,
            num_boundary=20,
            num_initial=10,
            anchors=observe_x,
            solution=func,
            num_test=10000,
        )

        layer_size = [2] + [32] * 3 + [1]
        activation = "tanh"
        net = dde.maps.FNN(layer_size, activation, initializer)
        model = dde.Model(data, net)
        model.compile("adam", lr=0.001, metrics=["l2 relative error"])
        variable = dde.callbacks.VariableValue(C, period=1000)
        losshistory, train_state = model.train(epochs=10000, callbacks=[variable])

        # dde.saveplot(losshistory, train_state, issave=True, isplot=True)
        # Save the weights and biases
        checkpointer = dde.callbacks.ModelCheckpoint("./model/model.ckpt", verbose=1, save_better_only=True)
        model.train(epochs=10000, callbacks=[checkpointer])

        model = dde.Model(data, net)
        model.compile("adam", lr=0.001, metrics=["l2 relative error"])
        model.restore("./model/model.ckpt-7000")

        # Load the weights and biases
        initializer = model.load("./model/model.ckpt-7000")  # use the already-trained weights and biases

        CC = np.concatenate((CC, variable), axis=0)
    print(CC)


if __name__ == "__main__":
    main()

The error is:

Traceback (most recent call last):
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1136, in binary_op_wrapper
    out = r_op(x)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1155, in r_binary_op_wrapper
    x = ops.convert_to_tensor(x, dtype=y.dtype.base_dtype, name="x")
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1473, in convert_to_tensor
    raise ValueError(
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float64: <tf.Tensor 'Variable_1/read:0' shape=() dtype=float64>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\DNN_matlab\myrebp\solve_PDE_0811\DeepXDE\deepxde-master\examples\my_example_0819\new_diffusion_1d_inverse.py", line 83, in <module>
    main()
  File "E:\DNN_matlab\myrebp\solve_PDE_0811\DeepXDE\deepxde-master\examples\my_example_0819\new_diffusion_1d_inverse.py", line 64, in main
    model.compile("adam", lr=0.001, metrics=["l2 relative error"])
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\deepxde\utils.py", line 52, in wrapper
    result = f(*args, **kwargs)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\deepxde\model.py", line 82, in compile
    self.losses = self.data.losses(self.net.targets, self.net.outputs, loss, self)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\deepxde\data\pde.py", line 50, in losses
    f = self.pde(model.net.inputs, outputs)
  File "E:\DNN_matlab\myrebp\solve_PDE_0811\DeepXDE\deepxde-master\examples\my_example_0819\new_diffusion_1d_inverse.py", line 25, in pde
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float64 of argument 'x'.

lululxvi commented 4 years ago

Try C = tf.Variable(1.0, dtype=tf.float32). Also, do not wrap the code in an outer for loop over C; it may not work. Just re-run the code with different C values.
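The float64 in the traceback most likely comes from NumPy: elements taken from np.array([0.5, 0.8, ...]) are float64 by default, and a variable created from such a value inherits that type, while the other operand in the failing multiplication is float32. The NumPy-only sketch below shows the default and an explicit cast; passing dtype=tf.float32 as suggested above addresses the same mismatch on the TensorFlow side.

```python
import numpy as np

# Same initial values as in the script above.
INT_C = np.array([0.5, 0.8, 1.2, 1.4, 1.8, 2.0, 2.4])

ck = INT_C[0]
print(ck.dtype)  # float64

# Cast explicitly so the value matches a float32 computation.
ck32 = np.float32(ck)
print(ck32.dtype)  # float32
```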

zxw4688 commented 4 years ago

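My goal is to see what results I get for different initial values of the coefficient when reusing the weights and biases trained with the previous coefficient, which is why I want to loop over C.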

kimianoorbakhsh commented 3 years ago

In my tests, it is only necessary to call Model.compile() without Model.train(). For example:

  • First, train
model = dde.Model(data, net)
model.compile("adam", lr=0.001, metrics=["l2 relative error"])
checkpointer = dde.callbacks.ModelCheckpoint("./model/model.ckpt", verbose=1, save_better_only=True)
model.train(epochs=10000, callbacks=[checkpointer])
  • Second, restore
model = dde.Model(data, net)
model.compile("adam", lr=0.001, metrics=["l2 relative error"])
model.restore("./model/model.ckpt-7000")

Hi @lululxvi I tried the same approach and I get some errors. I appreciate your help with this. I first use this:

model = dde.Model(data, net)
model.compile("adam", lr=0.001,metrics=["l2 relative error"])
losshistory, train_state = model.train(epochs=10000, model_save_path = "model/model.ckpt")

After that, the saved files appear in my model directory (screenshot). Then I use this:

model = dde.Model(data, net)
model.compile("adam", lr=0.001,metrics=["l2 relative error"])
model.restore("model/model.ckpt-10000")
losshistory, train_state = model.train(epochs=10000, model_save_path = "model/model.ckpt")

But I get errors (screenshot).

lululxvi commented 3 years ago

It seems that the network parameters are not consistent between the two models. Is the rest of the code the same for saving and restoring?

kimianoorbakhsh commented 3 years ago

It seems that the network parameters are not consistent between the two models. Is the rest of the code the same for saving and restoring?

@lululxvi I really appreciate your help with this. I just use them in a for loop, as shown below. (I should mention that I use my own data files, update the data in each iteration of the for loop, and want to train the same model on the new data every time.)

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

import deepxde as dde
from deepxde.backend import tf
models = []

for time_step in range(100):

  # preparing data for this specific time step
  train_x = all_[time_step][0:train_number]
  train_y = new_temps[0:train_number, time_step].reshape((1681,1))
  map_dict =  {}
  for i in range(len(train_x)):
    map_dict[(train_x[i][0], train_x[i][1], train_x[i][2])] = train_y[i]

  # defining the pde
  def pde(x, y):
    dy_x = tf.gradients(y, x)[0]
    dy_x1, dy_x2, dy_t = dy_x[:, 0:1], dy_x[:, 1:2], dy_x[:, 2:]
    dy_x1x1 = tf.gradients(dy_x1, x)[0][:, 0:1]
    dy_x2x2 = tf.gradients(dy_x2, x)[0][:, 1:2]
    return (dy_t - (dy_x1x1 + dy_x2x2))

  def func(x):
    return np.full((len(x), 1), 10e-2)

  def funcic(x): return np.full((len(x), 1), 25)

  def funcbc(x):  # x = the collection of the points on the boundary 
    result = np.zeros((len(x), 1))
    for i in range(len(x)):
      result[i] = map_dict[(x[i][0], x[i][1], x[i][2])]  # 0: y, 1: z, 2 : time
    return result

  def solution(x) : return train_y

  # defining the geometry
  geom = dde.geometry.geometry_2d.Rectangle((1,1), (41,41))
  timedomain = dde.geometry.TimeDomain(0, 1)
  geomtime = dde.geometry.GeometryXTime(geom, timedomain)

  # defining the boundary and initial conditions
  bc = dde.DirichletBC(geomtime, funcbc, lambda _, on_boundary: on_boundary)
  ic = dde.IC(geomtime, funcic, lambda _, on_initial: on_initial)

  # defining the data in deepxde format
  data = dde.data.TimePDE(
      geomtime,
      pde,
      [bc, ic],
      anchors = train_x,
      solution = solution
  )

  # defining the model
  layer_size = [3] + [32] * 3 + [1]
  activation = "tanh"
  initializer = "Glorot uniform"
  net = dde.maps.FNN(layer_size, activation, initializer)

  # In the first time step, just store the model. In the following ones, first restore, then train, and then store again.
  if time_step == 0:
    model = dde.Model(data, net)
    model.compile("adam", lr=0.001,metrics=["l2 relative error"])
    losshistory, train_state = model.train(epochs=10000, model_save_path = "model/model.ckpt")
    print(time_step)
  else:
    model = dde.Model(data, net)
    model.compile("adam", lr=0.001,metrics=["l2 relative error"])
    losshistory, train_state = model.train(epochs=10000, model_restore_path= "model/model.ckpt-10000")
    print(time_step)
lululxvi commented 3 years ago

The issue may come from the for loop. Use dde.apply in the for loop, see an example at https://github.com/lululxvi/deep-learning-for-indentation/blob/master/src/nn.py

ankittyagi1987 commented 2 years ago

Dear @lululxvi, I could not find the apply method in deepxde for running the model in a for loop. I am getting an error when restoring the model in a for loop.

lululxvi commented 2 years ago

dde.utils.apply
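This helper appears to call a function in a separate process, so that each loop iteration starts from a clean TensorFlow state instead of adding to the graph left behind by the previous iteration. The standard-library sketch below illustrates only that isolation idea, using fresh Python interpreters; it is not how DeepXDE implements it, and `run_in_fresh_process` is an invented name.

```python
import subprocess
import sys


def run_in_fresh_process(snippet):
    """Run a Python snippet in a new interpreter and return its printed output."""
    completed = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# Each run starts with empty globals, so the counter never accumulates.
snippet = "counter = globals().get('counter', 0) + 1; print(counter)"
for time_step in range(3):
    print(time_step, run_in_fresh_process(snippet))
```

In the training loop above, the body of one time step (build the data, restore, train, save) would play the role of the snippet, with the checkpoint on disk carrying the weights from one step to the next.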