jhssyb closed this issue 4 years ago.
What is your geometry? By default, training points are sampled randomly and test points are sampled uniformly. However, uniform sampling is sometimes difficult. There are two cases:
Hi Lu, the geometry is a cylinder inside a rectangular domain, defined with "geom = CSGDifference(rectangular, circle)". Does this fall into the first category in your last answer?
By the way, I followed your advice on saving the model, using:
model.train(epochs=1000, model_save_path=r"C:\Users\Dell\Documents\Flow over cylinder")
But it does not work: after training, I sorted the files in my working directory by modification date to look for the saved model, and no new file appeared.
Yes, a rectangle minus a cylinder falls into the first category.
I am not sure why saving does not work. There are a few possibilities:
model_save_path=r"C:\Users\Dell\Documents\Flow_over_cylinder\model.ckpt"
Note: avoid spaces in the path.
Thanks for your prompt reply, Lu. I tried your first suggestion, and it works well. Then I tried to restore the trained model for later prediction with the following code:
model = dde.Model(data, net)
model.restore(r"C:\Users\Dell\Documents\Flow over cylinder\model.ckpt-1000.meta")
But it does not work; it raises the following error:
File "C:\ProgramData\Anaconda3\lib\site-packages\deepxde\model.py", line 324, in restore
    self.saver.restore(self.sess, save_path)
AttributeError: 'NoneType' object has no attribute 'restore'
Should we only use restore and save inside model.train(..., model_restore_path=None, model_save_path=None)? Or can we use the save and restore functions outside model.train()?
Did you call Model.compile() before restore?
Following your suggestion, I now call compile first:
`data = dde.data.PDE(geom, pde, bcs, num_domain=300, num_boundary=50, num_test=20)
net = dde.maps.FNN([2] + [10] * 2 + [2], "tanh", "Glorot uniform")
model = dde.Model(data, net)
model.compile("L-BFGS-B")
model.restore(r"C:\Users\Dell\Documents\Flow over cylinder\model.ckpt-1000.meta")
# Plot PDE residue
x = geom.uniform_points(1000, True)
print(x)
print(x.shape)
y = model.predict(x, operator=pde)
plt.figure()
plt.quiver(x[:,0],x[:,1], y[:,0],y[:,1],)
plt.xlabel("x")
plt.ylabel("PDE residue")
plt.show()`
However, it raises: "DataLossError (see above for traceback): Unable to open table file C:\Users\Dell\Documents\Flow over cylinder\model.ckpt-1000.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator? [[node save_1/RestoreV2 (defined at C:\ProgramData\Anaconda3\lib\site-packages\deepxde\model.py:197) ]]" Does this mean we need to pass both the .data and .meta files to model.restore(), or only the save path, so that the program finds the needed files automatically?
Try
model.restore(r"C:\Users\Dell\Documents\Flow over cylinder\model.ckpt-1000")
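For context on why the prefix works: with TensorFlow 1.x, a checkpoint such as model.ckpt-1000 is stored as several files sharing that prefix (model.ckpt-1000.index, model.ckpt-1000.meta and model.ckpt-1000.data-00000-of-00001), and restoring expects the shared prefix rather than any single file. The same prefix convention is likely why model_save_path should end in a file prefix such as ...\model.ckpt instead of a bare folder. The helper below, checkpoint_prefix, is a hypothetical convenience (not part of DeepXDE) that strips the per-file suffix from whichever checkpoint file you pick:

```python
import re

# Per-file suffixes that TensorFlow 1.x appends to a checkpoint prefix:
# ".index", ".meta" and ".data-00000-of-00001" (shard counters may vary).
_CKPT_SUFFIX = re.compile(r"\.(?:index|meta|data-\d+-of-\d+)$")


def checkpoint_prefix(path):
    """Return the checkpoint prefix for one of its files (hypothetical helper).

    A path that is already a prefix is returned unchanged.
    """
    return _CKPT_SUFFIX.sub("", str(path))


if __name__ == "__main__":
    meta = r"C:\Users\Dell\Documents\Flow over cylinder\model.ckpt-1000.meta"
    # The printed string is what would be passed to model.restore().
    print(checkpoint_prefix(meta))
```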
It still does not work; it raises: InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [10] rhs shape= [50] [[node save_2/Assign_8 (defined at C:\ProgramData\Anaconda3\lib\site-packages\deepxde\model.py:197) ]]
Is the rest of the code the same, e.g., the network size?
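One way to read the shape error: in the Assign that restores a variable, lhs is the variable in the current graph ([10]) and rhs is the value stored in the checkpoint ([50]). That suggests the checkpoint was written by a network with 50 neurons in a layer where the current network has 10. The hypothetical helpers below list the weight and bias shapes implied by a layer-size list such as [2] + [10] * 2 + [2], assuming a plain fully connected net with one weight matrix and one bias vector per layer, so two configurations can be compared:

```python
def param_shapes(layer_sizes):
    """Weight and bias shapes of a plain fully connected network.

    Assumption: one (n_in, n_out) weight matrix and one (n_out,) bias per
    layer, listed in order from input to output.
    """
    shapes = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        shapes.append((n_in, n_out))  # weight matrix
        shapes.append((n_out,))  # bias vector
    return shapes


def shape_mismatches(saved_sizes, current_sizes):
    """List (index, saved_shape, current_shape) for every parameter that differs."""
    saved, current = param_shapes(saved_sizes), param_shapes(current_sizes)
    diffs = [(i, s, c) for i, (s, c) in enumerate(zip(saved, current)) if s != c]
    if len(saved) != len(current):
        diffs.append(("count", len(saved), len(current)))  # different depth
    return diffs


if __name__ == "__main__":
    current = [2] + [10] * 2 + [2]  # network used when restoring
    saved = [2] + [50] * 2 + [2]  # hypothetical network that wrote the checkpoint
    for mismatch in shape_mismatches(saved, current):
        print(mismatch)
```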
Yes, I used the same network: I first trained and saved it, then restored it. Now I get another error: NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key beta1_power_1 not found in checkpoint [[node save_5/RestoreV2 (defined at C:\ProgramData\Anaconda3\lib\site-packages\deepxde\model.py:197) ]]
Could you first try the following:
model = dde.Model(data, net)
model.compile(...)
checker = dde.callbacks.ModelCheckpoint(
"model/model.ckpt", save_better_only=True, period=1000
)
model.train(epochs=epochs, callbacks=[checker])
model.restore("model/model.ckpt-" + str(train_state.best_step), verbose=1)
Should we assign something to 'train_state'? It produces an error:
UnboundLocalError: local variable 'train_state' referenced before assignment
Oh, yes.
losshistory, train_state = model.train(epochs=epochs, callbacks=[checker])
Yes, it works now, with the following code:
`epochs = 1000
data = dde.data.PDE(geom, pde, bcs, num_domain=300, num_boundary=50, num_test=20)
net = dde.maps.FNN([2] + [10] * 2 + [2], "tanh", "Glorot uniform")
model = dde.Model(data, net)
model.compile("adam", lr=0.001)
checker = dde.callbacks.ModelCheckpoint("model/model.ckpt", save_better_only=True, period=1000)
losshistory, train_state = model.train(epochs=epochs, callbacks=[checker])
model.restore("model/model.ckpt-" + str(train_state.best_step), verbose=1)`
It seems that we need to call model.train() before model.restore(), rather than calling only model.restore() on a saved model? Another problem is that running the code above raises:
File "C:\ProgramData\Anaconda3\lib\site-packages\deepxde\model.py", line 152, in train
    raise ValueError("No epochs for {}.".format(self.optimizer))
ValueError: No epochs for adam.
I am not sure why the "No epochs for adam" error occurs. Could you provide a minimal example that reproduces it?
I found the reason for this error: I had passed "adam" to Model.compile(). After changing it to "L-BFGS-B", the error disappears. When I run the following code, another error appears:
`epochs = 2000
data = dde.data.PDE(geom, pde, bcs, num_domain=300, num_boundary=50, num_test=20)
net = dde.maps.FNN([2] + [10] * 2 + [2], "tanh", "Glorot uniform")
model = dde.Model(data, net)
model.compile("L-BFGS-B")
checker = dde.callbacks.ModelCheckpoint("model/model.ckpt", save_better_only=True, period=1000)
losshistory, train_state = model.train(epochs=epochs, callbacks=[checker])
model.restore("model/model.ckpt-" + str(train_state.best_step), verbose=1)`
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1268, in restore
ValueError: The passed save_path is not a valid checkpoint: model/model.ckpt-4791
It is very weird: I set epochs=1000, but the code runs for around 5000 steps, which is why the error refers to a checkpoint at step 4791. I checked the saved files, and only the results at step 1000 are saved. The training log and saved files are shown in the following pictures. I hope to get your advice.
I am confused why passing "adam" to Model.compile() would cause this error, because in your code epochs=1000.
It is tricky to use L-BFGS for saving and restoring, because of the implementation of L-BFGS. Could you try to only use Adam during training and restoring?
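A possible reading of the "not a valid checkpoint: model/model.ckpt-4791" error: train_state.best_step records the step with the lowest training loss that was evaluated, while a periodic checkpoint callback only writes files at multiples of period (and, with save_better_only=True, only when the loss improved). If training stops at a step that is not a multiple of period, as L-BFGS may do when it converges, the best step can be one for which no checkpoint exists. The toy simulation below uses made-up loss values and hypothetical function names; it is not DeepXDE's callback code, only an illustration of this bookkeeping, and it restores from the newest saved step at or before the best one:

```python
def saved_steps(losses_by_step, period, save_better_only=True):
    """Steps at which a periodic checkpoint callback would write files.

    losses_by_step maps a training step to its training loss (toy data).
    """
    best = float("inf")
    saved = []
    for step in sorted(losses_by_step):
        if step % period != 0:
            continue  # the callback only acts every `period` steps
        loss = losses_by_step[step]
        if not save_better_only or loss < best:
            saved.append(step)
        best = min(best, loss)
    return saved


def restorable_step(saved, best_step):
    """Newest saved step not after best_step, or None if nothing qualifies."""
    candidates = [s for s in saved if s <= best_step]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    # Made-up losses; training stops at step 4791, which is not a multiple of 1000.
    losses = {1000: 0.5, 2000: 0.3, 3000: 0.2, 4000: 0.15, 4791: 0.1}
    best_step = min(losses, key=losses.get)
    saved = saved_steps(losses, period=1000)
    print(best_step, saved, restorable_step(saved, best_step))
```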
In my tests, it is only necessary to call Model.compile() without Model.train(). For example:
model = dde.Model(data, net)
model.compile("adam", lr=0.001, metrics=["l2 relative error"])
checkpointer = dde.callbacks.ModelCheckpoint("./model/model.ckpt", verbose=1, save_better_only=True)
model.train(epochs=10000, callbacks=[checkpointer])
model = dde.Model(data, net)
model.compile("adam", lr=0.001, metrics=["l2 relative error"])
model.restore("./model/model.ckpt-7000")
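Hard-coding model.ckpt-7000 only works if that step was actually saved. TensorFlow provides tf.train.latest_checkpoint(checkpoint_dir), which reads the "checkpoint" bookkeeping file in the folder. If that file is missing, a directory scan such as the hypothetical helper below finds the newest model.ckpt-<step> prefix from the .index files, assuming the files follow the model.ckpt-<step>.index naming used above:

```python
import re
from pathlib import Path


def latest_checkpoint_prefix(directory, basename="model.ckpt"):
    """Return the prefix of the highest-step checkpoint in `directory`, or None.

    Assumption: each checkpoint is saved as <basename>-<step>.index plus
    sibling files, as TensorFlow 1.x does when a global step is given.
    """
    pattern = re.compile(re.escape(basename) + r"-(\d+)\.index$")
    steps = [
        int(match.group(1))
        for match in (pattern.match(p.name) for p in Path(directory).iterdir())
        if match
    ]
    if not steps:
        return None
    # Compare steps as integers so that 7000 beats 1000 and 300.
    return str(Path(directory) / f"{basename}-{max(steps)}")
```

With the training layout above, this would be used as model.restore(latest_checkpoint_prefix("./model")) after model.compile(...).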
I cannot find the error either, but I copied your code, ran the training, then restored the model, and it works. Great help! By the way, do you know how to retrieve this Q&A page? If I close this issue, will the page become unavailable, so that others cannot see the solution when similar problems arise?
This page remains available even after the issue is closed. The link to this page stays the same.
Thanks, I will close it.
@jhssyb, restoring part of the trained model can be done at any time without recompiling the entire code. If possible, could you share the complete code for this restore process? I hope you can help me.
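Hello Dr. @lululxvi!
Thank you very much for answering my question so quickly; I truly appreciate it. I read the thread "How can I save a trained model and then load the model later?", but I am still not clear about it. What I want is to change the initial values of the weights and biases, initializing them with weights and biases that have already been trained.
In addition, when running your DeepXDE example 'diffusion_1d_inverse', I found that it does not give the correct result every time; sometimes it fails to recover the correct coefficient. Do you know why?
Results of running diffusion_1d_inverse:
- Run 1: the correct coefficient was obtained
- Run 2: the correct coefficient was not obtained
- Run 3: the correct coefficient was not obtained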
The code is:
`from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import deepxde as dde
from deepxde.backend import tf
def main():
    C = tf.Variable(2.0)
    def pde(x, y):
        dy_x = tf.gradients(y, x)[0]
        dy_x, dy_t = dy_x[:, 0:1], dy_x[:, 1:]
        dy_xx = tf.gradients(dy_x, x)[0][:, 0:1]
        return (
            dy_t
            - C * dy_xx
            + tf.exp(-x[:, 1:])
            * (tf.sin(np.pi * x[:, 0:1]) - np.pi ** 2 * tf.sin(np.pi * x[:, 0:1]))
        )
    def func(x):
        return np.sin(np.pi * x[:, 0:1]) * np.exp(-x[:, 1:])
    def funcy(x):
        return np.sin(np.pi * x[:, 0:1]) * np.exp(-x[:, 1:]) + 1 * np.random.normal(0.0, 1.0, 1)
    geom = dde.geometry.Interval(-1, 1)
    timedomain = dde.geometry.TimeDomain(0, 1)
    geomtime = dde.geometry.GeometryXTime(geom, timedomain)
    bc = dde.DirichletBC(geomtime, func, lambda _, on_boundary: on_boundary)
    ic = dde.IC(geomtime, func, lambda _, on_initial: on_initial)
    observe_x = np.vstack((np.linspace(-1, 1, num=10), np.full((10), 1))).T
    # print(observe_x)
    ptset = dde.bc.PointSet(observe_x)
    observe_y = dde.DirichletBC(
        geomtime, ptset.values_to_func(funcy(observe_x)), lambda x, _: ptset.inside(x)
    )
    data = dde.data.TimePDE(
        geomtime,
        pde,
        [bc, ic, observe_y],
        num_domain=40,
        num_boundary=20,
        num_initial=10,
        anchors=observe_x,
        solution=func,
        num_test=10000,
    )
    layer_size = [2] + [32] * 3 + [1]
    activation = "tanh"
    initializer = "Glorot uniform"
    net = dde.maps.FNN(layer_size, activation, initializer)
    model = dde.Model(data, net)
    model.compile("adam", lr=0.001, metrics=["l2 relative error"])
    variable = dde.callbacks.VariableValue(C, period=1000)
    losshistory, train_state = model.train(epochs=60000, callbacks=[variable])
    dde.saveplot(losshistory, train_state, issave=True, isplot=True)
if __name__ == "__main__":
    main()`
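One detail in funcy above that may contribute to the run-to-run differences: np.random.normal(0.0, 1.0, 1) draws a single sample, which NumPy broadcasts, so all ten observations are shifted by the same random offset with standard deviation 1, while the observed solution sin(πx)·e^(-1) at t = 1 has an amplitude of only about 0.37. Each run therefore fits C to differently shifted data. The stdlib sketch below (hypothetical function names, with Python's random module standing in for NumPy) contrasts a shared offset with independent per-point noise:

```python
import random


def add_shared_offset(values, sigma, rng):
    """Add ONE random draw to every value, like a broadcast np.random.normal(0.0, sigma, 1)."""
    offset = rng.gauss(0.0, sigma)
    return [v + offset for v in values]


def add_independent_noise(values, sigma, rng):
    """Add a separate draw to each value, like np.random.normal(0.0, sigma, len(values))."""
    return [v + rng.gauss(0.0, sigma) for v in values]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the comparison is repeatable
    clean = [0.0, 0.2, 0.37, 0.2, 0.0]
    print(add_shared_offset(clean, 1.0, rng))
    print(add_independent_noise(clean, 1.0, rng))
```

With NumPy, per-point noise matching the (N, 1) shape returned by func would be np.random.normal(0.0, sigma, size=(len(x), 1)).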
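Hello @lululxvi!
Thank you very much for your patient answers and help. Following your suggestion to initialize with weights and biases that have already been trained, I wrote the code below, mainly for diffusion_1d_inverse, but it raises an error when run. Dr. Lu, could you please help me fix it when you have time? Many thanks! The code is as follows: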
`from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import deepxde as dde
from deepxde.backend import tf
def main():
    CC = []
    INT_C = np.array([0.5, 0.8, 1.2, 1.4, 1.8, 2.0, 2.4])  # initial values of the coefficient of the second-order spatial derivative
    KN = np.size(INT_C)
    for k in range(0, KN):
        ck = INT_C[k]  # take the initial value
        C = tf.Variable(ck)  # define C as a variable
        initializer = "Glorot uniform"  # initial values of the weights and biases
        def pde(x, y):
            dy_x = tf.gradients(y, x)[0]
            dy_x, dy_t = dy_x[:, 0:1], dy_x[:, 1:]
            dy_xx = tf.gradients(dy_x, x)[0][:, 0:1]
            return (
                dy_t
                - C * dy_xx
                + tf.exp(-x[:, 1:])
                * (tf.sin(np.pi * x[:, 0:1]) - np.pi ** 2 * tf.sin(np.pi * x[:, 0:1]))
            )
        def func(x):
            return np.sin(np.pi * x[:, 0:1]) * np.exp(-x[:, 1:])
        def funcy(x):
            return np.sin(np.pi * x[:, 0:1]) * np.exp(-x[:, 1:]) + 1 * np.random.normal(0.0, 1.0, 1)
        geom = dde.geometry.Interval(-1, 1)
        timedomain = dde.geometry.TimeDomain(0, 1)
        geomtime = dde.geometry.GeometryXTime(geom, timedomain)
        bc = dde.DirichletBC(geomtime, func, lambda _, on_boundary: on_boundary)
        ic = dde.IC(geomtime, func, lambda _, on_initial: on_initial)
        observe_x = np.vstack((np.linspace(-1, 1, num=10), np.full((10), 1))).T
        # print(observe_x)
        ptset = dde.bc.PointSet(observe_x)
        observe_y = dde.DirichletBC(
            geomtime, ptset.values_to_func(funcy(observe_x)), lambda x, _: ptset.inside(x)
        )
        data = dde.data.TimePDE(
            geomtime,
            pde,
            [bc, ic, observe_y],
            num_domain=40,
            num_boundary=20,
            num_initial=10,
            anchors=observe_x,
            solution=func,
            num_test=10000,
        )
        layer_size = [2] + [32] * 3 + [1]
        activation = "tanh"
        net = dde.maps.FNN(layer_size, activation, initializer)
        model = dde.Model(data, net)
        model.compile("adam", lr=0.001, metrics=["l2 relative error"])
        variable = dde.callbacks.VariableValue(C, period=1000)
        losshistory, train_state = model.train(epochs=10000, callbacks=[variable])
        # dde.saveplot(losshistory, train_state, issave=True, isplot=True)
        # save the weights and biases
        checkpointer = dde.callbacks.ModelCheckpoint("./model/model.ckpt", verbose=1, save_better_only=True)
        model.train(epochs=10000, callbacks=[checkpointer])
        model = dde.Model(data, net)
        model.compile("adam", lr=0.001, metrics=["l2 relative error"])
        model.restore("./model/model.ckpt-7000")
        # load the weights and biases
        initializer = model.load("./model/model.ckpt-7000")  # use the already-trained weights and biases
        CC = np.concatenate((CC, variable), axis=0)
    print(CC)
if __name__ == "__main__":
    main()`
The error is:
`Traceback (most recent call last):
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1136, in binary_op_wrapper
    out = r_op(x)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1155, in r_binary_op_wrapper
    x = ops.convert_to_tensor(x, dtype=y.dtype.base_dtype, name="x")
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1473, in convert_to_tensor
    raise ValueError(
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float64: <tf.Tensor 'Variable_1/read:0' shape=() dtype=float64>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "E:\DNN_matlab\myrebp\solve_PDE_0811\DeepXDE\deepxde-master\examples\my_example_0819\new_diffusion_1d_inverse.py", line 83, in <module>
  File "E:\DNN_matlab\myrebp\solve_PDE_0811\DeepXDE\deepxde-master\examples\my_example_0819\new_diffusion_1d_inverse.py", line 64, in main
    model.compile("adam", lr=0.001, metrics=["l2 relative error"])
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\deepxde\utils.py", line 52, in wrapper
    result = f(*args, **kwargs)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\deepxde\model.py", line 82, in compile
    self.losses = self.data.losses(self.net.targets, self.net.outputs, loss, self)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\deepxde\data\pde.py", line 50, in losses
    f = self.pde(model.net.inputs, outputs)
  File "E:\DNN_matlab\myrebp\solve_PDE_0811\DeepXDE\deepxde-master\examples\my_example_0819\new_diffusion_1d_inverse.py", line 25, in pde
    C * dy_xx
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\ops\variables.py", line 1074, in _run_op
    return tensor_oper(a.value(), *args, **kwargs)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1141, in binary_op_wrapper
    raise e
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1125, in binary_op_wrapper
    return func(x, y, name=name)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1457, in _mul_dispatch
    return multiply(x, y, name=name)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 509, in multiply
    return gen_math_ops.mul(x, y, name)
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 6175, in mul
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "D:\ProgramData\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 503, in _apply_op_helper
    raise TypeError(
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float64 of argument 'x'.`
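Try C = tf.Variable(1.0, dtype=tf.float32). (INT_C is a float64 NumPy array, so tf.Variable(INT_C[k]) is created as float64, while the network works in float32; that is the mismatch the TypeError reports.) Also, do not put C in an outer for loop; it may not work. Just rerun the code with different values of C.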
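My goal is to see what the results are for different initial values of the coefficient when the weights and biases from the previous coefficient are reused, so I want C to loop.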
In my tests, it is only necessary to call Model.compile() without Model.train(). For example:
- First, train
model = dde.Model(data, net)
model.compile("adam", lr=0.001, metrics=["l2 relative error"])
checkpointer = dde.callbacks.ModelCheckpoint("./model/model.ckpt", verbose=1, save_better_only=True)
model.train(epochs=10000, callbacks=[checkpointer])
- Second, restore
model = dde.Model(data, net)
model.compile("adam", lr=0.001, metrics=["l2 relative error"])
model.restore("./model/model.ckpt-7000")
Hi @lululxvi, I tried the same approach and got some errors. I would appreciate your help with this. First, I ran this:
model = dde.Model(data, net)
model.compile("adam", lr=0.001, metrics=["l2 relative error"])
losshistory, train_state = model.train(epochs=10000, model_save_path="model/model.ckpt")
This is what then appears in my files:
Then I ran this:
model = dde.Model(data, net)
model.compile("adam", lr=0.001, metrics=["l2 relative error"])
model.restore("model/model.ckpt-10000")
losshistory, train_state = model.train(epochs=10000, model_save_path="model/model.ckpt")
But I get these errors:
It seems that the network parameters are not consistent between the two models. Is the rest of the code the same for saving and restoring?
@lululxvi, I really appreciate your help with this. I use them in a for loop, as shown below. (I should mention that I use my own data files: I update the data in each iteration of the for loop and want to train the same model on the new data every time.)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import deepxde as dde
from deepxde.backend import tf
models = []
for time_step in range(100):
    # preparing data for this specific time step
    train_x = all_[time_step][0:train_number]
    train_y = new_temps[0:train_number, time_step].reshape((1681, 1))
    map_dict = {}
    for i in range(len(train_x)):
        map_dict[(train_x[i][0], train_x[i][1], train_x[i][2])] = train_y[i]
    # defining the pde
    def pde(x, y):
        dy_x = tf.gradients(y, x)[0]
        dy_x1, dy_x2, dy_t = dy_x[:, 0:1], dy_x[:, 1:2], dy_x[:, 2:]
        dy_x1x1 = tf.gradients(dy_x1, x)[0][:, 0:1]
        dy_x2x2 = tf.gradients(dy_x2, x)[0][:, 1:2]
        return (dy_t - (dy_x1x1 - dy_x2x2))
    def func(x):
        return np.full((len(x), 1), 10e-2)
    def funcic(x):
        return np.full((len(x), 1), 25)
    def funcbc(x):  # x = the collection of the points on the boundary
        result = np.zeros((len(x), 1))
        for i in range(len(x)):
            result[i] = map_dict[(x[i][0], x[i][1], x[i][2])]  # 0: y, 1: z, 2: time
        return result
    def solution(x):
        return train_y
    # defining the geometry
    geom = dde.geometry.geometry_2d.Rectangle((1, 1), (41, 41))
    timedomain = dde.geometry.TimeDomain(0, 1)
    geomtime = dde.geometry.GeometryXTime(geom, timedomain)
    # defining the boundary and initial conditions
    bc = dde.DirichletBC(geomtime, funcbc, lambda _, on_boundary: on_boundary)
    ic = dde.IC(geomtime, funcic, lambda _, on_initial: on_initial)
    # defining the data in deepxde format
    data = dde.data.TimePDE(
        geomtime,
        pde,
        [bc, ic],
        anchors=train_x,
        solution=solution,
    )
    # defining the model
    layer_size = [3] + [32] * 3 + [1]
    activation = "tanh"
    initializer = "Glorot uniform"
    net = dde.maps.FNN(layer_size, activation, initializer)
    # In the first time step, just store the model. In the following ones, first restore, then train, and then store again.
    if time_step == 0:
        model = dde.Model(data, net)
        model.compile("adam", lr=0.001, metrics=["l2 relative error"])
        losshistory, train_state = model.train(epochs=10000, model_save_path="model/model.ckpt")
        print(time_step)
    else:
        model = dde.Model(data, net)
        model.compile("adam", lr=0.001, metrics=["l2 relative error"])
        losshistory, train_state = model.train(epochs=10000, model_restore_path="model/model.ckpt-10000")
        print(time_step)
The issue may come from the for loop. Use dde.apply in the for loop; see an example at https://github.com/lululxvi/deep-learning-for-indentation/blob/master/src/nn.py
Dear @lululxvi, I could not find the apply method in DeepXDE for running the model in a for loop. I get an error when restoring the model in a for loop.
dde.utils.apply
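As far as I understand, dde.utils.apply runs a function in a new child process, so each call starts with a fresh TensorFlow graph and session, which can avoid clashes between variables of successive models built in one process. A stdlib sketch of the same idea follows; run_in_fresh_process and train_one_time_step are hypothetical names, and in real use the loop body would rebuild the data, network and model inside the child:

```python
import multiprocessing as mp
import os


def run_in_fresh_process(func, args=()):
    """Call func(*args) in a new single-worker process and return its result.

    Sketch of the idea assumed to be behind dde.utils.apply, not its actual code.
    """
    # "fork" is unavailable on Windows; there "spawn" is used, which requires
    # the calling loop to sit under `if __name__ == "__main__":`.
    method = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
    with mp.get_context(method).Pool(processes=1) as pool:
        return pool.apply(func, args)


def train_one_time_step(time_step):
    """Hypothetical loop body. In practice it would build data, net and model,
    restore the previous checkpoint, train and save; here it only reports
    which process ran it."""
    return time_step, os.getpid()


if __name__ == "__main__":
    for step in range(3):
        step_done, pid = run_in_fresh_process(train_one_time_step, (step,))
        print(f"time step {step_done} ran in process {pid} (parent {os.getpid()})")
```

Because each iteration runs in its own process, anything passed between iterations (such as the checkpoint path) must go through files or function arguments, and the worker function must be defined at module level so it can be pickled.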
Dear Lu, I have another puzzle when running the code. Let's say we sample 50 points along the boundary, where
bcs = [bc_rectX, bc_rectY, bc_circleX, bc_circleY]
data = dde.data.PDE(geom, pde, bcs, num_domain=300, num_boundary=50, num_test=20)
While the code runs, it always issues these warnings:
Warning: 50 points required, but 100 points sampled. Uniform random is not guaranteed.
Warning: CSGDifference.uniform_points not implemented. Use random_points instead.
If I set the number of boundary points to 100, it samples 200 points randomly rather than uniformly. What causes this problem?
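For intuition on the second warning: a CSG difference such as a rectangle minus a disk has no simple uniform grid, so, as the warning says, random sampling is used instead. A minimal rejection-sampling sketch for interior points, with made-up dimensions and a hypothetical function name, shows one common way to do this: draw uniformly in the bounding rectangle and discard points that fall inside the disk.

```python
import math
import random


def sample_rect_minus_disk(n, rect, disk, seed=None):
    """Draw n random points in rect minus disk by rejection sampling.

    rect = (xmin, ymin, xmax, ymax); disk = (cx, cy, r). Hypothetical helper.
    """
    xmin, ymin, xmax, ymax = rect
    cx, cy, r = disk
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        if math.hypot(x - cx, y - cy) > r:  # reject points inside the cylinder
            points.append((x, y))
    return points


if __name__ == "__main__":
    pts = sample_rect_minus_disk(300, rect=(0.0, 0.0, 2.0, 1.0), disk=(0.5, 0.5, 0.2), seed=0)
    print(len(pts), pts[:3])
```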