Stonebobo opened this issue 4 years ago.
Q1: This problem can be caused by the TensorFlow version, insufficient memory, or a graph that does not match the checkpoint. You can solve it by referring to https://blog.csdn.net/u010327061/article/details/84078583.
Q2: We use TensorFlow 1.12 with CUDA 9.0.
Q3: How many training samples do you use, and how do you set the learning rate?
Thank you for your reply. My training set contains 3,600 images (481×321) from Rain100L_new_version (CVPR 2017). I have not changed the learning rate; I use the original value in your code, start_learning_rate = 5e-4. The Q3 problem may be caused by my dataset being much smaller than the one used in your paper. Thank you! I will try again regarding Q1.
Hello, I want to ask some questions.
Loading my previously trained model in train_MSPFN works without errors, but loading the same trained model in test_MSPFN fails with a "Variable name or other graph key that is missing" error. The details are as follows:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:/bwl_python/MSPFN-me-7.3/model/test/test_MSPFN.py", line 48, in
saver.restore(sess, '../MSPFN/epoch6')#93
File "E:\anaconda\path\envs\bwltfgpu\lib\site-packages\tensorflow\python\training\saver.py", line 1302, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Key generator/BCM2_0/down2_1/alpha not found in checkpoint
[[node save/RestoreV2 (defined at /bwl_python/MSPFN-me-7.3/model/test/test_MSPFN.py:47) ]]
[[save/RestoreV2/_453]]
(1) Not found: Key generator/BCM2_0/down2_1/alpha not found in checkpoint
[[node save/RestoreV2 (defined at /bwl_python/MSPFN-me-7.3/model/test/test_MSPFN.py:47) ]]
0 successful operations.
0 derived errors ignored.
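The error says the test graph built by test_MSPFN.py contains a variable, `generator/BCM2_0/down2_1/alpha`, that the checkpoint does not have, so the test graph and the training graph differ somewhere. One way to pinpoint every mismatch at once is to diff the two name sets. The sketch below assumes TensorFlow 1.x; the helper `diff_variable_names` is hypothetical, and every variable name in the demo except the `alpha` key from the traceback is invented for illustration.

```python
def diff_variable_names(checkpoint_names, graph_names):
    """Compare the variable names stored in a checkpoint with those the graph expects.

    Returns (missing, unused):
      missing: names the graph needs but the checkpoint lacks (these cause NotFoundError)
      unused:  names the checkpoint has but the current graph never creates
    """
    ckpt = set(checkpoint_names)
    graph = set(graph_names)
    missing = sorted(graph - ckpt)
    unused = sorted(ckpt - graph)
    return missing, unused


if __name__ == "__main__":
    # In practice (TensorFlow 1.x), after building the test graph:
    #   ckpt_names  = [name for name, _ in tf.train.list_variables('../MSPFN/epoch6')]
    #   graph_names = [v.op.name for v in tf.global_variables()]
    # The lists below are hypothetical stand-ins for those two calls.
    ckpt_names = ["generator/BCM2_0/down2_1/weights"]
    graph_names = ["generator/BCM2_0/down2_1/weights",
                   "generator/BCM2_0/down2_1/alpha"]
    missing, unused = diff_variable_names(ckpt_names, graph_names)
    print("missing from checkpoint:", missing)
    print("in checkpoint but unused by graph:", unused)
```

If `missing` lists only parameters such as `alpha` (a name often used for a PReLU slope, which is an assumption here), the test script probably builds a layer differently from the training script. It is tempting to pass `var_list=[v for v in tf.global_variables() if v.op.name in ckpt_names]` to `tf.train.Saver` to skip them, but those variables would then keep their initial values, so this is only a diagnostic; the more reliable fix is to build the test graph exactly as in training.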
In addition, exactly which TensorFlow version do you use? 1.1? 1.14? (P.S. I get version-related errors with TensorFlow 1.1, such as AttributeError: module 'tensorflow' has no attribute 'AUTO_REUSE'.)
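Since the reply above reports TensorFlow 1.12, a script could fail early with a clear message instead of an `AttributeError` deep inside model construction. The following is a minimal sketch; the helper `version_at_least` is hypothetical, and treating 1.12 as a hard minimum is an assumption based only on the version reported in this thread.

```python
import re


def version_at_least(version, minimum):
    """Return True if a version string such as '1.12.0' is >= minimum (a tuple of ints)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each component, e.g. '0-rc1' -> 0
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad both sides with zeros so (1, 12) compares correctly with (1, 12, 0)
    length = max(len(parts), len(minimum))
    parts += [0] * (length - len(parts))
    padded_min = list(minimum) + [0] * (length - len(minimum))
    return parts >= padded_min
```

Usage, assuming TensorFlow is installed: `import tensorflow as tf`, then `if not version_at_least(tf.__version__, (1, 12)): raise RuntimeError("TensorFlow >= 1.12 expected")`.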
At epoch 5 (batch size 12, input images 480×320), train_loss and edge_loss stop changing: train_loss = 0.00105 and edge_loss = 0.0010004.
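When the loss plateaus, it can help to check what learning rate is actually in effect at that point. With the 3,600-image training set mentioned above and batch size 12, one epoch is 3600 / 12 = 300 iterations, so epoch 5 ends around step 1500. The name `start_learning_rate` suggests the training script decays the rate, but that is an assumption; the sketch below reproduces the formula documented for `tf.train.exponential_decay`, and the `decay_steps` and `decay_rate` values in the usage note are hypothetical.

```python
def steps_per_epoch(num_images, batch_size):
    # Ceiling division: a final partial batch still counts as one step
    return -(-num_images // batch_size)


def exponential_decay(start_lr, global_step, decay_steps, decay_rate, staircase=False):
    """Same formula as tf.train.exponential_decay:
    start_lr * decay_rate ** (global_step / decay_steps),
    with integer division in the exponent when staircase=True."""
    if staircase:
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return start_lr * decay_rate ** exponent
```

For example, with hypothetical settings `decay_steps=300` and `decay_rate=0.5`, `exponential_decay(5e-4, 5 * steps_per_epoch(3600, 12), 300, 0.5, staircase=True)` gives 5e-4 / 32 ≈ 1.6e-5, small enough that progress could look stalled. Printing the real schedule from the training script is the way to confirm or rule this out.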
Looking forward to your reply. Thank you!