rmalav15 / DHSGAN

Official tensorflow implementation of "DHSGAN: An End to End Dehazing Network for Fog and Smoke"
https://link.springer.com/chapter/10.1007/978-3-030-20873-8_38

topological sort failed and error about model-170000 #1

Open dkrmsptlfk opened 4 years ago

dkrmsptlfk commented 4 years ago

Hi, I'm learning about image dehazing, so I was testing your code. The open dataset was too large, so I used only 500 image pairs (hazy and real images) from RESIDEv0-SOTS_indoor (the gt and hazy folders). I copied each real image 10 times to make real-hazy image pairs. When I ran train_DHSGAN_generator.sh, I got "topological sort failed" errors like this:

Optimization starts!!!
2019-08-08 15:38:28.198116: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-08-08 15:38:28.271352: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-08-08 15:38:28.646927: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-08-08 15:38:28.701832: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

However, after waiting a few minutes, optimization started and ran fine.
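The pairing step described above (one clear image matched to several hazy versions) can be done by file name instead of copying files. The sketch below is only an illustration: it assumes hazy files are named like `1400_1.png` and their ground-truth image is `1400.png`, which should be checked against the actual folder contents. The function name `pair_hazy_with_gt` is hypothetical and is not part of the DHSGAN code.

```python
import os


def pair_hazy_with_gt(hazy_names, gt_names):
    """Pair each hazy file with its ground-truth file by shared scene id.

    Assumption (not confirmed by the repository): a hazy file is named
    '<scene>_<index>.<ext>' and its ground truth is '<scene>.<ext>'.
    Hazy files without a matching ground truth are skipped.
    """
    # Map scene id (file name without extension) to the ground-truth file name.
    gt_by_id = {os.path.splitext(name)[0]: name for name in gt_names}
    pairs = []
    for hazy in sorted(hazy_names):
        # Scene id is the part of the stem before the first underscore.
        scene_id = os.path.splitext(hazy)[0].split("_")[0]
        if scene_id in gt_by_id:
            pairs.append((hazy, gt_by_id[scene_id]))
    return pairs


if __name__ == "__main__":
    # Hypothetical file names, used only to show the call.
    for hazy, gt in pair_hazy_with_gt(["1400_1.png", "1400_2.png"], ["1400.png"]):
        print(hazy, "->", gt)
```

In practice the two name lists would come from `os.listdir` on the hazy and gt folders.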

The real problem happened when I ran train_DHSGAN.sh. In that script the checkpoint is set to model-170000, but when I ran it after train_DHSGAN_generator.sh, I got this error:

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key discriminator/discriminator_unit/dense_layer_1/dense/bias not found in checkpoint [[node save_1/RestoreV2 (defined at main.py:224) ]]

I changed the model number from 170000 to 200000 (the max iter), but the same error occurred.
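One way to narrow down a "Key ... not found in checkpoint" error is to list the variable names the checkpoint actually stores and compare them with the names the graph expects. The sketch below is a minimal illustration, not the repository's code: `missing_keys` and `checkpoint_var_names` are hypothetical helper names, and the checkpoint path in the usage comment is a placeholder. It relies on `tf.train.list_variables`, which returns `(name, shape)` pairs for a checkpoint. If the checkpoint came from a generator-only training run, it may simply contain no `discriminator/...` variables, but that is an assumption to verify with the listing.

```python
def missing_keys(graph_var_names, ckpt_var_names):
    """Return the graph variable names absent from the checkpoint, sorted."""
    return sorted(set(graph_var_names) - set(ckpt_var_names))


def checkpoint_var_names(checkpoint_prefix):
    """List the variable names stored in a TensorFlow checkpoint.

    TensorFlow is imported lazily so that missing_keys can be used
    without it. checkpoint_prefix is a path such as '<dir>/model-170000'.
    """
    import tensorflow as tf

    return [name for name, _shape in tf.train.list_variables(checkpoint_prefix)]


# Usage sketch (TF 1.x, after the graph has been built). The path is a placeholder:
#   graph_names = [v.op.name for v in tf.global_variables()]
#   print(missing_keys(graph_names, checkpoint_var_names("path/to/model-170000")))
```

If every missing name starts with `discriminator/`, restoring only the variables present in the checkpoint (for example by passing a `var_list` to `tf.train.Saver`) is one option to consider.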

To sum up: 1) Is the problem with model-170000 caused by the "topological sort failed" error? 2) If not, how should I set the checkpoint model?

I'm using Python 3.6.8, tensorflow-gpu 1.13.1, CUDA 10.0 and cuDNN 7.6.0 on Windows 10.

Thank you

rmalav15 commented 4 years ago

It may be a problem with the TensorFlow or cuDNN version. See https://github.com/alexlee-gk/video_prediction/issues/9

I will check and let you know.