JanLin0817 closed this issue 5 years ago.
Hey, I'm glad the documentation has been helpful so far. It seems like there is an issue with the compression script or the instructions I provided. From what I remember, the `CascadeFeatureFusion_*/AuxOutput/*` nodes should not be in the pruned output checkpoint.
Can you run `python -m tensorflow.python.tools.inspect_checkpoint --file_name <YOUR_PRUNED_CHECKPOINT_FILE>` and show me the result? I can try to look into this over the weekend when I have some time. Feel free to also send me an email.
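If it is easier to check from Python, here is a minimal sketch that filters a list of `(name, shape)` pairs for the training-only `CascadeFeatureFusion_*/AuxOutput/*` variables. It assumes you already have that list; in TensorFlow, `tf.train.list_variables(checkpoint_path)` returns pairs in this format. The helper name and the sample variable names are hypothetical.

```python
from fnmatch import fnmatchcase

# Glob for the training-only auxiliary-loss variables that should not
# survive into a pruned checkpoint. Note that '*' also matches '/'.
AUX_PATTERN = "CascadeFeatureFusion_*/AuxOutput/*"


def find_aux_variables(variables, pattern=AUX_PATTERN):
    """Return the names from (name, shape) pairs that match `pattern`.

    `variables` is assumed to look like the output of
    tf.train.list_variables(checkpoint_path).
    """
    return [name for name, _shape in variables if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Hypothetical checkpoint contents, for illustration only.
    sample = [
        ("CascadeFeatureFusion_0/Conv/weights", [3, 3, 64, 128]),
        ("CascadeFeatureFusion_0/AuxOutput/weights", [1, 1, 128, 19]),
        ("Conv/weights", [3, 3, 3, 32]),
    ]
    leftovers = find_aux_variables(sample)
    if leftovers:
        print("Training-only nodes still present:", leftovers)
```

An empty result means no auxiliary-output variables made it into the pruned checkpoint.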
Hello, here is the inspect_checkpoint output for my pruned checkpoint: icnet_pruned.zip. Thanks for your reply.
I'm getting the exact same error. Is there an update on this yet?
When running compress.py, I get `KeyError: 'Predictions/postrain/Conv2D'`. How did you solve this?
Edit: I retrained everything with only one GPU and the KeyError is gone, but now I have the same issue as above.
@julienip @awiegersma @JanLin0817 Hey all, I am really sorry for the complete lack of replies on this thread for the last while. Totally my fault. I had been working for the last few months, had very little time, and was also not able to contribute to any open source projects.
Now that I am finished, I went through everything and pushed a major update in 42c6bbe. This should fix all the bugs with the compression script and also updates the dataset and preprocessor builders.
@JanLin0817 The issue, which I completely forgot to document, is that the export script must be run before the compression script in order to generate TensorFlow checkpoints that do not have training nodes. In your specific case, the `AuxOutput` node is only added during training, so it is missing from the prune config (and thus does not get pruned, resulting in the shape mismatch). Removing all the training nodes with the export script is required to keep walking through the graph during compression simple. I have updated the documentation here so hopefully no one else runs into this issue.
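To see why a node missing from the prune config ends up with a shape mismatch, here is a deliberately simplified sketch. It is not the repository's compression code: the toy graph, channel counts and the `prune_and_check` helper are all hypothetical. The idea is that pruning a layer's filters shrinks its output width, so every consumer must have its input channels reduced to match; a training-only branch such as `AuxOutput` that the compression pass never visits keeps its old input width.

```python
def prune_and_check(layers, walked, ratio):
    """Simulate filter pruning on a toy layer graph.

    layers: list of (name, parent, in_channels, out_channels) in topological
            order; parent is None for the input layer.
    walked: names of layers the (hypothetical) compression pass visits.
    ratio:  fraction of output filters kept, e.g. 0.5.

    Returns (shapes, mismatched): shapes maps name -> (in, out) after
    pruning; mismatched lists layers whose input channels no longer equal
    their parent's output channels.
    """
    shapes = {}
    for name, parent, in_ch, out_ch in layers:
        if name in walked:
            # A visited layer loses filters and reads its parent's new width.
            new_in = shapes[parent][1] if parent is not None else in_ch
            shapes[name] = (new_in, max(1, int(out_ch * ratio)))
        else:
            # An unvisited layer keeps the shape it had before pruning.
            shapes[name] = (in_ch, out_ch)
    mismatched = [
        name
        for name, parent, _, _ in layers
        if parent is not None and shapes[name][0] != shapes[parent][1]
    ]
    return shapes, mismatched


if __name__ == "__main__":
    # Hypothetical toy graph: a two-layer backbone plus a training-only
    # auxiliary head that branches off conv1.
    layers = [
        ("conv1", None, 3, 64),
        ("conv2", "conv1", 64, 128),
        ("aux_output", "conv1", 64, 19),
    ]
    walked = {"conv1", "conv2"}
    print(prune_and_check(layers, walked, 0.5)[1])      # ['aux_output']
    print(prune_and_check(layers[:2], walked, 0.5)[1])  # []
```

With the auxiliary head present, `aux_output` still expects 64 input channels while `conv1` now produces 32. Dropping that branch first, which is the role the export script plays for the real graph, leaves nothing to mismatch.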
@julienip Regarding the KeyError, it looks like there were also some problems with the compression configs I had uploaded. I have also fixed this in the latest merge to master.
As a note, the variable names have changed in the latest update to avoid the confusing `Predictions/postrain` vs `Predictions/pretrain` convention I used when naming the PSPNet and ICNet output nodes. This is reflected in the updated models in the Model Zoo. If you have your own older checkpoints and want to use the updated codebase, you can also rename the nodes in your TensorFlow checkpoints like I did. I used a simple name conversion script for this, which I found here. I renamed `CascadeFeatureFusion_0` -> `CascadeFeatureFusion` and `Predictions/postrain` -> `Predictions/Conv` (for fine-tuning from PSPNet; ICNet ignores all `Predictions` nodes).
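The name-mapping part of such a rename can be sketched as below. This is a hypothetical helper, not the conversion script mentioned above. It only computes old -> new names; in a TF 1.x conversion script you would typically read the names with `tf.train.list_variables`, load each value with `tf.train.load_variable`, recreate it under the new name and save with `tf.train.Saver`, which is outside this sketch.

```python
import re

# Hypothetical rename rules taken from the renames described above.
RENAME_RULES = [
    ("CascadeFeatureFusion_0", "CascadeFeatureFusion"),
    ("Predictions/postrain", "Predictions/Conv"),
]


def _component_pattern(old):
    # Match `old` only as whole path components:
    #   (?<![^/]) -> at the start of the name or right after a '/'
    #   (?![^/])  -> at the end of the name or right before a '/'
    # so "CascadeFeatureFusion_0" does not also match "CascadeFeatureFusion_01".
    return re.compile(r"(?<![^/])" + re.escape(old) + r"(?![^/])")


_COMPILED_RULES = [(_component_pattern(old), new) for old, new in RENAME_RULES]


def rename_variable(name):
    """Apply every rename rule, in order, to one checkpoint variable name."""
    for pattern, new in _COMPILED_RULES:
        name = pattern.sub(new, name)
    return name


def build_rename_map(names):
    """Return {old_name: new_name} for the names that actually change."""
    mapping = {}
    for old in names:
        new = rename_variable(old)
        if new != old:
            mapping[old] = new
    return mapping


if __name__ == "__main__":
    # Hypothetical variable names, e.g. from tf.train.list_variables(ckpt).
    names = [
        "CascadeFeatureFusion_0/Conv/weights",
        "CascadeFeatureFusion_1/Conv/weights",
        "Predictions/postrain/weights",
    ]
    for old, new in build_rename_map(names).items():
        print(old, "->", new)
```

Matching whole path components rather than plain substrings avoids accidentally renaming scopes that merely share a prefix.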
I did some quick testing of the whole PSPNet/ICNet pipeline, but if you guys find any more bugs please let me know (or even submit a PR if you can). Sorry again for the frustration and please let me know if this helps. I will be quicker to reply now.
Hi, I followed the documentation step by step, from training PSPNet to re-training ICNet. Everything works fine until the last step: when I re-train ICNet after compressing it, I get the error below.
It seems that after ICNet is compressed with filter=0.5, some layers in the model no longer match. Or maybe this is an issue with TensorFlow Slim.
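One way to narrow down which layers no longer match is to diff the variable shapes stored in the pruned checkpoint against the shapes the re-training graph expects. A minimal sketch, assuming you have already collected both as `(name, shape)` pairs: for the checkpoint, `tf.train.list_variables` returns that format, and for a TF 1.x graph you could build it from `tf.global_variables()` using each variable's `op.name` and `shape.as_list()`. The `diff_shapes` helper is hypothetical.

```python
def diff_shapes(checkpoint_vars, model_vars):
    """Compare (name, shape) pairs from a checkpoint and from a model graph.

    Returns (mismatches, missing, unused):
      mismatches: {name: (checkpoint_shape, model_shape)} for shared names
                  whose shapes differ
      missing:    names the model expects but the checkpoint lacks
      unused:     names in the checkpoint that the model does not use
    """
    ckpt = dict(checkpoint_vars)
    model = dict(model_vars)
    mismatches = {
        name: (ckpt[name], model[name])
        for name in sorted(ckpt.keys() & model.keys())
        if list(ckpt[name]) != list(model[name])
    }
    missing = sorted(model.keys() - ckpt.keys())
    unused = sorted(ckpt.keys() - model.keys())
    return mismatches, missing, unused
```

Any name listed under `mismatches` points at a layer whose pruned width was not propagated, while names under `unused` (for example leftover training-only nodes) are candidates for something the export step should have removed.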