zhou13 / lcnn

LCNN: End-to-End Wireframe Parsing
MIT License

TensorBoard caught SIGTERM; exiting... #29

Closed jwngo closed 4 years ago

jwngo commented 4 years ago

Hi Yichao,

Thanks for sharing your code!

While training the model, I get the following error:

TensorBoard caught SIGTERM; exiting...

I have also tried training another model from scratch, so I now have two models, but both of them stop running at the same iteration, 023/0480k. That is not enough to reproduce the pre-trained model at 000312000 in logs/*/npz (checkpoints were only saved up to 000160000).

Do you have any idea what is causing this error? I tried googling to no avail.

Thank you Yichao.

jwngo commented 4 years ago

So I think it is due to max_epoch being 24, a very simple thing that I missed.

Testing it right now.

sachinkaundal commented 3 years ago

@jwngo, could you please tell me what max_epoch value is needed to train the model successfully?

zhou13 commented 3 years ago

@sachinkaundal You can use any value you want. After reaching max_epoch, TensorBoard will just close automatically.

sachinkaundal commented 3 years ago

@zhou13 Thank you for your response. I am getting the above message before reaching max_epoch (for example, with max_epoch=20, it appears when training reaches 19):

"TensorBoard caught SIGTERM; exiting..."

zhou13 commented 3 years ago

Why is that not expected?

sachinkaundal commented 3 years ago

@zhou13, I mean: is this the final message we will get at the end of training, or does it indicate something else? If it is the final message, then I am only getting logs in the /logs directory, but not the checkpoints that are required for testing. Thanks in advance for your response.

./process.py config/wireframe.yaml data/wireframe logs/pretrained-model/npz/000312000

zhou13 commented 3 years ago

I don't really understand your question, but you might want to adjust io.validation_interval in the settings.
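For readers with the same symptom (logs but no checkpoints), here is a minimal sketch of why the interval could matter. It assumes, and this is not confirmed in the thread, that a checkpoint is written only when validation runs, once every `validation_interval` units of training progress. The function name `checkpoint_steps` and the numbers are hypothetical.

```python
def checkpoint_steps(total_steps, validation_interval):
    """Return the steps at which validation (and, by assumption, a
    checkpoint save) would happen during a run of total_steps."""
    if validation_interval <= 0:
        raise ValueError("validation_interval must be positive")
    return list(range(validation_interval, total_steps + 1, validation_interval))


if __name__ == "__main__":
    # Interval shorter than the run: several checkpoints are produced.
    print(checkpoint_steps(100, 30))
    # Interval longer than the run: no checkpoint is ever produced.
    print(checkpoint_steps(100, 200))
```

Under that assumption, a short run combined with a large interval would finish without saving anything, so lowering the interval (or training longer) would be the first thing to try.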