isobar-us / multilabel-image-classification-tensorflow

MIT License

How to restore last checkpoint and continue training? #1

Open acg93-pixel opened 5 years ago

acg93-pixel commented 5 years ago

Hi, great project. I was wondering whether there is a way to restore the last saved checkpoint and continue training without downloading the module from TF Hub. Could you give an example, maybe for MobileNet?

Thanks

rhossei2 commented 5 years ago

The TF Hub feature vector module you download is built on a model pretrained on ImageNet (e.g. mobilenet_v2_140_224) and is used to produce the bottleneck files for your images, which the training needs. The training script looks for checkpoint files at /tmp/_retrain_checkpoint (e.g. _retrain_checkpoint.data-00000-of-00001, _retrain_checkpoint.meta, _retrain_checkpoint.index).

Keep in mind that this project always requires you to provide a feature vector module before training starts. We used this module for MobileNet: https://tfhub.dev/google/imagenet/mobilenet_v2_140_224/feature_vector/2
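To check whether the checkpoint files mentioned above are actually present before starting a run, something like the following could be used. This is only a minimal sketch using the Python standard library: the prefix `/tmp/_retrain_checkpoint` and the file suffixes are taken from the comment above, while the helper names `find_checkpoint_files` and `has_checkpoint` are hypothetical and not part of this project. It assumes that a usable checkpoint needs at least an `.index` file and one `.data-*` shard.

```python
import glob
import os

# Checkpoint prefix quoted in the comment above; adjust it if your setup differs.
CHECKPOINT_PREFIX = "/tmp/_retrain_checkpoint"


def find_checkpoint_files(prefix=CHECKPOINT_PREFIX):
    """Return a sorted list of files named `<prefix>.<something>`."""
    # glob.escape keeps special characters in the prefix from acting as wildcards.
    return sorted(glob.glob(glob.escape(prefix) + ".*"))


def has_checkpoint(prefix=CHECKPOINT_PREFIX):
    """Return True if both an .index file and a .data-* shard exist (assumed minimum)."""
    files = find_checkpoint_files(prefix)
    has_index = any(f.endswith(".index") for f in files)
    has_data = any(".data-" in os.path.basename(f) for f in files)
    return has_index and has_data


if __name__ == "__main__":
    for path in find_checkpoint_files():
        print(path)
    print("checkpoint present:", has_checkpoint())
```

If this reports no checkpoint after a completed run, the script is probably writing its checkpoint somewhere other than the path quoted above.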

acg93-pixel commented 5 years ago

Thank you for your answer. Still, when I run TensorBoard, I can't see the continuation of my previous training. For example, I run the retrain.py script with the MobileNet feature vector for 1000 steps. When I re-run the script for another 1000 steps, there is no progress: I only see the last 1000 steps of my training. I also couldn't find the part of the code where, as you say, the script looks for the last checkpoint.

It seems to me that the second training run starts again from step 0; it doesn't continue from step 1000 as I expected.

I suppose I asked the wrong question. I was wondering how to continue my training from my last saved checkpoint. For example, the first time I want to train up to 1000 steps, and the second time I want to continue training from step 1000 to step 2000. I hope you can help me with this, thanks in advance :)

rhossei2 commented 5 years ago

Ah, OK, thanks for clarifying your question. Let me test that out and get back to you.

acg93-pixel commented 5 years ago

Any updates on this matter? Thanks:)

rhossei2 commented 5 years ago

Sorry for the delay! We're planning to implement this eventually, but currently all of our resources are allocated to our object detection project: https://github.com/isobar-us/multilabel-image-classification-tensorflow/tree/master/tf-object-detection-sagemaker. That project can detect multiple objects with bounding boxes and can initialize from a checkpoint. Hopefully this helps you in the meantime.
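Since resuming is not implemented in retrain.py yet, here is a framework-agnostic sketch of the behaviour asked about in this thread: the step counter is saved together with the model state, so a second run continues from step 1000 instead of restarting at step 0. Nothing here is this project's code or TensorFlow's API. The functions `load_state`, `save_state` and `train`, the JSON file, and the toy one-parameter objective are all assumptions made for illustration. In TensorFlow 1.x the analogous building blocks would be `tf.train.Saver` (whose `save` method accepts a `global_step`) and `tf.train.latest_checkpoint`, but whether retrain.py can be adapted that way is not confirmed here.

```python
import json
import os


def load_state(path):
    """Return the saved training state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "weight": 0.0}


def save_state(path, state):
    """Write to a temporary file first, then rename it over the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def train(path, total_steps, learning_rate=0.1, target=3.0):
    """Train until the global step reaches total_steps, resuming from `path` if present."""
    state = load_state(path)
    start_step = state["step"]
    for step in range(start_step, total_steps):
        # Toy objective (weight - target)^2, standing in for the real model.
        grad = 2.0 * (state["weight"] - target)
        state["weight"] -= learning_rate * grad
        state["step"] = step + 1  # the global step is part of the saved state
    save_state(path, state)
    return start_step, state["step"]
```

With this structure, calling `train(path, 1000)` and later `train(path, 2000)` runs steps 0 to 999 and then steps 1000 to 1999. Logging summaries against the restored step rather than a loop index that starts at 0 is what would make the TensorBoard curves of the two runs line up.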