google / aiyprojects-raspbian

API libraries, samples, and system images for AIY Projects (Voice Kit and Vision Kit)
https://aiyprojects.withgoogle.com/
Apache License 2.0

Embedded_ssd_mobilenet_v1 model pretrained on COCO #446

Status: Open (opened by giacomobartoli 6 years ago)

giacomobartoli commented 6 years ago

Hi, where can I find a model pre-trained on the COCO dataset that uses the embedded_ssd_mobilenet configuration? It would be essential for transfer learning, and it would be useful not only to me but to the entire community. This is the list of all the pre-trained models:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
Embedded_ssd_mobilenet_v1 (the only one that can be deployed on the Vision Kit for object detection) is missing from it.

algila commented 6 years ago

Is this what you need?

http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz

This is probably the 300x300, depth_multiplier: 1.0 model, which does not work on the Vision Kit.

giacomobartoli commented 6 years ago

@algila thank you anyway, but as you said, that model is not suitable for the Vision Kit. There is no model pre-trained on COCO with embedded_ssd_mobilenet_v1. I'm training this model from scratch, and when the evaluation is good enough (this will probably take a few more days), I will be happy to share it with you and anyone else who needs it.

maylownaise commented 6 years ago

Is this what you're looking for? https://cogint.ai/custom-vision-training-on-the-aiy-vision-kit/

It's a pre-trained model for 256x256 input with a 0.125 depth multiplier (search the page for 'shared a trained model' to find the link to the model). Just follow the instructions there to change the embedded_ssd_mobilenet_v1_coco.config file.

I also needed to add this line: 'from_detection_checkpoint: true' right after setting fine_tune_checkpoint.

Let me know if it works for you.
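The config change described above (adding `from_detection_checkpoint: true` right after `fine_tune_checkpoint`) can be scripted if you edit several pipeline configs. This is a minimal sketch that does plain text editing on the .config file. The helper name `add_from_detection_checkpoint` and the example values (`PATH_TO_CHECKPOINT`, `batch_size`, `num_steps`) are hypothetical placeholders, not taken from the tutorial.

```python
import re


def add_from_detection_checkpoint(config_text):
    """Hypothetical helper: return config_text with 'from_detection_checkpoint: true'
    inserted on the line right after 'fine_tune_checkpoint: ...', using the same
    indentation. If the field is already present, the text is returned unchanged."""
    if re.search(r"^\s*from_detection_checkpoint\s*:", config_text, re.MULTILINE):
        return config_text
    out = []
    for line in config_text.splitlines(keepends=True):
        out.append(line)
        stripped = line.lstrip()
        if stripped.startswith("fine_tune_checkpoint:"):
            indent = line[: len(line) - len(stripped)]  # reuse the existing indentation
            if not line.endswith("\n"):
                out.append("\n")
            out.append(f"{indent}from_detection_checkpoint: true\n")
    return "".join(out)


# Placeholder train_config fragment; the path and numbers are illustrative only.
example = """train_config: {
  batch_size: 24
  fine_tune_checkpoint: "PATH_TO_CHECKPOINT/model.ckpt"
  num_steps: 200000
}
"""
print(add_from_detection_checkpoint(example))
```

The helper is idempotent, so running it twice on the same file does not add a duplicate line.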

giacomobartoli commented 6 years ago

@maylownaise that model is pre-trained on VOC, not coco.

maylownaise commented 6 years ago

@giacomobartoli As I'm very new to this, what is the difference? Thank you.

giacomobartoli commented 6 years ago

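@maylownaise The VOC dataset contains only 20 classes, while COCO has 80 object categories (its label map uses IDs up to 90, which is why the TensorFlow configs set num_classes: 90). This means performance should be better when starting from a model pre-trained on COCO. I hope I was clear enough :-)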

maylownaise commented 6 years ago

@giacomobartoli Yes, that's clear. Please let me know if you find one. I'm trying to get object detection working on the AIY Vision Kit, practicing with a deck of cards, and I'm not getting great results with the model I mentioned above. Thanks.

weiran-work commented 6 years ago

@giacomobartoli Unfortunately, there are no COCO-based pretrained checkpoints at the moment. The closest thing available right now is the one @maylownaise pointed to.

giacomobartoli commented 6 years ago

@weiranzhao thank you for your feedback. Right now I am training a model from scratch on the COCO dataset, and I am running into some issues: training a network from scratch seems to take a very large number of iterations. I'm at about 500,000 iterations and the eval score is only 8%, which is too low. As soon as I get a better evaluation (60-70%), I will export the checkpoints and publish them in this thread.

maylownaise commented 6 years ago

@weiranzhao Is the 256x256 limitation hardware-based or software-based? Do you think there will be software updates in the future to allow object detection on larger images?

Thanks.

weiran-work commented 6 years ago

@maylownaise This limit mainly comes from the hardware, because there's only 2MB of FastMemory on the chip. It can be alleviated in software, but that requires non-trivial engineering work. There will be software updates to improve the Vision Kit, but we don't yet have a concrete timeline for our inference engine to support inputs larger than 256x256. Also, model input size is not the only thing limited by the hardware: model structure and depthwise multiplier are limited as well.
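To get a feel for why both input size and depth multiplier run into a 2MB on-chip budget, here is a back-of-the-envelope Python sketch. It is a rough illustration only, not a model of the Vision Bonnet's real memory scheduler. It assumes a simplified MobileNet v1 layer layout, 1 byte per activation value (8-bit), that only a layer's input and output must be resident at the same time, a TF-Slim-style `min_depth=8` floor on channel counts, and it ignores weights and any tiling the inference engine may do. The layout table and function names are hypothetical.

```python
# Rough activation-memory estimate for a MobileNet v1 backbone.
# Simplifying assumptions (hypothetical, not the real Vision Bonnet scheduler):
#   * 1 byte per activation value (8-bit), weights ignored,
#   * only a layer's input and output must be resident at the same time,
#   * channels scale as max(int(c * depth_multiplier), min_depth) with min_depth=8.

# (stride, output channels at depth_multiplier=1.0); the first entry is the
# standard 3x3 conv, the rest are depthwise-separable blocks.
MOBILENET_V1_LAYOUT = [
    (2, 32),
    (1, 64), (2, 128), (1, 128), (2, 256), (1, 256),
    (2, 512), (1, 512), (1, 512), (1, 512), (1, 512), (1, 512),
    (2, 1024), (1, 1024),
]

FAST_MEMORY_BYTES = 2 * 1024 * 1024  # the "2MB" above, read as MiB (assumption)


def scaled_depth(channels, depth_multiplier, min_depth=8):
    return max(int(channels * depth_multiplier), min_depth)


def peak_activation_bytes(input_size, depth_multiplier):
    """Largest input+output activation footprint over all layers, in bytes."""
    h = w = input_size
    c = 3  # RGB input
    peak = 0
    for i, (stride, channels) in enumerate(MOBILENET_V1_LAYOUT):
        out_h = (h + stride - 1) // stride  # SAME padding
        out_w = (w + stride - 1) // stride
        out_c = scaled_depth(channels, depth_multiplier)
        in_bytes = h * w * c
        out_bytes = out_h * out_w * out_c
        if i == 0:
            # Standard conv: input image and output feature map coexist.
            peak = max(peak, in_bytes + out_bytes)
        else:
            # Depthwise step keeps the channel count, then pointwise changes it.
            dw_bytes = out_h * out_w * c
            peak = max(peak, in_bytes + dw_bytes, dw_bytes + out_bytes)
        h, w, c = out_h, out_w, out_c
    return peak


for size, dm in [(256, 0.125), (300, 1.0)]:
    peak = peak_activation_bytes(size, dm)
    verdict = "fits in" if peak <= FAST_MEMORY_BYTES else "exceeds"
    print(f"{size}x{size}, depth_multiplier={dm}: {peak:,} bytes ({verdict} 2 MiB)")
```

Under these assumptions the 256x256, 0.125 configuration peaks at 327,680 bytes, while the 300x300, 1.0 configuration peaks at 2,160,000 bytes, which is over 2 MiB. The real constraints also depend on weights and model structure, as noted in the comment above.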

giacomobartoli commented 6 years ago

Hey folks, this is the model pre-trained on COCO. Unfortunately, it performs poorly: after 752k iterations, I got an eval score of 8%. https://drive.google.com/open?id=17wxgzavz5Awy6MQ3GHsJrQBf3neEUpdA

Right now I am trying to re-train the network from scratch again, changing some hyper-parameters. If I reach better performance, I will update this post.

[Screenshot taken 2018-08-16]

algila commented 6 years ago

Hi @giacomobartoli, could you kindly also upload the config file you used? I would like to try to improve your score a little. Did you initialize the weights randomly at the beginning?

giacomobartoli commented 6 years ago

I used the default configuration file. Anyway, looking through the original papers, it seems that without a model pre-trained on ImageNet it is almost impossible to obtain good results.

gtreen commented 5 years ago

I'm trying to do the same thing, but with Pascal VOC (the model included in the AIY kit only finds people, cats and dogs, and I want all 20 classes). Following their instructions, training is extremely slow. After about 400k iterations, I had the following:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.126
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.252
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.108
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.012
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.186
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.191
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.266
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.283
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.007
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.064
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.403
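For readers new to these COCO-style metrics: "IoU=0.50:0.95" means the result is averaged over ten intersection-over-union thresholds (0.50, 0.55, ..., 0.95), so a box that is only roughly placed counts as correct at the loose thresholds but not the strict ones. The Python sketch below shows only that matching step; it is a simplified illustration with hypothetical helper names, and the full AP computation (ranking detections by confidence and integrating precision over recall, as pycocotools does) is not reproduced here.

```python
# COCO-style "IoU=0.50:0.95" averages results over these ten IoU thresholds.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred_box, gt_box):
    """IoU thresholds at which this prediction would count as a true positive."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


# A prediction covering the top 60% of a 10x10 ground-truth box has IoU = 0.6:
# it is a hit at 0.50, 0.55 and 0.60 but a miss at 0.65 through 0.95.
print(thresholds_passed((0, 0, 10, 6), (0, 0, 10, 10)))  # [0.5, 0.55, 0.6]
```

This is also why the IoU=0.50 AP (0.252) above is roughly twice the IoU=0.50:0.95 AP (0.126): loosely placed boxes only score at the lower thresholds.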

I wonder if you're right that it's nearly impossible to train this way, because the base network is being trained from scratch inside the SSD wrapper network. Do you know where we can find the embedded MobileNet base network pre-trained on ImageNet?

giacomobartoli commented 5 years ago

You can find MobileNet pre-trained on ImageNet on the official website. What is missing is a pre-trained embedded_ssd_mobilenet.

gtreen commented 5 years ago

Yes, the issue is that this specific version of MobileNet (256x256, 0.125 depth multiplier) has no pre-trained version available. I wish they would publish it; it obviously exists somewhere because their cat/dog/person detector was trained using it as a base.