jay-mahadeokar / pynetbuilder

pyNetBuilder is a modular, pythonic interface with built-in modules for generating popular Caffe prototxt network definitions.
BSD 2-Clause "Simplified" License

ssd+squeezenet #1

Open kaishijeng opened 8 years ago

kaishijeng commented 8 years ago

Jay

Do you have a plan to add a builder for ssd+squeezenet? I am looking for a low-computational-complexity SSD detector, and I think ssd+squeezenet may be a good compromise between accuracy and speed.

Thanks,

jay-mahadeokar commented 8 years ago

@kaishijeng I believe the complexity of SqueezeNet is roughly 800 million flops (though I am not sure; I need to run it through the complexity module), with a corresponding top-1 accuracy on ImageNet of ~58%. Its main advantage is a smaller number of parameters, which affects memory rather than speed. In comparison, the thin ResNet-50 (resnet_50_1by2) that I trained has ~10,000 million flops with 66.79% top-1 accuracy on ImageNet; see this comparison table: https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/imagenet#basic-residual-network-results. I ran an experiment training resnet_50_1by2 with SSD and got around 64-65% mAP on the VOC dataset, compared to 70.4% using the full ResNet-50 described here: https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/voc2007_ssd. If you want an even faster network (rather than a smaller one), tweaked ResNet variants could be useful.

That said, it would be interesting to see how SqueezeNet could be used as the base network for SSD (which layers/feature maps to use, etc.). There is a quick guide on how this can be done: https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/voc2007_ssd#building-other-detection-networks
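
(For a rough sense of scale: parameter memory is about 4 bytes per float32 weight, so SqueezeNet's ~1.2 million parameters, if I remember the paper right, amount to only ~5 MB of weights, while inference time is dominated by the conv flops. That is why fewer params helps model size and memory, but not necessarily speed.)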

kaishijeng commented 8 years ago

Jay,

Thanks for the info about SqueezeNet vs ResNet-50. My understanding is that SqueezeNet is faster than AlexNet and also has a smaller parameter size.

Do you have a speed comparison between ssd+vgg16 and ssd+resnet50? Can you share pretrained models for ssd+resnet_50 or ssd+resnet_50_1by2? I will try to train ssd+resnet_50 this weekend.

Thanks,


jay-mahadeokar commented 8 years ago

Please refer to this table for ssd+vgg16 and ssd+resnet50: https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/voc2007_ssd#comparing-vgg-and-resnet-50-ssd-based-detection-networks. I have also shared the caffemodel. This table also compares resnet_50 and resnet_50_1by2: https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/imagenet#basic-residual-network-results. Though I haven't yet added object-detection model files for resnet_50_1by2 + SSD, it should be easy to train (since I have added the model pre-trained on ImageNet). Let me know if the training ssd+resnet doc (https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/voc2007_ssd) is sufficient, or if you run into any bugs.

kaishijeng commented 8 years ago

According to your table, ssd+resnet50 should be 2 or 3 times faster than ssd+vgg16. Is this what you have observed?

Thanks,


jay-mahadeokar commented 8 years ago

I haven't done thorough benchmarking on CPU, since I only tested the validation set on GPU machines, but I guess that should be true! I will run it on CPU and update here.

kaishijeng commented 8 years ago

Jay

No need to benchmark on CPU, because I have a GPU (a Titan X). What parameters do I need to pass to create_ssdnet.py to create ssd+resnet50_1by2 instead of ssd+resnet50?

python app/ssd/create_ssdnet.py --type Resnet -n 256 -b 3 4 6 3 --no-fc_layers -m bottleneck --extra_blocks 3 3 --extra_num_outputs 2048 2048 --mbox_source_layers relu_stage1_block3 relu_stage2_block5 relu_stage3_block2 relu_stage4_block2 relu_stage5_block2 pool_last --extra_layer_attach pool -c 21 -o ./

Thanks,

jay-mahadeokar commented 8 years ago

--extra_num_outputs could be reduced to 1024 1024, and -n to 128. The rest of the params should remain the same, I think. Use -h for more help on the params.
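
Applying those two changes to your command above (everything else unchanged), it would look something like:

python app/ssd/create_ssdnet.py --type Resnet -n 128 -b 3 4 6 3 --no-fc_layers -m bottleneck --extra_blocks 3 3 --extra_num_outputs 1024 1024 --mbox_source_layers relu_stage1_block3 relu_stage2_block5 relu_stage3_block2 relu_stage4_block2 relu_stage5_block2 pool_last --extra_layer_attach pool -c 21 -o ./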

jay-mahadeokar commented 8 years ago

@kaishijeng did the above params work for you? I am closing this for now, feel free to re-open it if you have additional questions.

kaishijeng commented 8 years ago

Jay,

Yes, it works

Thanks

kaishijeng commented 8 years ago

Jay,

I was able to train ssd_resnet50 and ssd_resnet50_1by2 and try out inference on a Titan X and a Jetson TX1. On the Titan X I can see the speed improvement, but there is not much difference on the Jetson TX1. I think that is due to memory bandwidth, because of the parameter size. If it is not much effort for you to create ssd_squeezenet, I can do the training and measure inference time on the Titan X and Jetson TX1.

Thanks,

jay-mahadeokar commented 8 years ago

@kaishijeng The SqueezeNet architecture is quite different from ResNet/VGG in terms of feature map sizes. I am not sure which layers we would attach the detection heads to.

If you want to try some experiments, I'd suggest:

Please give it a try and I can help out if you have any further questions.

kaishijeng commented 8 years ago

Jay,

It looks like creating an ssd+squeezenet network is not a simple exercise, so I would like to try ssd+resnet18 first. I need to train resnet18 on ImageNet first and use it as a pretrained model for ssd+resnet18 training.

I plan to use the following command to create resnet18 for ImageNet, but I am not sure whether the parameters are correct. Can you help me check it?

python app/imagenet/build_resnet.py -m bottleneck -b 2 2 2 2 -n 256 --no-fc_layers -o ./

Also, I got an error when using the following command to generate ssd+resnet18. Do you know which parameters are incorrect?

python app/ssd/create_ssdnet.py --type Resnet -n 256 -b 2 2 2 2 --no-fc_layers -m bottleneck --extra_blocks 3 3 --extra_num_outputs 2048 2048 --mbox_source_layers relu_stage1_block3 relu_stage2_block5 relu_stage3_block2 relu_stage4_block2 relu_stage5_block2 pool_last --extra_layer_attach pool -c 21 -o ./

Thanks,

jay-mahadeokar commented 8 years ago

Sounds good!

You need to change the --mbox_source_layers params from relu_stage1_block3 relu_stage2_block5 relu_stage3_block2 relu_stage4_block2 relu_stage5_block2 to relu_stage1_block1 relu_stage2_block1 relu_stage3_block1 relu_stage4_block1 relu_stage5_block1.

Also, --extra_blocks could be 2 2 (or your choice; more blocks will increase runtime). Notice that resnet 18 has only 2 blocks in each stage (the index starts at 0). Read more here: https://github.com/jay-mahadeokar/pynetbuilder/tree/master/models/imagenet#creating-residual-networks
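
With those two changes applied to your command (and the remaining flags kept as they are), it would look roughly like:

python app/ssd/create_ssdnet.py --type Resnet -n 256 -b 2 2 2 2 --no-fc_layers -m bottleneck --extra_blocks 2 2 --extra_num_outputs 2048 2048 --mbox_source_layers relu_stage1_block1 relu_stage2_block1 relu_stage3_block1 relu_stage4_block1 relu_stage5_block1 pool_last --extra_layer_attach pool -c 21 -o ./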

kaishijeng commented 8 years ago

Jay

Shouldn't the main_branch of resnet18 for ImageNet be normal instead of bottleneck? If so, when I use the following command to generate resnet18, there is an error:

python app/imagenet/build_resnet.py -m normal -b 2 2 2 2 -n 256 --no-fc_layers -o ./

The error is:

F0814 01:05:54.747488 14253 eltwise_layer.cpp:34] Check failed: bottom[i]->shape() == bottom[0]->shape()
*** Check failure stack trace: ***
Aborted (core dumped)

Thanks,


jay-mahadeokar commented 8 years ago

Please specify -n as 64. Note that the bottleneck block has 3 layers with 64, 64, 256 filters, whereas the normal block has 2 layers with 64, 64 filters. Since the 1st conv layer has 64 filters, it gives an error. I should add this check somewhere!
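
To illustrate, the first-stage blocks look roughly like this (standard ResNet layout):

bottleneck block: 1x1 conv, 64 filters -> 3x3 conv, 64 -> 1x1 conv, 256
normal block:     3x3 conv, 64 filters -> 3x3 conv, 64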

FYI, resnet_18 has:

python app/imagenet/build_resnet.py -m normal -b 2 2 2 2 -n 64 --no-fc_layers -o ./
Number of params:  11.688512  Million
Number of flops:  1814.082944  Million

The flop count is larger than resnet_50_1by2's. Not sure if it will be faster; I haven't benchmarked it.
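
For the SSD network itself, combining this -m/-n fix with the mbox_source_layers and extra_blocks changes from my earlier comment, the command would presumably be along these lines (I haven't tried this exact call, and --extra_num_outputs 512 512 is only a guess scaled down for the smaller network, so adjust as needed):

python app/ssd/create_ssdnet.py --type Resnet -n 64 -b 2 2 2 2 --no-fc_layers -m normal --extra_blocks 2 2 --extra_num_outputs 512 512 --mbox_source_layers relu_stage1_block1 relu_stage2_block1 relu_stage3_block1 relu_stage4_block1 relu_stage5_block1 pool_last --extra_layer_attach pool -c 21 -o ./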

poorfriend commented 8 years ago

@kaishijeng, can you tell me how much faster ssd+resnet50 is than ssd+vgg16 on a GPU (Titan X)? Thank you.

MisayaZ commented 8 years ago

@kaishijeng, hi, I have benchmarked with the command-line caffe time tool and found that the forward time of ssd+resnet50 is longer than that of ssd+vgg16. I do not understand how you saw a speed improvement?
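
(Concretely, something along the lines of: caffe time -model <deploy.prototxt> -gpu 0 -iterations 50, run with each network's deploy prototxt; the model path here is just a placeholder.)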

kaishijeng commented 8 years ago

MisayaZ,

Your data is correct. It has been a while since I last ran the test. My impression is that ssd+resnet50 is slower than ssd+vgg16, but ssd+resnet50_1by2 is slightly faster than ssd+vgg16, with a lower memory footprint.


mrgloom commented 8 years ago

SqueezeNet is not fast (compared to AlexNet); it just has a small on-disk size. See the table at https://github.com/mrgloom/kaggle-dogs-vs-cats-solution

KevinYuk commented 7 years ago

@kaishijeng Hi kaishijeng,

Have you successfully built resnet18+SSD and gotten a good mAP? If so, could you please share your resnet18+SSD prototxt file and resnet18 pretrained weights? Thanks a lot.