pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

training darknet with groups #1150

Open DRACOyu opened 5 years ago

DRACOyu commented 5 years ago

I am trying to use group convolution to speed up the operation. This is my cfg file; it trains, but when I test there is no output. I hope to get everyone's help.

I added a groups parameter to the convolutional blocks, but I don't know if this is correct. The size of the model is half of the original. Who can help me? Thanks.

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
groups=2
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=0
groups=2
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
groups=2
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
groups=2
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
groups=2
activation=leaky

agoodman9527 commented 5 years ago

@DRACOyu hi handsome, I met the same problem. Did you solve this?

developer0hye commented 5 years ago

@alowmen hi handsome. Did you solve this?

GallonDeng commented 5 years ago

Is there still no group convolution or MobileNet implementation in darknet (yolov3)?

JHLee0513 commented 5 years ago

@kammirzazad Is there a possible way to fix that? I would really like to get it to work but I have no idea what's going on in the backend 😞

chenbohua3 commented 5 years ago

@kammirzazad could you please give more detailed clues about the bugs? I have checked the group convolution code several times but did not find any problems. I have tested the speed of a MobileNetV2-based yolov3 and found it very slow: one depthwise conv layer takes about 20 ms with CUDA and cuDNN, but when I switch off cuDNN and use CUDA only, it takes only 8 ms. I do not know where the problem is.

kammirzazad commented 5 years ago

@chenbohua3 I am not familiar with MobileNetV2; does it use filter grouping? The bug I was talking about would mainly affect functionality, so I am not sure how it could change the latency.

chenbohua3 commented 5 years ago

@kammirzazad I have figured out that the high latency is caused by a wrong conv algorithm chosen by cuDNN when there are group convolutions in the network. Yes, MobileNetV2 uses group convolutions in an extreme way: the number of filter groups is equal to the number of input channels. I still don't understand the bug you mentioned above. I have trained MobileNetV2 (with group convolutions) in pytorch and imported the weights into the darknet framework; judging from the results, it works normally.

kammirzazad commented 5 years ago

@chenbohua3 Can you share your .cfg? Is the "groups" parameter set to the number of input channels?

chenbohua3 commented 5 years ago

@kammirzazad Sorry for the late reply, I have not been on GitHub these days :( Here is my .cfg link. Some conv layers in the backbone and head have "groups" set to the number of input channels.

kammirzazad commented 5 years ago

@chenbohua3 I see, most likely I misunderstood the implementation of filter groups.

chenbohua3 commented 5 years ago

@kammirzazad :)

cuixing158 commented 5 years ago

@chenbohua3 hello, I have seen your mobilenetv2 cfg file; in it you set "activation=relu6". Does the darknet framework support 'relu6'? Another question: is MobileNetV2-yolov3 faster than darknet-yolov3? Thank you! I want to use MobileNetV2-yolov3 these days, but I don't know its performance.

chenbohua3 commented 5 years ago

@cuixing158 yes, you need to add relu6 to the framework (do it the same way as relu, which is already implemented). As for speed, it does perform faster than the original one. However, since depthwise conv is not optimized well by cuDNN, it is not as fast as the theoretical FLOP count suggests.