pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Low precision inference? #81

Open akshatd opened 7 years ago

akshatd commented 7 years ago

Hi! Are there plans for making a low precision inference mode like many other neural network frameworks out there? Would be really helpful for embedded applications where we have very limited memory!

adroit91 commented 7 years ago

There are "binary" and "xnor" options mentioned internally in the convolutional layer files. If you can get them to work, that is.

akshatd commented 7 years ago

I was thinking more on the level of the whole network being reduced to 8-bit, like in TensorFlow, with weight conversion and 8-bit operators for each layer: https://www.tensorflow.org/performance/quantization. I think this is a very natural path for this framework, as almost everyone is trying to do this to extract more performance out of their hardware. If I had to do this myself, could anyone suggest how I could start?

adroit91 commented 7 years ago

The correct positions where @pjreddie does binarization are probably where you could insert your quantization instead, and then proceed with GEMM, etc. You will have to write your own low-precision versions of the same functions, as the current code isn't templated. I'd recommend checking out libraries like tiny-dnn to see whether they already have an implementation of the same.
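For orientation, the binarization darknet performs in the convolutional layer code is, roughly, the XNOR-Net recipe: each weight in a filter is replaced by its sign times the filter's mean absolute value. A standalone sketch of that step (the function name is illustrative, not darknet's):

```c
#include <math.h>

/* XNOR-Net style weight binarization sketch: binarize one filter of
   n weights w[] into b[], where each output is sign(w[i]) * alpha and
   alpha is the mean absolute value of the filter.  Returns alpha. */
static float binarize_filter(const float *w, float *b, int n) {
    float alpha = 0;
    int i;
    for (i = 0; i < n; ++i) alpha += fabsf(w[i]);
    alpha /= n;
    for (i = 0; i < n; ++i) b[i] = (w[i] >= 0) ? alpha : -alpha;
    return alpha;
}
```

An 8-bit quantization pass would slot into the same place, replacing the sign/scale mapping with a scale/zero-point mapping.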

sivagnanamn commented 7 years ago

@adroit91 How do I use the binary and xnor options while training? I added xnor=1 and binary=1 to my training cfg and started training; the model was saved after 100 iterations, but there was no reduction in model size. It would be helpful if you could share a sample cfg.

TaihuLight commented 7 years ago

According to the Darknet Google Group, the xnor_layer was removed at some point (https://github.com/pjreddie/darknet/commit/32d2c969973aa98635123743f321859192ff581d). The code at this earlier commit (https://github.com/pjreddie/darknet/tree/8a6ba2fff3ee1c14bca0aa0e0a909aba7057cc94) can reportedly be compiled and run successfully according to users of the group, but I cannot reproduce this.

He was able to compile after the following modifications in ai2.mk:

```
GPU=1
CUDNN=0
OPENCV=1
DEBUG=0
AI2=1

ARCH= -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52]

COMMON= -D_FORCE_INLINES
```

I modified xyolo.test.cfg:

```
batch=1
subdivisions=1
```

and used the YOLO v1 training process:

```
make -f ai2.mk
```

zrobotparking commented 6 years ago

@TaihuLight thanks for the information. I tried to train xyolo.test.cfg with the following command, but training does not finish.

./darknet yolo train cfg/xyolo.test.cfg darknet.conv.weights

Training starts, but the loss blows up into a huge number and training fails. Same situation with both CPU and GPU training.

By the way, what is the purpose of the following command?

make -f ai2.mk

Is it the implementation of binary convolution?

AlexeyAB commented 6 years ago

@zrobotparking Try to train this XNOR model: tiny-yolo-obj_xnor.zip, the same way as a common yolov2-tiny.cfg:

```
./darknet detector train data/obj.data tiny-yolo-obj_xnor.cfg
```

or

```
./darknet detector train data/obj.data tiny-yolo-obj_xnor.cfg tiny-yolo-voc.conv.13
```

So the first and last conv layers should stay FP32, and the others should set xnor=1 (1-bit).
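To illustrate that layout, here is a hypothetical cfg fragment (placeholder values, not the exact contents of tiny-yolo-obj_xnor.zip): the first [convolutional] section has no xnor flag and stays FP32, while the intermediate ones set xnor=1.

```ini
# first conv layer: no xnor flag, stays FP32
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

# intermediate conv layers: binarized
[convolutional]
xnor=1
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

# the last conv layer before the detection layer likewise
# omits the xnor flag, so it also stays FP32
```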

zrobotparking commented 6 years ago

@AlexeyAB Thanks a lot!! It successfully starts training. I am training the model with your darknet fork.

But why does the .weights file still take 61 MB?

AlexeyAB commented 6 years ago

@zrobotparking After ~2000 iterations you will be able to successfully detect your objects using the XNOR model.

Also, you should know that optimized XNOR binary inference was removed from Darknet. So although 1-bit inference is used, it is executed with float arithmetic, and detection will be slow. You would have to implement your own XNOR backend for the conv, maxpool, route, reorg and upsample layers. Or wait for somebody who will make it :)
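For anyone attempting such a backend, the core trick is that with signs packed 32 per machine word, one XNOR plus a population count replaces 32 multiply-adds. A hypothetical standalone sketch in C (not darknet code; `__builtin_popcount` assumes GCC/Clang):

```c
#include <stdint.h>

/* Pack the signs of up to 32 floats into one 32-bit word:
   bit i is 1 if x[i] >= 0 (i.e. represents +1), else 0 (-1). */
static uint32_t pack_signs(const float *x, int n /* n <= 32 */) {
    uint32_t bits = 0;
    for (int i = 0; i < n; ++i)
        if (x[i] >= 0) bits |= 1u << i;
    return bits;
}

/* Dot product of two {-1,+1} vectors of length n from packed bits:
   XNOR marks positions where the signs match, popcount counts them,
   and the dot product is matches - mismatches = 2*matches - n. */
static int xnor_dot(uint32_t a, uint32_t b, int n) {
    uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    int matches = __builtin_popcount(~(a ^ b) & mask);
    return 2 * matches - n;
}
```

A full conv layer would run this inner product over im2col'd packed activations and then apply the XNOR-Net scaling factors on top.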

zrobotparking commented 6 years ago

@AlexeyAB
I tried to train tiny-yolo-obj_xnor.cfg on VOC & COCO, but after 2000 iterations the loss increases and becomes NaN. The same happens if I add xnor=1 to the convolutional layers in yolov3-tiny.cfg. If I use the original tiny-yolo or tiny-yolov3, training works perfectly.

The commands I used:

```
./darknet detector train data/voc.data tiny-yolo-obj_xnor.cfg
./darknet detector train data/voc.data yolov3-tiny-xnor.cfg
```

Maybe using a pre-trained model is necessary?

AlexeyAB commented 6 years ago

@zrobotparking

Did you change classes=20, filters=125, and anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52 to train on Pascal VOC? You should use a cfg file like this for training on VOC; try this one: tiny-yolo-voc_xnor.zip
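For reference, the head of a standard yolov2-tiny VOC cfg ties those three values together: filters on the last convolutional layer must equal num*(classes+5) = 5*(20+5) = 125. An illustrative fragment:

```ini
# last conv layer: filters = num*(classes+5) = 5*(20+5) = 125
[convolutional]
size=1
stride=1
pad=1
filters=125
activation=linear

[region]
anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
classes=20
num=5
```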

Try to get pre-trained weights as shown here: https://github.com/AlexeyAB/darknet/blob/cda8171feb76bcb405350fd8341d42a0300e2f4b/build/darknet/x64/partial.cmd#L9

```
./darknet partial cfg/yolov2-tiny-voc.cfg yolov2-tiny-voc.weights yolov2-tiny-voc.conv.13 13
```

And train:

```
./darknet detector train data/obj.data tiny-yolo-voc_xnor.cfg tiny-yolo-voc.conv.13
```

I didn't train it on Pascal VOC, but I trained it on my custom dataset, and it works with only a ~15% decrease in mAP.

If you still get an increasing loss, try this GitHub repo: https://github.com/AlexeyAB/darknet

sunshinezhihuo commented 6 years ago

@AlexeyAB Hello, ① I just want to detect "person", that is, only one class. So I use person_val.txt from the VOC dataset, I set classes=1 in cfg/voc.data, put person in data/voc.names, and then set classes=1 and the corresponding filters=18 in cfg/yolov3.cfg. Is this method correct? About background: does it need to be added as a class, i.e. classes including ["person","background"] with classes=2?
② In yolov3.cfg I set batch=128, subdivisions=64, learning_rate=0.0001. When I train the network, the loss decreases at the beginning; after some iterations it increases and then decreases again. Should I make the learning rate smaller? Thank you for your reply.
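For context on the filters=18 value in the question above: in yolov3 cfg files, each [convolutional] layer immediately before a [yolo] layer needs filters = 3*(classes+5), which is 3*(1+5) = 18 for a single class. An illustrative fragment:

```ini
# conv layer before each [yolo] layer:
# filters = 3 masks * (1 class + 5) = 18
[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
classes=1
num=9
```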

zrobotparking commented 6 years ago

@AlexeyAB Thanks for your help! But I encounter a CUDA error when I try to train with tiny-yolo-voc_xnor.zip. The error happens both on this repo and on https://github.com/AlexeyAB/darknet. I have opened an issue on your repo: https://github.com/AlexeyAB/darknet/issues/805. There are many reports of the same problem in darknet's Google Group.

AlexeyAB commented 6 years ago

@zrobotparking I fixed it in my repository: https://github.com/AlexeyAB/darknet

Thilanka97 commented 6 years ago

Do you guys know of any binarized yolov3 implementation? Where can I get pre-trained weights for a binarized yolov3? Or could you guide me through implementing a binarized yolov3?

zrobotparking commented 6 years ago

@Thilanka97 Just add xnor=1 under the [convolutional] tags in the .cfg, as in the tiny-yolo-voc_xnor.zip example that @AlexeyAB provided.

AlexeyAB commented 6 years ago

@Thilanka97 XNOR acceleration is implemented here: https://github.com/AlexeyAB/darknet. More info: https://github.com/AlexeyAB/darknet/issues/1472

Or use this repository for INT8 inference and 1-bit XNOR inference: https://github.com/AlexeyAB/yolo2_light

Thilanka97 commented 5 years ago

@AlexeyAB Hey, if we change the cfg file to xnor and train, it will train yolov2 using binary weights and return binary weights, right? I just want to know if yolov2 XNOR code is available. I mean an implementation of yolov2 (tiny) using XNOR-Net or binary BNN weights?

Thanks in advance!