pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Quantizing YOLO weights #556

Open arturioxas opened 6 years ago

arturioxas commented 6 years ago

I'm looking for a way to quantize YOLO weights (to 8 or 16 bits). The goal is to speed up computation as much as possible without hurting accuracy too much, so I'd like to experiment and measure how much accuracy is actually lost. Could you give me some advice on what can be done here? I searched for material but wasn't successful.

pjreddie commented 6 years ago

The easy option is simply to quantize the weights after training: take a trained weight file and truncate all the weights to the number of bits you want. A better option is some form of projected gradient descent, where you train with normal floats but quantize the weights on the forward pass, so the network learns a good representation that is also quantized to the level you want (this is what we did in the XNOR-Net paper).
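As a concrete illustration of the first option, here is a minimal post-training quantization sketch in numpy. Darknet itself has no such helper, so the function names are hypothetical; it maps a float32 weight tensor to symmetric 8-bit codes plus a scale, then dequantizes back to float32 for inference:

```python
import numpy as np

def quantize_8bit(w):
    """Symmetric linear quantization of a float32 tensor to int8.

    Returns the int8 codes and the scale needed to dequantize.
    """
    scale = np.abs(w).max() / 127.0          # map [-max, max] onto [-127, 127]
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 codes."""
    return q.astype(np.float32) * scale

# Example: quantize one convolutional filter bank and check the error
w = np.random.randn(32, 3, 3, 3).astype(np.float32)
q, scale = quantize_8bit(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

The projected-gradient variant would apply `dequantize(quantize_8bit(w))` on the forward pass while keeping and updating the full-precision `w` in the optimizer.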

That being said, actually getting a speedup from this can be tough. On new GPUs, 16-bit floats are faster than 32-bit, but beyond that it would take some serious low-level hacking to realize any significant gains; modern GPUs and CPUs are heavily specialized for floating-point computation.
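To get a feel for how little accuracy fp16 truncation costs on the weights themselves (independent of whether your hardware can exploit it for speed), a quick sketch:

```python
import numpy as np

# Round float32 weights through fp16 and back; the representational
# error of fp16 rounding is on the order of 2**-11 of each value.
w = np.random.randn(1024, 1024).astype(np.float32)
w_back = w.astype(np.float16).astype(np.float32)
print("max relative error:", (np.abs(w - w_back).max() / np.abs(w).max()))
```

The truncation itself is nearly free in accuracy; the hard part, as noted above, is getting hardware to run the low-precision math faster.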

TaihuLight commented 6 years ago

@pjreddie Could you provide a demo of quantizing YOLO weights to 16 bits in your darknet?

arturioxas commented 6 years ago

Could you also recommend a good tool for easily truncating weights before closing this question?
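In lieu of a dedicated tool, a short script like the following can truncate a darknet `.weights` file. This is a sketch assuming the usual file layout (three int32 version fields, a `seen` counter that is int64 when `major*10 + minor >= 2` and int32 otherwise, then raw float32 parameters); verify against your darknet build, and note the file names are just examples:

```python
import numpy as np

def truncate_weights(src, dst, dtype=np.float16):
    """Round every float32 parameter in a darknet .weights file
    through a lower-precision type, then write it back as float32
    so darknet can still load the result."""
    with open(src, "rb") as f:
        major, minor, revision = np.fromfile(f, np.int32, 3)
        seen_dtype = np.int64 if major * 10 + minor >= 2 else np.int32
        seen = np.fromfile(f, seen_dtype, 1)
        params = np.fromfile(f, np.float32)   # all remaining parameters

    truncated = params.astype(dtype).astype(np.float32)

    with open(dst, "wb") as f:
        np.asarray([major, minor, revision], np.int32).tofile(f)
        seen.tofile(f)
        truncated.tofile(f)

# Hypothetical usage:
truncate_weights("yolov3.weights", "yolov3-fp16.weights")
```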