maincold2 / FFNeRV

FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos
MIT License

questions about QAT #4

Closed James89045 closed 10 months ago

James89045 commented 1 year ago

I want to ask why the weight-quantization part of your model has this limit: `assert self.wbit <= 8 or self.wbit == 32`. If I want to quantize weights to 16 bits, can I change the limit to `assert self.wbit <= 32`? Thank you!

maincold2 commented 1 year ago

It is okay to change it like that, but I recommend using half-precision (float16) tensors for a 16-bit implementation.

Thank you for your interest!
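For context, the two options discussed above can be sketched as follows. This is a hypothetical, minimal symmetric uniform quantizer using NumPy, not the repository's exact QAT code; the assertion mirrors the one quoted in the question, and the float16 route is what the maintainer recommends (in PyTorch, `tensor.half()`):

```python
import numpy as np

def quantize_weight(w, wbit):
    # Sketch of symmetric, per-tensor uniform quantization to `wbit` bits.
    # Hypothetical illustration; not the FFNeRV implementation.
    assert wbit <= 8 or wbit == 32  # the limit in question
    if wbit == 32:
        return w  # full precision: no quantization applied
    qmax = 2 ** (wbit - 1) - 1           # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax       # map the largest weight to qmax
    # Round to the integer grid, clip, then rescale back to float.
    return np.round(w / scale).clip(-qmax, qmax) * scale

w = np.array([0.5, -1.0, 0.25])
w_q = quantize_weight(w, 8)  # 8-bit path works under the assertion
```

Relaxing the assertion to `wbit <= 32` would let `wbit=16` flow through the same integer-grid path, whereas the recommended alternative simply casts the weights to float16 and skips the quantizer entirely.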