How to convert to FP16 or INT8

ygfrancois / crnn.pytorch.tensorrt.chinese

A Chinese character recognition repository with TensorRT support, based on CRNN_Chinese_Characters_Rec and TensorRTx.


How to convert to FP16 or INT8 #1

Open zhong-xin opened 3 years ago

zhong-xin commented 3 years ago

If I want to convert the model to the FP16 or INT8 types supported by TensorRT, how should I modify the code?

ygfrancois commented 3 years ago

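> If I want to convert the model to the FP16 or INT8 types supported by TensorRT, how should I modify the code?

The default right now is FP16; see `#define USE_FP16` in crnn_trt/crnn_number.cpp (if you don't want FP16, just remove that line). INT8 conversion is not supported yet; give me some time to work on it.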
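Editor's sketch of the compile-time switch described in the reply above. It assumes the repository follows the usual TensorRTx convention, where the macro guards a call such as `config->setFlag(nvinfer1::BuilderFlag::kFP16)` on an `nvinfer1::IBuilderConfig`; the actual contents of crnn_number.cpp are not shown in this thread. `FakeBuilderConfig`, `configure_precision`, and `precision_name` are hypothetical stand-ins so that the snippet compiles without TensorRT installed.

```cpp
#include <cassert>
#include <string>

// Comment this line out to build an FP32 engine instead of an FP16 one.
#define USE_FP16

// Hypothetical stand-in for nvinfer1::IBuilderConfig (assumption, not the real type).
struct FakeBuilderConfig {
    bool fp16_enabled = false;
};

// The FP16 flag is set only when USE_FP16 is defined at compile time.
// With real TensorRT the guarded line would presumably be:
//     config->setFlag(nvinfer1::BuilderFlag::kFP16);
inline void configure_precision(FakeBuilderConfig& config) {
#ifdef USE_FP16
    config.fp16_enabled = true;
#else
    (void)config;  // FP32 is the default precision; nothing to set
#endif
}

// Human-readable name of the precision the config would request.
inline std::string precision_name(const FakeBuilderConfig& config) {
    return config.fp16_enabled ? "FP16" : "FP32";
}
```

Because the switch is a preprocessor macro, changing precision requires recompiling and rebuilding the serialized engine; an engine file built earlier keeps the precision it was built with.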

zhong-xin commented 3 years ago

OK, understood. Thanks for sharing the code; it has been very helpful to me.

ygfrancois commented 3 years ago

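> OK, understood. Thanks for sharing the code; it has been very helpful to me.

Thanks for your interest. I'll let you know once I have INT8 working.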

vllsm commented 3 years ago

Does it also work for Traditional Chinese?

zhong-xin commented 3 years ago

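@ygfrancois With a 32*100 input image, inference directly in PyTorch takes 14.46 ms, the FP32 TensorRT engine takes 6.69 ms, and the FP16 TensorRT engine takes 6.32 ms. Why is the difference between FP32 and FP16 so small? Is this normal?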

ygfrancois commented 3 years ago

> Does it also work for Traditional Chinese?

The network itself does, yes, but the pretrained weights do not support Traditional Chinese. You would need to train on a Traditional Chinese dataset yourself.

ygfrancois commented 3 years ago

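> @ygfrancois With a 32*100 input image, inference directly in PyTorch takes 14.46 ms, the FP32 TensorRT engine takes 6.69 ms, and the FP16 TensorRT engine takes 6.32 ms. Why is the difference between FP32 and FP16 so small? Is this normal?

Are you using a 2080 Ti? Is the reduction in GPU memory usage significant? My guess is that it is related to the hardware or to cuDNN's internal implementation.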
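Editor's sketch: when FP32 and FP16 latencies are this close, the way they are measured matters, since a few cold iterations can dominate a sub-10 ms figure. The helper below is one possible measurement harness, not the method used in this thread. It runs untimed warm-up calls and then reports the median over many timed calls. `median_latency_ms` and the `run_once` callable are hypothetical names; `run_once` is assumed to wrap one complete, synchronous inference (for GPU code, including the stream synchronization).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Returns the median latency in milliseconds of `run_once`, measured over
// `iters` timed calls that follow `warmup` untimed calls.
template <typename F>
double median_latency_ms(F&& run_once, std::size_t warmup, std::size_t iters) {
    assert(iters > 0);

    // Warm-up: lets lazy initialization and clock ramp-up happen off the record.
    for (std::size_t i = 0; i < warmup; ++i) run_once();

    std::vector<double> samples;
    samples.reserve(iters);
    for (std::size_t i = 0; i < iters; ++i) {
        const auto start = std::chrono::steady_clock::now();
        run_once();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    // nth_element puts the median (upper median for even counts) at `mid`.
    auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

A call such as `median_latency_ms([&] { /* one inference */ }, 10, 100)` would then be made once per engine, with identical input, so that the two medians are comparable.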

zhong-xin commented 3 years ago

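> > @ygfrancois With a 32*100 input image, inference directly in PyTorch takes 14.46 ms, the FP32 TensorRT engine takes 6.69 ms, and the FP16 TensorRT engine takes 6.32 ms. Why is the difference between FP32 and FP16 so small? Is this normal?
>
> Are you using a 2080 Ti? Is the reduction in GPU memory usage significant? My guess is that it is related to the hardware or to cuDNN's internal implementation.

I'm using a TX2. The PyTorch model uses 854M of GPU memory; it is 1G with the FP32 engine and 726M with the FP16 engine.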
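Editor's sketch: the figures reported in this thread are easier to compare as ratios. The two helpers below are hypothetical names introduced only for this arithmetic, and the memory comparison assumes that "1G" means 1024 MB.

```cpp
#include <cassert>

// Relative reduction of `after` with respect to `before`, in percent.
inline double reduction_percent(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

// Speedup factor of `after` over `before` (both latencies in the same unit).
inline double speedup(double before, double after) {
    assert(after > 0.0);
    return before / after;
}
```

With the reported latencies, `speedup(14.46, 6.69)` is about 2.16 and `speedup(14.46, 6.32)` is about 2.29, while `reduction_percent(6.69, 6.32)` is only about 5.5. For memory, `reduction_percent(1024.0, 726.0)` is about 29 and `reduction_percent(854.0, 726.0)` is about 15. On these numbers, FP16 saves noticeably more memory than time compared with FP32.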