[Open] Pierrci opened this issue 4 years ago
@Pierrci is this closing now as 34210 is closed?
The non-quantized TFLite version is around 1 GB, which is way too big for a mobile app. I'll close this once FP16 quantization works and we can ship a model with reduced size and good performance, but that's not the case yet (at least as of last Friday when I tried).
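For context, the post-training FP16 flow being discussed is the standard `TFLiteConverter` path. A minimal sketch, with a toy Keras model standing in as a placeholder for the actual BERT export (which is where the conversion currently fails):

```python
import tensorflow as tf

# Placeholder model -- stands in for the real BERT SavedModel/Keras export.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(2),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Enable post-training quantization and restrict weights to float16,
# which roughly halves the serialized model size.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()  # bytes of the .tflite flatbuffer
print(len(tflite_model) > 0)
```

This works on small models; the open question in the thread is getting the same flow to succeed on the full BERT graph.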
Hi @Pierrci, were you able to quantize BERT for TFLite? I tried a few options but failed to get a quantized model.
Since the tokenizer is the same as MobileBERT/DistilBERT's, it would be pretty straightforward to add once this TensorFlow issue is solved: https://github.com/tensorflow/tensorflow/issues/34210