I am trying to develop materials for mobile developers that let them compare BERT-based models designed specifically for mobile deployment. Currently, I have chosen MobileBERT and DistilBERT for that (repository).
Here's what I have done so far:
Fine-tune DistilBERT on the SST-2 dataset (text classification) (Kaggle Kernel).
Generate dynamic-range and float16 quantized TensorFlow Lite models (Kaggle Kernel).
Evaluate the TensorFlow Lite models on the SST-2 development set (Notebook).
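For context, the quantized conversion step (step 2) boils down to the standard `TFLiteConverter` flags. This is a minimal sketch using a tiny stand-in Keras model; in the actual kernel the input is the fine-tuned DistilBERT model:

```python
import tensorflow as tf

# Tiny stand-in model; the real input is the fine-tuned DistilBERT SavedModel/Keras model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2),
])

# Dynamic-range quantization: weights stored as int8, activations kept in float.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
dr_model = converter.convert()

# float16 quantization: weights stored as float16.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
fp16_model = converter.convert()

print(len(dr_model), len(fp16_model))  # serialized FlatBuffer sizes
```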
Surprisingly, the TensorFlow Lite models achieve chance-level performance (~50% accuracy) on the development set. This is in sharp contrast with the original fine-tuned model, whose accuracy is about ~90%.
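For reference, my evaluation path (step 3) is essentially the standard `tf.lite.Interpreter` loop. A minimal sketch with a stand-in model (the real `.tflite` file and tokenized SST-2 inputs are in the notebook); one caveat worth noting in general is that with multiple inputs (`input_ids`, `attention_mask`) the tensors must be matched by name from `get_input_details()`, not fed in an assumed order:

```python
import numpy as np
import tensorflow as tf

# Stand-in for the converted DistilBERT model; the real file would be loaded
# via tf.lite.Interpreter(model_path="...tflite").
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(2)])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()

interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# With multiple inputs, inspect the names here and set each tensor by the
# matching index; a swapped feeding order can drop accuracy to chance.
for d in input_details:
    print(d["name"], d["shape"], d["dtype"])

x = np.random.rand(1, 4).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()
logits = interpreter.get_tensor(output_details[0]["index"])
pred = int(np.argmax(logits, axis=-1)[0])
print(pred)
```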
I am wondering if I am missing something. Any pointers would be really helpful.
Cc: @khanhlvg