Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Nano: Model.quantize does not calculate accuracy correctly #5305
The tuning result is inconsistent with the actual accuracy when metric=tf.keras.metrics.SparseCategoricalAccuracy() is set.
code:
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
import numpy as np
from bigdl.nano.tf.keras import Model

# Wrap an untrained MobileNetV2 in the Nano Model, which provides quantize().
model = MobileNetV2(weights=None, input_shape=(40, 40, 3), classes=10)
model = Model(inputs=model.inputs, outputs=model.outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# Random images with integer (sparse) labels.
train_examples = np.random.random((100, 40, 40, 3))
train_labels = np.random.randint(0, 10, size=(100,))
train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels)).batch(8)

# Baseline accuracy as reported by Keras.
model.evaluate(train_dataset)

# Quantize, passing the same metric to the tuner.
q_model = model.quantize(calib_dataset=train_dataset,
                         metric=tf.keras.metrics.SparseCategoricalAccuracy(),
                         tuning_strategy='basic',
                         accuracy_criterion={'relative': 0.99,
                                             'higher_is_better': True})

# Recompute the accuracy by hand to compare with the tune result.
m = tf.keras.metrics.SparseCategoricalAccuracy()
for img, label in train_dataset:
    m.update_state(label, model(img))
print('#' * 100)
print("Accuracy: {}".format(m.result().numpy()))
output:
2022-08-04 00:46:04.003972: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
13/13 [==============================] - 2s 24ms/step - loss: 2.3026 - sparse_categorical_accuracy: 0.0800
...
2022-08-04 00:46:30 [INFO] Start to evaluate the TensorFlow model.
2022-08-04 00:46:30 [INFO] Model inference elapsed time: 678.92 ms
2022-08-04 00:46:30 [INFO] Tune 1 result is: [Accuracy (int8|fp32): 0.0000|0.0000, Duration (seconds) (int8|fp32): 0.6791|0.5165], Best tune result is: [Accuracy: 0.0000, Duration (seconds): 0.6791]
2022-08-04 00:46:30 [INFO] |**********************Tune Result Statistics**********************|
2022-08-04 00:46:30 [INFO] +--------------------+----------+---------------+------------------+
2022-08-04 00:46:30 [INFO] | Info Type | Baseline | Tune 1 result | Best tune result |
2022-08-04 00:46:30 [INFO] +--------------------+----------+---------------+------------------+
2022-08-04 00:46:30 [INFO] | Accuracy | 0.0000 | 0.0000 | 0.0000 |
2022-08-04 00:46:30 [INFO] | Duration (seconds) | 0.5165 | 0.6791 | 0.6791 |
2022-08-04 00:46:30 [INFO] +--------------------+----------+---------------+------------------+
2022-08-04 00:46:30 [INFO] Save tuning history to /home/projects/BigDL/nc_workspace/2022-08-04_00-46-14/./history.snapshot.
2022-08-04 00:46:30 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2022-08-04 00:46:30 [INFO] Save deploy yaml to /home/projects/BigDL/nc_workspace/2022-08-04_00-46-14/deploy.yaml
####################################################################################################
Accuracy: 0.07999999821186066
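To make the expected number concrete, here is a minimal sketch in plain Python of how a streaming sparse categorical accuracy can be accumulated batch by batch. It assumes the metric is simply "argmax of the prediction equals the integer label", averaged over all samples seen so far. The class and helper names (SparseAccuracy, argmax, batches) and the toy data are hypothetical and are not part of BigDL Nano or Keras.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


class SparseAccuracy:
    """Hypothetical stand-in for a streaming sparse categorical accuracy."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update_state(self, labels, predictions):
        # Integer labels are compared directly with the predicted class index.
        for label, row in zip(labels, predictions):
            self.correct += int(argmax(row) == label)
            self.total += 1

    def result(self):
        return self.correct / self.total if self.total else 0.0


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Toy data: 5 samples, 3 classes (illustrative only).
labels = [0, 1, 2, 1, 0]
predictions = [
    [0.7, 0.2, 0.1],    # predicts 0, label 0: hit
    [0.1, 0.3, 0.6],    # predicts 2, label 1: miss
    [0.2, 0.2, 0.6],    # predicts 2, label 2: hit
    [0.5, 0.4, 0.1],    # predicts 0, label 1: miss
    [0.9, 0.05, 0.05],  # predicts 0, label 0: hit
]

metric = SparseAccuracy()
for y, p in zip(batches(labels, 2), batches(predictions, 2)):
    metric.update_state(y, p)

accuracy = metric.result()
print("Accuracy:", accuracy)
```

Under this assumption the result does not depend on the batch size, so a tuner that evaluates the same data with the same metric would be expected to report the same value as the manual loop, not 0.0000.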
It works as expected with other accuracy metrics (e.g. CategoricalAccuracy with a CategoricalCrossentropy loss).
Code using CategoricalCrossentropy and CategoricalAccuracy:
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
from keras.utils.np_utils import to_categorical
import numpy as np
from bigdl.nano.tf.keras import Model

# Same model as above, compiled with the one-hot loss and metric.
model = MobileNetV2(weights=None, input_shape=(40, 40, 3), classes=10)
model = Model(inputs=model.inputs, outputs=model.outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=[tf.keras.metrics.CategoricalAccuracy()])

# Random images; integer labels are converted to one-hot vectors.
train_examples = np.random.random((100, 40, 40, 3))
train_labels = np.random.randint(0, 10, size=(100,))
train_labels = to_categorical(train_labels, num_classes=10)
train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels)).batch(8)

# Baseline accuracy as reported by Keras.
model.evaluate(train_dataset)

# Quantize, passing the same metric to the tuner.
q_model = model.quantize(calib_dataset=train_dataset,
                         metric=tf.keras.metrics.CategoricalAccuracy(),
                         tuning_strategy='basic',
                         accuracy_criterion={'relative': 0.99,
                                             'higher_is_better': True})

# Recompute the accuracy by hand to compare with the tune result.
m = tf.keras.metrics.CategoricalAccuracy()
for img, label in train_dataset:
    m.update_state(label, model(img))
print('#' * 100)
print("Accuracy: {}".format(m.result().numpy()))
output:
2022-08-04 00:50:28.981507: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
13/13 [==============================] - 2s 19ms/step - loss: 2.3026 - categorical_accuracy: 0.0900
...
2022-08-04 00:50:55 [INFO] Start to evaluate the TensorFlow model.
2022-08-04 00:50:56 [INFO] Model inference elapsed time: 653.21 ms
2022-08-04 00:50:56 [INFO] Tune 1 result is: [Accuracy (int8|fp32): 0.0900|0.0900, Duration (seconds) (int8|fp32): 0.6534|0.4852], Best tune result is: [Accuracy: 0.0900, Duration (seconds): 0.6534]
2022-08-04 00:50:56 [INFO] |**********************Tune Result Statistics**********************|
2022-08-04 00:50:56 [INFO] +--------------------+----------+---------------+------------------+
2022-08-04 00:50:56 [INFO] | Info Type | Baseline | Tune 1 result | Best tune result |
2022-08-04 00:50:56 [INFO] +--------------------+----------+---------------+------------------+
2022-08-04 00:50:56 [INFO] | Accuracy | 0.0900 | 0.0900 | 0.0900 |
2022-08-04 00:50:56 [INFO] | Duration (seconds) | 0.4852 | 0.6534 | 0.6534 |
2022-08-04 00:50:56 [INFO] +--------------------+----------+---------------+------------------+
2022-08-04 00:50:56 [INFO] Save tuning history to /home/projects/BigDL/nc_workspace/2022-08-04_00-50-40/./history.snapshot.
2022-08-04 00:50:56 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2022-08-04 00:50:56 [INFO] Save deploy yaml to /home/projects/BigDL/nc_workspace/2022-08-04_00-50-40/deploy.yaml
####################################################################################################
Accuracy: 0.09000000357627869
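The two scripts differ only in how the labels are encoded, so on identical predictions the two metrics would be expected to agree. The following plain-Python sketch illustrates that equivalence, assuming both metrics reduce to an argmax comparison. All function names and the toy data are hypothetical and do not come from Keras or BigDL Nano.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def to_one_hot(label, num_classes):
    """Encode an integer label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def sparse_categorical_accuracy(labels, predictions):
    # Integer labels: compare the label with the predicted class index.
    hits = sum(argmax(p) == y for y, p in zip(labels, predictions))
    return hits / len(labels)


def categorical_accuracy(one_hot_labels, predictions):
    # One-hot labels: compare argmax of the label with argmax of the prediction.
    hits = sum(argmax(p) == argmax(y) for y, p in zip(one_hot_labels, predictions))
    return hits / len(one_hot_labels)


# Toy data: 4 samples, 3 classes (illustrative only).
labels = [2, 0, 1, 1]
predictions = [
    [0.1, 0.2, 0.7],  # predicts 2, label 2: hit
    [0.3, 0.6, 0.1],  # predicts 1, label 0: miss
    [0.2, 0.5, 0.3],  # predicts 1, label 1: hit
    [0.8, 0.1, 0.1],  # predicts 0, label 1: miss
]

sparse_acc = sparse_categorical_accuracy(labels, predictions)
one_hot_acc = categorical_accuracy([to_one_hot(y, 3) for y in labels], predictions)
print(sparse_acc, one_hot_acc)
```

If the equivalence holds for the real metrics as well, the 0.0000 baseline reported for SparseCategoricalAccuracy points at how labels and predictions are passed to the metric during tuning, not at the metric definition itself.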