mlcommons / tiny

MLPerf™ Tiny is an ML benchmark suite for extremely low-power systems such as microcontrollers
https://mlcommons.org/en/groups/inference-tiny/
Apache License 2.0

Totally off accuracies for anomaly detection with quantized I/O model #140

Closed i3abghany closed 4 months ago

i3abghany commented 12 months ago

Hello,

I am trying to run inference for the Anomaly Detection benchmark using the model with quantized weights, activations, inputs, and outputs, but the resulting average AUC is totally off.

I changed nothing except the input handling before inference, since the data have to be scaled and converted to np.int8 (just like the other benchmarks). Here's the code for that:

import numpy
import tensorflow as tf

def run_inference(model_path, data):
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    # Quantize the float input features to int8 using the input
    # tensor's scale and zero point
    input_details = interpreter.get_input_details()
    input_scale, zero_point = input_details[0]['quantization']
    input_data = numpy.array(data/input_scale + zero_point, dtype=numpy.int8)            # Just like other benchmarks

    output_details = interpreter.get_output_details()
    output_data = numpy.empty_like(data)

    # Run the samples through the model one at a time
    for i in range(input_data.shape[0]):
        interpreter.set_tensor(input_details[0]['index'], input_data[i:i+1, :])
        interpreter.invoke()
        output_data[i:i+1, :] = interpreter.get_tensor(output_details[0]['index'])

    return output_data

The data parameter comes from the untouched inference code in 03_tflite_test.py and model_path is trained_models/model_ToyCar_quant_fullint_micro_intio.tflite.

The average AUC is 0.5564.

The exact same code (without quantizing the input data) works for the trained_models/model_ToyCar_quant_fullint_micro.tflite model.
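For what it's worth, one way to drive both model variants with the same script is to branch on the input tensor's dtype reported by the interpreter. A minimal sketch, assuming `input_detail` is one entry of `interpreter.get_input_details()` (`maybe_quantize` is a hypothetical helper, not part of the benchmark code):

```python
import numpy as np

def maybe_quantize(data, input_detail):
    """Quantize float features to int8 only when the model's input
    tensor is int8; float-I/O models get float32 passed through."""
    if input_detail['dtype'] == np.int8:
        scale, zero_point = input_detail['quantization']
        q = np.round(data / scale + zero_point)
        return np.clip(q, -128, 127).astype(np.int8)
    return data.astype(np.float32)
```

Called as `input_data = maybe_quantize(data, input_details[0])`, this leaves the float-I/O model path unchanged.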

I also tried scaling the representative dataset using the following code in the conversion script:

def representative_dataset_gen():
    # Feed every 5th training sample, normalized by the per-column max
    for sample in train_data[::5]:
        sample = numpy.expand_dims(sample.astype(numpy.float32), axis=0)
        sample = sample / numpy.max(numpy.abs(sample), axis=0)
        yield [sample]

However, this makes the average AUC even worse: 0.4605.

Any hints would be appreciated. Thanks!

i3abghany commented 12 months ago

PS: I am aware of this issue: https://github.com/mlcommons/tiny/issues/110. The author seems to have had a similar problem, but there is no solution on the issue page.

nemcekova commented 11 months ago

Hello,

I managed to get results with the int-I/O model similar to those of the float-I/O model.

How I did it:

# Dequantize the raw int8 outputs back to real values using the
# output tensor's scale and zero point
scale, zp = output_details[0]['quantization']
out = output_data.astype(numpy.float32)
out = scale * (out - zp)

With this I get an average AUC of 0.8408.

Hope this helps. :)
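For context, the mapping above is the standard TFLite affine dequantization, `real = scale * (q - zero_point)`, i.e. the inverse of the input quantization; without it, the AUC is computed on raw int8 codes rather than reconstructed spectrogram values. A minimal self-contained numpy sketch of the round trip (the scale and zero point here are made-up values, not taken from the model):

```python
import numpy as np

# TFLite int8 tensors use an affine mapping:
#   real_value = scale * (quantized_value - zero_point)
def quantize(x, scale, zero_point):
    q = np.round(x / scale + zero_point)
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

scale, zp = 0.05, -10          # hypothetical quantization parameters
x = np.array([0.42, -0.17, 0.0], dtype=np.float32)
x_hat = dequantize(quantize(x, scale, zp), scale, zp)
# Round-trip error is bounded by half a quantization step (scale / 2)
assert np.all(np.abs(x - x_hat) <= scale / 2)
```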

cskiraly commented 5 months ago

@i3abghany did you manage to solve the issue based on the suggestions above? If so, I'll close the issue.