openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

classification_f1-scores metric requires dataset metadataPlease provide dataset meta file or regenerate annotation #14323

Closed glucasol closed 1 year ago

glucasol commented 1 year ago

I am trying to run the accuracy_check command but it returns this error:

```
raise ConfigError('classification_f1-scores metric requires dataset metadata'
openvino.tools.accuracy_checker.config.config_validator.ConfigError: classification_f1-scores metric requires dataset metadataPlease provide dataset meta file or regenerate annotation
```

I am working with an Anomaly Detection model and have generated the annotation.pickle file using the convert_annotation command with the common_semantic_segmentation converter, specifying the images and ground-truth mask files. Is this the right way of generating the annotation file?

My configuration.yaml file is:

models:

zulkifli-halim commented 1 year ago

Hi @glucasol, Are you using this anomalib repo? Please make sure that the path to your generated annotation pickle is correct.

glucasol commented 1 year ago

> Hi @glucasol, Are you using this anomalib repo? Please make sure that the path to your generated annotation pickle is correct.

Hi @zulkifli-halim, thanks for your answer. Yes, I am using anomalib's repo. Do you know if anomalib generates an annotation pickle file? If so, what is the default path where it is saved?

zulkifli-halim commented 1 year ago

@glucasol, from what I see in the anomalib code, no pickle file is generated throughout the entire process.

glucasol commented 1 year ago

@zulkifli-halim, in this case I have to generate it using the convert_annotation command. I have done that, specifying the images path and the ground-truth masks path, and generated the .pickle file. The error says I need to specify a dataset metadata file, which contains a label_map dictionary in the format 'class_id: class_name', and it seems anomalib does not generate this kind of file either.
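For reference, a minimal sketch of generating such a meta file for a binary anomaly task, assuming the accuracy checker accepts a JSON file with a label_map key; the class ids and names are placeholders and should match how the masks were annotated:

```python
import json

# Hypothetical label_map for a binary anomaly-segmentation task;
# adjust the ids/names to whatever the ground-truth masks encode.
meta = {"label_map": {"0": "good", "1": "anomaly"}}

with open("dataset_meta.json", "w") as f:
    json.dump(meta, f, indent=2)
```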

zulkifli-halim commented 1 year ago

Hi @glucasol, can you share your work with me? I'd like to run it on my end and investigate this issue further. You can send it to my email: zulkiflix.bin.abdul.halim@intel.com

glucasol commented 1 year ago

@zulkifli-halim I have sent you an email explaining my problems, with the title "Openvino & Anomalib Issues". Thanks for your time.

glucasol commented 1 year ago

I am reopening this issue to clarify the problem I am facing.

I just built a segmentation model to detect anomalies in the MVTec dataset (the default example in Anomalib's repo), and I am trying to optimize it because it will be very useful in some projects we are developing.

I have generated the IR model and tried to optimize it using POT (Post-training Optimization Tool) in Default Quantization mode, following the Best Practices suggestions, but instead of decreasing inference time it is increasing it (which I found curious...). I just followed these steps: https://docs.openvino.ai/latest/pot_default_quantization_usage.html#doxid-pot-default-quantization-usage. Can you tell me if I did something wrong?

I am also trying to optimize in Accuracy Aware Quantization mode, but I need an annotated dataset. Anomalib does not generate this annotation file, so I need to create it externally. Can you tell me whether following these steps (https://docs.openvino.ai/latest/pot_accuracyaware_usage.html) and creating the DataLoader is the best way to generate this annotation, or whether it is better to create it via the convert_annotation command, specifying the appropriate converter (which I believe to be common_semantic_segmentation)?
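For illustration, a hedged sketch of a POT DataLoader that pairs each image with its ground-truth mask, following the (data, annotation) convention also used in the Default Quantization example further down; the directory layout, mask naming, and preprocessing here are assumptions:

```python
import os

import cv2 as cv
import numpy as np
from openvino.tools.pot import DataLoader

class MaskedImageLoader(DataLoader):
    """Yields (image, mask) pairs; assumes masks share file names with images."""

    def __init__(self, images_dir, masks_dir, shape=(256, 256)):
        self._images_dir = images_dir
        self._masks_dir = masks_dir
        self._shape = shape
        self._files = sorted(os.listdir(images_dir))

    def __len__(self):
        return len(self._files)

    def __getitem__(self, index):
        if index >= len(self):
            raise IndexError("Index out of dataset size")
        name = self._files[index]
        image = cv.imread(os.path.join(self._images_dir, name))
        image = cv.resize(image, self._shape)
        image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)  # NCHW layout
        # Binary ground-truth mask (non-zero pixels = anomaly), resized to match.
        mask = cv.imread(os.path.join(self._masks_dir, name), cv.IMREAD_GRAYSCALE)
        mask = (cv.resize(mask, self._shape) > 0).astype(np.uint8)
        return image, mask  # (data, annotation) pair for AccuracyAwareQuantization
```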

zulkifli-halim commented 1 year ago

@glucasol, please share your model and the command you used when optimizing the model using POT. For the dataset format, you can refer to Data Types.

ashwinvaidya17 commented 1 year ago

@hbalasu1 we are aware of this use case, and Anomalib does not officially support POT. An initial suggestion would be to write a custom data loader, but I'll leave the discussion about the best way to create annotations to the POT team, as I haven't used the convert_annotation command.

hbalasu1 commented 1 year ago

Hi @glucasol

As @ashwinvaidya17 mentioned, Anomalib does not officially support POT; that is the reason you could be experiencing decreased inference performance.

glucasol commented 1 year ago

@zulkifli-halim Unfortunately I can't attach the model.bin file because it is greater than 25MB. But I just pip-installed the anomalib library and its dependencies and ran python tools/train.py, following the instructions at https://github.com/openvinotoolkit/anomalib.

For POT in Default Quantization mode I used the pipeline described in the documentation. The code is shown below:

```python
import os

import numpy as np
import cv2 as cv

from openvino.tools.pot import DataLoader
from openvino.tools.pot import IEEngine
from openvino.tools.pot import load_model, save_model
from openvino.tools.pot import compress_model_weights
from openvino.tools.pot import create_pipeline

class ImageLoader(DataLoader):
    """ Loads images from a folder """
    def __init__(self, dataset_path):
        # Use OpenCV to gather image files
        # Collect names of image files
        self._files = []
        all_files_in_dir = os.listdir(dataset_path)
        for name in all_files_in_dir:
            file = os.path.join(dataset_path, name)
            if cv.haveImageReader(file):
                self._files.append(file)

        # Define shape of the model
        self._shape = (256, 256)

    def __len__(self):
        """ Returns the length of the dataset """
        return len(self._files)

    def __getitem__(self, index):
        """ Returns image data by index in the NCHW layout
        Note: model-specific preprocessing is omitted, consider adding it here
        """
        if index >= len(self):
            raise IndexError("Index out of dataset size")

        image = cv.imread(self._files[index]) # read image with OpenCV
        image = cv.resize(image, self._shape) # resize to a target input size
        image = np.expand_dims(image, 0)  # add batch dimension
        image = image.transpose(0, 3, 1, 2)  # convert to NCHW layout
        return image, None   # annotation is set to None

# Model config specifies the name of the model and paths to .xml and .bin files of the model.
model_config = {
    "model_name": "model",
    "model": './openvino/model.xml',
    "weights": './openvino/model.bin',
}

# Engine config.
engine_config = {"device": "CPU"}

algorithms = [
    {
        "name": "DefaultQuantization",
        "params": {
            "preset": "performance",
            "target_device": "ANY",
            "stat_subset_size": 300,
            "stat_batch_size": 1
        },
    }
]

# Step 1: Implement and create a user data loader.
data_loader = ImageLoader("./datasets/MVTec/bottle/test/broken_small")

# Step 2: Load a model.
model = load_model(model_config=model_config)

# Step 3: Initialize the engine for metric calculation and statistics collection.
engine = IEEngine(config=engine_config, data_loader=data_loader)

# Step 4: Create a pipeline of compression algorithms and run it.
pipeline = create_pipeline(algorithms, engine)
compressed_model = pipeline.run(model=model)

# Step 5 (Optional): Compress model weights to quantized precision
#                     to reduce the size of the final .bin file.
compress_model_weights(compressed_model)

# Step 6: Save the compressed model to the desired path.
# Set save_path to the directory where the model should be saved.
compressed_model_paths = save_model(
    model=compressed_model,
    save_path="optimized_model_mvtec",
    model_name="optimized_model_mvtec",
)
```

I ran the benchmark_app command in a terminal in latency mode (-hint latency) for both models and the results were:

Before Optimization:

```

[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading OpenVINO
[ INFO ] OpenVINO:
         API version............. 2022.2.0-7713-af16ea1d79a-releases/2022/2
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.2
         Build................... 2022.2.0-7713-af16ea1d79a-releases/2022/2

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 138.33 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'input' precision u8, dimensions ([N,C,H,W]): 1 3 256 256
[ INFO ] Model output 'output' precision f32, dimensions ([...]): 1 1 256 256
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 312.57 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 72)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  , 
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   AFFINITY  , Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , False
[ INFO ]   INFERENCE_PRECISION_HINT  , <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT  , PerformanceMode.LATENCY
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 2 infer requests took 1.42 ms
[ INFO ] Prepare image ./002.png
[ WARNING ] Image is resized from ((900, 900)) to ((256, 256))
[Step 10/11] Measuring performance (Start inference asynchronously, 2 inference requests, inference only: True, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 13.13 ms
[Step 11/11] Dumping statistics report
Count:          10690 iterations
Duration:       60009.61 ms
Latency:
    Median:     11.24 ms
    AVG:        11.16 ms
    MIN:        7.71 ms
    MAX:        32.83 ms
Throughput: 178.14 FPS

```

After Optimization:

```

  [Step 1/11] Parsing and validating input arguments
  [ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
  [Step 2/11] Loading OpenVINO
  [ INFO ] OpenVINO:
           API version............. 2022.2.0-7713-af16ea1d79a-releases/2022/2
  [ INFO ] Device info
           CPU
           openvino_intel_cpu_plugin version 2022.2
           Build................... 2022.2.0-7713-af16ea1d79a-releases/2022/2

  [Step 3/11] Setting device configuration
  [Step 4/11] Reading network files
  [ INFO ] Read model took 59.19 ms
  [Step 5/11] Resizing network to match image sizes and given batch
  [ INFO ] Network batch size: 1
  [Step 6/11] Configuring input of the model
  [ INFO ] Model input 'input' precision u8, dimensions ([N,C,H,W]): 1 3 256 256
  [ INFO ] Model output 'output' precision f32, dimensions ([...]): 1 1 256 256
  [Step 7/11] Loading the model to the device
  [ INFO ] Compile model took 187.87 ms
  [Step 8/11] Querying optimal runtime parameters
  [ INFO ] DEVICE: CPU
  [ INFO ]   AVAILABLE_DEVICES  , ['']
  [ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
  [ INFO ]   RANGE_FOR_STREAMS  , (1, 72)
  [ INFO ]   FULL_DEVICE_NAME  , Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
  [ INFO ]   OPTIMIZATION_CAPABILITIES  , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
  [ INFO ]   CACHE_DIR  , 
  [ INFO ]   NUM_STREAMS  , 1
  [ INFO ]   AFFINITY  , Affinity.CORE
  [ INFO ]   INFERENCE_NUM_THREADS  , 0
  [ INFO ]   PERF_COUNT  , False
  [ INFO ]   INFERENCE_PRECISION_HINT  , <Type: 'float32'>
  [ INFO ]   PERFORMANCE_HINT  , PerformanceMode.LATENCY
  [ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
  [Step 9/11] Creating infer requests and preparing input data
  [ INFO ] Create 2 infer requests took 2.10 ms
  [ INFO ] Prepare image ./002.png
  [ WARNING ] Image is resized from ((900, 900)) to ((256, 256))
  [Step 10/11] Measuring performance (Start inference asynchronously, 2 inference requests, inference only: True, limits: 60000 ms duration)
  [ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
  [ INFO ] First inference took 27.84 ms
  [Step 11/11] Dumping statistics report
  Count:          10066 iterations
  Duration:       60014.19 ms
  Latency:
      Median:     11.80 ms
      AVG:        11.87 ms
      MIN:        8.29 ms
      MAX:        51.72 ms
  Throughput: 167.73 FPS

```
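For a quick cross-check outside benchmark_app, a minimal sketch of measuring synchronous single-request latency for both IRs with the OpenVINO Python runtime; the model paths are assumptions inferred from the configs above, and synchronous numbers will not match benchmark_app's asynchronous measurements exactly:

```python
import time

import numpy as np
from openvino.runtime import Core

def mean_latency_ms(model_xml: str, runs: int = 200) -> float:
    """Average synchronous inference latency for a 1x3x256x256 input."""
    core = Core()
    compiled = core.compile_model(core.read_model(model_xml), "CPU")
    request = compiled.create_infer_request()
    dummy = np.random.rand(1, 3, 256, 256).astype(np.float32)
    request.infer({0: dummy})  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        request.infer({0: dummy})
    return (time.perf_counter() - start) / runs * 1e3

# Paths below are assumptions based on the model configs shown earlier.
for xml in ("./openvino/model.xml", "optimized_model_mvtec/optimized_model_mvtec.xml"):
    print(xml, f"{mean_latency_ms(xml):.2f} ms")
```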

glucasol commented 1 year ago

Hi @hbalasu1, understood, but do you know the best way to create annotations for an anomalib model? For example, whether common_semantic_segmentation is the appropriate converter, and so on.

@ashwinvaidya17 Do you know if Anomalib will have official support to POT?

ashwinvaidya17 commented 1 year ago

@glucasol At this moment we don't plan to add it due to other priorities, but we could investigate the possibility of adding it if more users want it.

zulkifli-halim commented 1 year ago

@glucasol, for an example of dataset preparation, you may refer here.

zulkifli-halim commented 1 year ago

Closing issue. Feel free to reopen if further assistance is needed.