open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

scripts to validate whether the performance of the deployed model matches that of its PyTorch model #216

Closed lvhan028 closed 2 years ago

lvhan028 commented 2 years ago

Motivation

As a part of the regression test, the performance of a deployed model should be checked for consistency with the performance of its PyTorch model. In other words, MMDeploy should evaluate the metrics of a deployed model and check whether they match the metrics reported by the PyTorch model.

Requirement

python scripts.py <args>

TODO: discuss the proper args

PeterH0323 commented 2 years ago

Hi lvhan028, here is my idea. 😄

Prerequisite

Before using this script, the user has to install the backend tools that he/she wants to use, such as TensorRT, ONNX and so on.

Usage

python ./tools/accuracy_test.py \
    --deploy-cfg "${DEPLOY_CFG_PATH}" \
    --model-cfg "${MODEL_CFG_PATH}" \
    --checkpoint "${MODEL_CHECKPOINT_PATH}" \
    --dataset-path "${TEST_DATASET_PATH}" \
    --work-dir "${WORK_DIR}" \
    --calib-dataset-cfg "${CALIB_DATA_CFG}" \
    --device "${DEVICE}" \
    --log-level INFO \
    --show \
    --dump-info

Description of all arguments:

  • --deploy-cfg: The config for deployment.
  • --model-cfg: The config of the model in the OpenMMLab codebase.
  • --checkpoint: The path of the model checkpoint file.
  • --dataset-path: The dataset used by the test pipeline.
  • --work-dir: The path of the work directory used to save logs and models.
  • --calib-dataset-cfg: The config used for calibration. If not specified, it will be set to None.
  • --device: The device used for conversion. If not specified, it will be set to cpu.
  • --log-level: The log level, one of 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.
  • --show: Whether to show detection outputs.
  • --dump-info: Whether to output information for the SDK.
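For illustration, a minimal argparse skeleton for the proposed tools/accuracy_test.py could look like the sketch below. The argument names simply mirror the list above; nothing here is an existing MMDeploy API.

```python
# Hypothetical CLI skeleton for the proposed tools/accuracy_test.py.
# Argument names mirror the proposal above; this is a sketch, not MMDeploy code.
import argparse


def parse_args():
    parser = argparse.ArgumentParser(
        description='Check a deployed model against its PyTorch baseline.')
    parser.add_argument('--deploy-cfg', required=True, help='deploy config path')
    parser.add_argument('--model-cfg', required=True,
                        help='model config of the OpenMMLab codebase')
    parser.add_argument('--checkpoint', required=True, help='PyTorch checkpoint path')
    parser.add_argument('--dataset-path', help='dataset used by the test pipeline')
    parser.add_argument('--work-dir', help='directory used to save logs and models')
    parser.add_argument('--calib-dataset-cfg', default=None,
                        help='config used for calibration')
    parser.add_argument('--device', default='cpu', help='device used for conversion')
    parser.add_argument('--log-level', default='INFO',
                        choices=['CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING',
                                 'INFO', 'DEBUG', 'NOTSET'])
    parser.add_argument('--show', action='store_true', help='show detection outputs')
    parser.add_argument('--dump-info', action='store_true',
                        help='output information for the SDK')
    return parser.parse_args()


if __name__ == '__main__':
    print(parse_args())
```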

Example

python ./tools/accuracy_test.py \
    --deploy-cfg "configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py" \
    --model-cfg "$PATH_TO_MMDET/configs/yolo/yolov3_d53_mstrain-608_273e_coco.py" \
    --checkpoint "$PATH_TO_MMDET/checkpoints/yolo/yolov3_d53_mstrain-608_273e_coco.pth" \
    --dataset-path "${PATH_TO_MMDET}/data/coco/val2017" \
    --work-dir "${WORK_DIR}" \
    --show \
    --device cuda:0

Note

The script will complete the following steps:

  1. Use the model_cfg to load the test_pipeline and evaluate the dataset. This gives the baseline of the PyTorch model, which I name baseline A.
  2. The script converts the model according to the deploy_cfg; taking TensorRT FP16 as an example, we get A_trt_fp16.engine after this step.
  3. As in step 1, run the test with the test_pipeline on the converted model to get baseline B.
  4. Print the comparison of baseline A and baseline B to the terminal, like the table below (a rough sketch of this flow follows the table):

| model_cfg | hmean | precision | recall | FPS |
| --- | --- | --- | --- | --- |
| yolov3_d53_mstrain-608_273e_coco.py | 0.95 | 0.98 | 0.88 | 30 |
| detection_tensorrt_dynamic-320x320-1344x1344.py | 0.95 | 0.97 | 0.87 | 300 |
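A rough Python sketch of those four steps, assuming two placeholder helpers (evaluate_with_pipeline and convert_model) that are not existing MMDeploy functions:

```python
# Sketch of the proposed four-step flow. evaluate_with_pipeline() and convert_model()
# are placeholder stubs; in a real script they would call the codebase's test pipeline
# and MMDeploy's model conversion, respectively.
def evaluate_with_pipeline(cfg, weights, dataset_path, device):
    """Placeholder: run the test_pipeline defined by `cfg` on the dataset, return metrics."""
    return {'hmean': 0.95, 'precision': 0.98, 'recall': 0.88, 'fps': 30.0}


def convert_model(deploy_cfg, model_cfg, checkpoint, device):
    """Placeholder: convert the checkpoint according to deploy_cfg, return backend files."""
    return ['work_dir/A_trt_fp16.engine']


def run_accuracy_test(model_cfg, deploy_cfg, checkpoint, dataset_path, device='cuda:0'):
    baseline_a = evaluate_with_pipeline(model_cfg, checkpoint, dataset_path, device)     # step 1
    backend_files = convert_model(deploy_cfg, model_cfg, checkpoint, device)             # step 2
    baseline_b = evaluate_with_pipeline(model_cfg, backend_files, dataset_path, device)  # step 3
    for name, m in ((model_cfg, baseline_a), (deploy_cfg, baseline_b)):                  # step 4
        print(f"{name}  hmean={m['hmean']:.2f}  precision={m['precision']:.2f}  "
              f"recall={m['recall']:.2f}  FPS={m['fps']:.0f}")
```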
lvhan028 commented 2 years ago

...

I think it's better to do the job on a batch of models and a batch of backends, not just one. Meanwhile, compute precisions such as FP32, FP16 and INT8 should be taken into consideration, too. Regarding the final report, it is necessary to show the matching result (Yes or No) besides the performance numbers.
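One way the matching result could be computed is a simple tolerance check like the sketch below; the 0.01 tolerance and the metric names are placeholders, not decided values.

```python
# Sketch: decide the Yes/No "match" verdict by comparing the deployed model's metrics
# against the PyTorch baseline with a tolerance. The tolerance here is a placeholder.
def metrics_match(pytorch_metrics, backend_metrics,
                  keys=('hmean', 'precision', 'recall'), tolerance=0.01):
    """Return True if every metric is no worse than the baseline minus `tolerance`."""
    return all(backend_metrics[k] >= pytorch_metrics[k] - tolerance for k in keys)


pytorch = {'hmean': 0.95, 'precision': 0.98, 'recall': 0.88}
trt_fp16 = {'hmean': 0.95, 'precision': 0.97, 'recall': 0.87}
print('Yes' if metrics_match(pytorch, trt_fp16) else 'No')  # prints: Yes
```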

PeterH0323 commented 2 years ago

...

I think it's better to do the job on a batch of models and a batch of backends, not just one. Meanwhile, compute precisions such as FP32, FP16 and INT8 should be taken into consideration, too. Regarding the final report, it is necessary to show the matching result (Yes or No) besides the performance numbers.

Hi lvhan028, I have some questions about your reply: 😄

lvhan028 commented 2 years ago

...

  • I think it's better to do the job on a batch of models and a batch of backends, not just one.: In my case, I use PyTorch to pick a specific model and train it on my own dataset, so at that point I already know which model is best for me. After training, I want to test the precision and other scores of the model after converting it to a deployable backend. I can see the need for a batch of backends, but in what situation would a user want to compare a batch of models before deploying?
  • Meanwhile, compute precisions such as FP32, FP16 and INT8 should be taken into consideration, too.: I think this is necessary for users. In my original idea the test runs pytorch -> FP32 -> FP16 -> INT8, and the user just needs to set the config like --deploy-cfg xxx_fp16_xxx.py,xxx_fp32_xxx.py,xxx_int8_xxx.py. Is that okay?
  • Regarding the final report, it is necessary to show the matching result (Yes or No) besides the performance numbers.: The script will generate a report and use bold to mark the best result. In my experience it is very likely that the scores will not match exactly, e.g. the PyTorch hmean is 0.95 but the TensorRT FP16 hmean is 0.96. Does (Yes or No) mean that equal or higher counts as Yes?

This script is not just for users. It is for MMDeploy's regression test. That is to say, every time MMDeploy releases a new version, it has to run a full test: check whether all supported models can be deployed to all supported backends correctly. How do we measure correctness? By comparing the metrics of the training model and the deployed model.

PeterH0323 commented 2 years ago

...

This script is not just for users. It is for MMDeploy's regression test. That is to say, every time MMDeploy releases a new version, it has to run a full test: check whether all supported models can be deployed to all supported backends correctly. How do we measure correctness? By comparing the metrics of the training model and the deployed model.

OK, I know what to do. 😆

Prerequisite

Before using this script, the user has to install the backend tools that he/she wants to use, such as TensorRT, ONNX and so on. For MMDeploy's regression test, all supported backends need to be installed.
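Since a regression run needs every backend present, the script could fail fast with a check along these lines; the module names listed are only illustrative, not a definitive set.

```python
# Sketch: verify that the required backend Python packages are importable before
# starting a long regression run. The module names below are illustrative only.
import importlib.util

REQUIRED_BACKENDS = ['tensorrt', 'onnxruntime', 'ncnn', 'openvino']

missing = [name for name in REQUIRED_BACKENDS
           if importlib.util.find_spec(name) is None]
if missing:
    raise RuntimeError(f'Missing backend packages for the regression test: {missing}')
```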

Usage

python ./tools/accuracy_test.py \
    --deploy-cfg "${DEPLOY_CFG_PATH}" \
    --model-cfg "${MODEL_CFG_PATH}" \
    --checkpoint "${MODEL_CHECKPOINT_PATH}" \
    --dataset-path "${TEST_DATASET_PATH}" \
    --work-dir "${WORK_DIR}" \
    --calib-dataset-cfg "${CALIB_DATA_CFG}" \
    --device "${DEVICE}" \
    --log-level INFO \
    --show \
    --dump-info

Description of all arguments: same as above.

model-cfg.yaml example

mmdet:
    code_dir: ${PATH_TO_MMDET}
    checkpoint_dir: ${PATH_TO_MMDET_CHECKPOINT_DIR}
    dataset_dir:  # TODO: is it necessary to set this under each model?
        - coco: ${PATH_TO_COCO_DATASET_DIR}
        - xxx
    calib-dataset-cfg:  # TODO: is it necessary to set this under each model?
        - coco: ${PATH_TO_COCO_CALIB_DATASET_CFG}
        - xxx
    models:
        - hrnet 
        - yolo
        - yolox
        - yolof
        - ...

mmcls:
      (same as mmdet)

mmpose:
      (same as mmdet)
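A sketch of how such a model-cfg.yaml could be consumed with PyYAML; the field names follow the draft layout above, which is still a proposal.

```python
# Sketch: read the proposed model-cfg.yaml and list what needs to be tested per codebase.
# Field names follow the draft layout above, which is still under discussion.
import yaml

with open('./test/model-cfg.yaml') as f:
    cfg = yaml.safe_load(f)

for codebase, info in cfg.items():          # e.g. 'mmdet', 'mmcls', 'mmpose'
    print(f'{codebase}: code_dir={info["code_dir"]}')
    for model in info.get('models', []):    # e.g. 'hrnet', 'yolo', 'yolox', 'yolof'
        print(f'  model family to test: {model}')
```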

Regression Test

Example

# If set to "./configs", all codebases will be tested
python ./tools/accuracy_test.py \
    --deploy-cfg "./configs/mmdet" \
    --model-cfg "./test/model-cfg.yaml" \
    --work-dir "${WORK_DIR}" \
    --show \
    --device cuda:0

Step

The script will complete the following steps:

  1. Load the config files under the deploy-cfg directory, ignoring any directory named _base_.
  2. According to the deploy-cfg directory name, read model-cfg.yaml to get the models that need to be tested, then iterate over those models, loading each test_pipeline and evaluating the dataset to get the baseline of the PyTorch model.
  3. Convert the model with each config file under the deploy-cfg directory one by one, and then, as in step 2, run the test with the test_pipeline.
  4. Print the comparison of all models and backends to the terminal, like the table below, and also save it to an Excel file (a sketch of this traversal and report dump follows the table):

| model_type | model_name | model_cfg | deploy_cfg | hmean | precision | recall | FPS | test pass |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mmdet | yolov3_d53_mstrain-608_273e_coco.pth | yolov3_d53_mstrain-608_273e_coco.py | - | 0.95 | 0.98 | 0.88 | 30 | - |
| mmdet | yolov3_d53_mstrain-608_273e_coco.pth | yolov3_d53_mstrain-608_273e_coco.py | detection_tensorrt_dynamic-320x320-1344x1344.py | 0.95 | 0.97 | 0.87 | 300 | √ |
| ... | xxx.pth | xxx.py | - | xxx | xxx | xxx | xxx | - |
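The config traversal in steps 1-3 and the Excel dump in step 4 could look roughly like the sketch below; it assumes pandas (with openpyxl) is available, and run_one_test() is a stub standing in for the real convert-and-evaluate logic.

```python
# Sketch of the regression traversal: collect deploy configs (skipping _base_), test each
# one, print the summary, and dump it to an Excel file. run_one_test() is a placeholder.
from pathlib import Path

import pandas as pd  # the .xlsx export also needs openpyxl installed


def collect_deploy_cfgs(root):
    """Yield every deploy config under `root`, ignoring any `_base_` directory."""
    for path in Path(root).rglob('*.py'):
        if '_base_' not in path.parts:
            yield path


def run_one_test(deploy_cfg):
    """Placeholder for converting with `deploy_cfg` and evaluating the deployed model."""
    return {'hmean': 0.95, 'precision': 0.97, 'recall': 0.87, 'FPS': 300, 'test pass': '√'}


rows = []
for deploy_cfg in collect_deploy_cfgs('./configs/mmdet'):
    metrics = run_one_test(deploy_cfg)
    rows.append({'model_type': 'mmdet', 'deploy_cfg': deploy_cfg.name, **metrics})

report = pd.DataFrame(rows)
print(report.to_string(index=False))
report.to_excel('regression_report.xlsx', index=False)
```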

User Test

Example

python ./tools/accuracy_test.py \
    --deploy-cfg "configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py" \
    --model-cfg "$PATH_TO_MMDET/configs/yolo/yolov3_d53_mstrain-608_273e_coco.py" \
    --checkpoint "$PATH_TO_MMDET/checkpoints/yolo/yolov3_d53_mstrain-608_273e_coco.pth" \
    --dataset-path "${PATH_TO_MMDET}/data/coco/val2017" \
    --work-dir "${WORK_DIR}" \
    --show \
    --device cuda:0

Step

The script will complete the following steps:

  1. Use the model_cfg to load the test_pipeline and evaluate the dataset. This gives the baseline of the PyTorch model, which I name baseline A.
  2. The script converts the model according to the deploy_cfg; taking TensorRT FP16 as an example, we get A_trt_fp16.engine after this step.
  3. As in step 1, run the test with the test_pipeline on the converted model to get baseline B.
  4. Print the comparison of baseline A and baseline B to the terminal, like:

| model_cfg | hmean | precision | recall | FPS |
| --- | --- | --- | --- | --- |
| yolov3_d53_mstrain-608_273e_coco.py | 0.95 | 0.98 | 0.88 | 30 |
| detection_tensorrt_dynamic-320x320-1344x1344.py | 0.95 | 0.97 | 0.87 | 300 |