open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

Fast-SCNN Input Size #1043

Closed skaldesh closed 3 years ago

skaldesh commented 3 years ago

Hi, I have a general question: I want to fine-tune the Fast-SCNN model trained in this repo.

Before starting my own training, I wanted to check whether the ONNX export works with the pretrained model, but I ran into an issue right away. I am using the config and weights from this repo: https://github.com/open-mmlab/mmsegmentation/tree/master/configs/fastscnn

Command:

python tools/pytorch2onnx.py \
    fast_scnn.py \
    --checkpoint pretrained-fast-scnn.pth \
    --output-file model.onnx \
    --input-img test.jpg \
    --verify \
    --shape 512 1024

Error:

Use load_from_local loader
Successfully exported ONNX model: data/output/model
/opt/conda/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:353: UserWarning: Deprecation warning. This ORT build has ['CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. The next release (ORT 1.10) will require explicitly setting the providers parameter (as opposed to the current behavior of providers getting set/registered by default based on the build flags) when instantiating InferenceSession.For example, onnxruntime.InferenceSession(..., providers=["CUDAExecutionProvider"], ...)
  "based on the build flags) when instantiating InferenceSession."
Traceback (most recent call last):
  File "tools/pytorch2onnx.py", line 397, in <module>
    dynamic_export=args.dynamic_export)
  File "tools/pytorch2onnx.py", line 282, in pytorch2onnx
    err_msg='The outputs are different between Pytorch and ONNX')
  File "/opt/conda/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 1529, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/opt/conda/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 761, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-05, atol=1e-05
The outputs are different between Pytorch and ONNX
(shapes (1, 896, 1440), (1, 512, 1024) mismatch)
 x: array([[[0.684211, 0.684211, 0.684211, ..., 0.684211, 0.684211,
         0.684211],
        [0.684211, 0.684211, 0.684211, ..., 0.684211, 0.684211,...
 y: array([[[0.684211, 0.684211, 0.684211, ..., 0.684211, 0.684211,
         0.684211],
        [0.684211, 0.684211, 0.684211, ..., 0.684211, 0.684211,...

The line `(shapes (1, 896, 1440), (1, 512, 1024) mismatch)` surprised me. When I change `--shape 512 1024` to `--shape 896 1440` in the export command, this error goes away (the export then fails for a different reason, but let's stick to one topic per issue).

My final question is: where is the input size defined for Fast-SCNN? The README says the model was trained with a crop size of 512x1024, but what does that mean, if not the input size?

Thanks in advance

MengzhangLI commented 3 years ago

Hi, sorry for the late reply.

(1) Fine-tuning Fast-SCNN in MMSegmentation is not easy. In my previous experiments, my results were lower than those reported in the README, probably because the model is trained from scratch or because of some hidden bugs. We have not investigated this phenomenon specifically; I will record it in our memo.

(2) Which version of PyTorch are you using? Deployment in MMSegmentation is currently experimental, and the default PyTorch version is 1.8.

(3) Because the config is split into separate files for datasets, schedules, and models under `_base_`, the input size (i.e., the crop size) is defined in the dataset config. The Fast-SCNN config file shows that its dataset is Cityscapes:

https://github.com/open-mmlab/mmsegmentation/blob/97f9670c5a4a2a3b4cfb411bcc26db16b23745f7/configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py#L2

Then the crop size is given here: https://github.com/open-mmlab/mmsegmentation/blob/97f9670c5a4a2a3b4cfb411bcc26db16b23745f7/configs/_base_/datasets/cityscapes.py#L6
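For reference, the relevant lines from the two linked files look roughly like this (abridged from the configs at the linked commit):

```python
# configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py (abridged)
_base_ = [
    '../_base_/models/fast_scnn.py', '../_base_/datasets/cityscapes.py',
    '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py'
]

# configs/_base_/datasets/cityscapes.py (abridged)
crop_size = (512, 1024)
```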

Best,

skaldesh commented 3 years ago

> Hi, sorry for the late reply.

I do not consider this late at all, no worries :smile:

> (1) Fine-tuning Fast-SCNN in MMSegmentation is not easy. In my previous experiments, my results were lower than those reported in the README, probably because the model is trained from scratch or because of some hidden bugs. We have not investigated this phenomenon specifically; I will record it in our memo.

Actually, our fine-tuning went very well (we have a simple task with just two classes).

> (2) Which version of PyTorch are you using? Deployment in MMSegmentation is currently experimental, and the default PyTorch version is 1.8.

I tried it with 1.6.0 and 1.8.1.

> (3) Because the config is split into separate files for datasets, schedules, and models under `_base_`, the input size (i.e., the crop size) is defined in the dataset config. The Fast-SCNN config file shows that its dataset is Cityscapes. Then the crop size is given here: crop_size = (512, 1024)

Exactly, the crop size is (512, 1024). So why does the ONNX export throw an error saying that the input size given via `--shape 512 1024` mismatches the model's input size of 896x1440? Where does that size come from? The error goes away if I specify `--shape 896 1440` during the ONNX export.

MengzhangLI commented 3 years ago

OK, got it. Is 896x1440 the size of your image?

skaldesh commented 3 years ago

No, I am simply using the config and weights that you provided, with no changes to the config or model weights. I just perform the export. So your pretrained model seems to have an input size of 896x1440, and I cannot figure out why that is the case :)

skaldesh commented 3 years ago

@MengzhangLI Any update on this?

MengzhangLI commented 3 years ago

@RunningLeon Hi, sorry to bother you. Could you have a look at this issue? Thank you!

RunningLeon commented 3 years ago

> No, I am simply using the config and weights that you provided, with no changes to the config or model weights. I just perform the export. So your pretrained model seems to have an input size of 896x1440, and I cannot figure out why that is the case :)

@skaldesh Hi, it works fine on my machine. If you did not change anything and used the official config and checkpoint, it should be fine. BTW, here is what I ran:

Ran script with:

--checkpoint checkpoints/fast_scnn_lr0.12_8x4_160k_cityscapes_20210630_164853-0cec9937.pth
--show
--verify
--output-file checkpoints/fast_scnn.onnx
--shape 512 1024
--input-img demo/demo.png
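
Assembled into a full command for clarity (only the flags were shown above; the config path is my assumption based on the repo layout):

python tools/pytorch2onnx.py \
    configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py \
    --checkpoint checkpoints/fast_scnn_lr0.12_8x4_160k_cityscapes_20210630_164853-0cec9937.pth \
    --show \
    --verify \
    --output-file checkpoints/fast_scnn.onnx \
    --shape 512 1024 \
    --input-img demo/demo.png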

Shown result:

[Image: Fast-SCNN segmentation output on demo.png]

skaldesh commented 3 years ago

Sorry, I found my mistake: the image I was using had dimensions 1440x896 :facepalm:. I assumed the conversion script would simply resize the image to the given shape, but I was wrong. Sorry for any inconvenience.
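
For anyone hitting the same mismatch: a minimal sketch of a pre-resize step that makes the input image match the `--shape` argument before export. OpenCV is my choice here (any image library works), and the file names are placeholders:

```python
import cv2

# --shape 512 1024 means height x width, so the image must be 1024 wide
# and 512 tall; per the resolution above, pytorch2onnx.py verifies
# against the image's actual size rather than resizing it.
img = cv2.imread('test.jpg')            # here: 1440x896 (width x height)
img = cv2.resize(img, (1024, 512))      # cv2.resize takes (width, height)
cv2.imwrite('test_512x1024.jpg', img)   # then pass via --input-img
```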