pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

drastic speed regression of torch.jit.load starting with the 20230301 nightly #95789

Open · pmeier opened this issue 1 year ago

pmeier commented 1 year ago

Reproduction:

import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn
from time import perf_counter

# Script a torchvision detection model and serialize it to disk.
model = keypointrcnn_resnet50_fpn()
scripted_model = torch.jit.script(model)
scripted_model.save("script.pt")

# Time how long it takes to load the serialized module back.
start = perf_counter()
torch.jit.load("script.pt")
stop = perf_counter()
print(torch.__version__, stop - start)
2.0.0.dev20230228+cpu 0.2134191999975883
2.0.0.dev20230301+cpu 21.595253191000666

The results above are from the Python 3.8 CPU wheels on Linux.
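
To narrow down where the extra time is spent, one could profile a single load with the standard-library profiler. This is only a minimal diagnostic sketch, assuming the script.pt file produced by the repro above:

import cProfile
import pstats

import torch
import torchvision  # noqa: F401  -- registers the custom ops (e.g. nms) the scripted model needs

# Profile one torch.jit.load call to see which internal calls dominate.
profiler = cProfile.Profile()
profiler.enable()
torch.jit.load("script.pt")
profiler.disable()

# Show the 20 entries with the largest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)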

This is the root cause of the CI timeouts seen in torchvision: pytorch/vision#7369

Here are the slowest test durations with the 20230227 nightly: https://github.com/pytorch/vision/actions/runs/4286797892/jobs/7466847813#step:10:3241

============================= slowest 20 durations =============================
13.22s call     test/test_models.py::test_classification_model[cpu-regnet_y_128gf]
12.03s call     test/test_models.py::test_classification_model[cpu-vit_h_14]
9.59s call     test/test_models.py::test_detection_model[cpu-maskrcnn_resnet50_fpn_v2]
9.25s call     test/test_models.py::test_quantized_classification_model[resnext101_32x8d]
9.01s call     test/test_models.py::test_quantized_classification_model[resnext101_64x4d]
8.43s call     test/test_backbone_utils.py::TestFxFeatureExtraction::test_jit_forward_backward[regnet_y_128gf]
8.42s call     test/test_backbone_utils.py::TestFxFeatureExtraction::test_jit_forward_backward[vit_h_14]
8.20s call     test/test_models.py::test_quantized_classification_model[mobilenet_v3_large]
7.84s call     test/test_datasets_video_utils.py::TestVideo::test_video_clips_custom_fps
7.77s call     test/test_models.py::test_classification_model[cpu-efficientnet_v2_l]
7.50s call     test/test_models.py::test_classification_model[cpu-vit_l_16]
7.38s call     test/test_backbone_utils.py::TestFxFeatureExtraction::test_build_fx_feature_extractor[regnet_y_128gf]
7.17s call     test/test_models.py::test_classification_model[cpu-densenet201]
7.12s call     test/test_backbone_utils.py::TestFxFeatureExtraction::test_forward_backward[vit_h_14]
6.98s call     test/test_models.py::test_classification_model[cpu-vit_l_32]
6.97s call     test/test_datasets.py::LFWPairsTestCase::test_transforms
6.81s call     test/test_backbone_utils.py::TestFxFeatureExtraction::test_forward_backward[regnet_y_128gf]
6.62s call     test/test_backbone_utils.py::TestFxFeatureExtraction::test_build_fx_feature_extractor[vit_h_14]
6.29s call     test/test_backbone_utils.py::TestFxFeatureExtraction::test_jit_forward_backward[vit_l_32]
5.93s call     test/test_models.py::test_detection_model[cpu-keypointrcnn_resnet50_fpn]

And here is the same output with the 20230301 nightly: https://github.com/pytorch/vision/actions/runs/4304752013/jobs/7506221231#step:10:3239

============================= slowest 20 durations =============================
27.88s call     test/test_models.py::test_detection_model[cpu-keypointrcnn_resnet50_fpn]
27.44s call     test/test_models.py::test_detection_model[cpu-maskrcnn_resnet50_fpn_v2]
25.18s call     test/test_models.py::test_classification_model[cpu-vit_h_14]
22.61s call     test/test_models.py::test_detection_model[cpu-maskrcnn_resnet50_fpn]
22.20s call     test/test_models.py::test_detection_model[cpu-fasterrcnn_mobilenet_v3_large_fpn]
21.95s call     test/test_models.py::test_classification_model[cpu-densenet201]
21.23s call     test/test_models.py::test_detection_model[cpu-fasterrcnn_mobilenet_v3_large_320_fpn]
19.93s call     test/test_models.py::test_detection_model[cpu-fasterrcnn_resnet50_fpn]
19.90s call     test/test_models.py::test_classification_model[cpu-vit_l_16]
19.89s call     test/test_models.py::test_detection_model[cpu-fasterrcnn_resnet50_fpn_v2]
19.50s call     test/test_models.py::test_classification_model[cpu-vit_l_32]
18.07s call     test/test_models.py::test_classification_model[cpu-densenet169]
17.57s call     test/test_models.py::test_detection_model[cpu-fcos_resnet50_fpn]
17.04s call     test/test_models.py::test_detection_model[cpu-ssdlite320_mobilenet_v3_large]
16.76s call     test/test_models.py::test_classification_model[cpu-densenet161]
16.69s call     test/test_models.py::test_detection_model[cpu-retinanet_resnet50_fpn_v2]
16.56s call     test/test_models.py::test_detection_model[cpu-retinanet_resnet50_fpn]
16.28s call     test/test_models.py::test_detection_model[cpu-ssd300_vgg16]
15.85s call     test/test_models.py::test_vitc_models[cpu-vitc_b_16]
15.20s call     test/test_models.py::test_classification_model[cpu-vit_b_16]

We are seeing this across Python versions and with the conda nightlies as well, so this does not appear to be an environment issue.

cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel

pmeier commented 1 year ago

I tried the same with the latest RC build (https://download.pytorch.org/whl/test/cpu/torch-2.0.0%2Bcpu-cp38-cp38-linux_x86_64.whl), and there everything seems fine:

2.0.0+cpu 0.2122817160015984
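
When comparing builds like this, it can also help to repeat the measurement a few times and report the median, to rule out cold-cache or disk noise. A minimal sketch; the helper name and repeat count are arbitrary choices, not taken from the measurements above:

import statistics
from time import perf_counter

import torch
import torchvision  # noqa: F401  -- registers the custom ops the scripted model needs

def time_jit_load(path: str, repeats: int = 3) -> float:
    """Return the median wall-clock time of torch.jit.load over a few runs."""
    durations = []
    for _ in range(repeats):
        start = perf_counter()
        torch.jit.load(path)
        durations.append(perf_counter() - start)
    return statistics.median(durations)

print(torch.__version__, time_jit_load("script.pt"))
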
ZhaoqiongZ commented 1 year ago

Thanks for the report! We could reproduce this with 2.0.0.dev20230301+cpu and will look into it!

ZailiWang commented 1 year ago

Since the latest RC is fine, there seems to be no need to dive deeper.