pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

torchserve version mismatch in `pytorch/torchserve:0.5.3-gpu`, `pytorch/torchserve:0.5.3-cpu` #1736

Closed: klae01 closed this 2 years ago

klae01 commented 2 years ago

Running `pip list` in `pytorch/torchserve:0.5.3-gpu` returns the following. The installed torchserve version is 0.6.0, not 0.5.3. `pytorch/torchserve:0.5.3-cpu` has the same issue.

pytorch/torchserve:0.5.3-gpu

Package              Version
-------------------- ------------
captum               0.5.0
certifi              2022.5.18.1
charset-normalizer   2.0.12
cycler               0.11.0
enum-compat          0.0.3
fonttools            4.33.3
future               0.18.2
idna                 3.3
kiwisolver           1.4.2
matplotlib           3.5.2
numpy                1.22.4
packaging            21.3
Pillow               9.1.1
pip                  22.1.2
pkg_resources        0.0.0
psutil               5.9.1
pyparsing            3.0.9
python-dateutil      2.8.2
requests             2.28.0
setuptools           62.3.3
six                  1.16.0
torch                1.11.0+cu102
torch-model-archiver 0.6.0
torchserve           0.6.0
torchtext            0.12.0
torchvision          0.12.0+cu102
tqdm                 4.64.0
typing_extensions    4.2.0
urllib3              1.26.9
wheel                0.37.1

pytorch/torchserve:0.5.3-cpu

Package              Version
-------------------- -----------
captum               0.5.0
certifi              2022.5.18.1
charset-normalizer   2.0.12
cycler               0.11.0
enum-compat          0.0.3
fonttools            4.33.3
future               0.18.2
idna                 3.3
kiwisolver           1.4.2
matplotlib           3.5.2
numpy                1.22.4
packaging            21.3
Pillow               9.1.1
pip                  22.1.2
pkg_resources        0.0.0
psutil               5.9.1
pyparsing            3.0.9
python-dateutil      2.8.2
requests             2.28.0
setuptools           62.3.3
six                  1.16.0
torch                1.11.0+cpu
torch-model-archiver 0.6.0
torchserve           0.6.0
torchtext            0.12.0
torchvision          0.12.0+cpu
tqdm                 4.64.0
typing_extensions    4.2.0
urllib3              1.26.9
wheel                0.37.1

msaroufim commented 2 years ago

Ack. I've reproduced this issue and fixed it in a staging environment; I just need to confirm with the rest of the team before I promote the fix.

I've uploaded the correct binaries to my own account as a staging environment, ran 3 manual tests each for the CPU and GPU environments, and listed the results here: https://gist.github.com/msaroufim/8ed161ae98bd34c70cc94e12afe851ca

The problem is that when we build our Docker images using the regular Dockerfile, there is this line (https://github.com/pytorch/serve/blob/release_0.5.3/docker/Dockerfile#L73): `RUN python -m pip install -U setuptools && python -m pip install --no-cache-dir captum torchtext torchserve torch-model-archiver`. It installs the latest release on PyPI and essentially ignores the `-b, --branch_name=BRANCH_NAME` ("specify a branch_name to use") parameter of our `build_image.sh` script, which is also never actually passed in when building what we call our production images.
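For illustration only, one way to avoid picking up whatever is latest on PyPI is to pin each package to an exact version with pip's `==` specifier. The sketch below builds such a command; the helper name `pinned_install_command` is hypothetical, and pinning is an assumption here, not necessarily the fix that was applied to the images.

```python
import sys


def pinned_install_command(pins):
    """Build a pip command that installs each package at an exact version.

    `pins` maps a package name to the version it must resolve to,
    e.g. {"torchserve": "0.5.3"}. Hypothetical helper, for illustration.
    """
    if not pins:
        raise ValueError("at least one package must be pinned")
    requirements = []
    for name, version in pins.items():
        version = version.strip()  # tolerate a trailing newline from a version file
        if not version:
            raise ValueError(f"empty version for {name}")
        requirements.append(f"{name}=={version}")
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", *requirements]


if __name__ == "__main__":
    # Prints the command instead of running it.
    print(" ".join(pinned_install_command({"torchserve": "0.5.3"})))
```

With an unpinned requirement such as plain `torchserve`, pip resolves to the newest release available at build time, which is how a rebuild of an old tag can end up with a newer package.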

This never happened before because the latest release on PyPI always matched the expected version: we had never updated an old image. When I made the hotfix update to our Docker images a month ago, a newer release was already available, so the build pulled the latest version, 0.6.0, instead of the 0.5.3 it needed to match. As a result, `pip list` disagrees with what's in `serve/ts/version.txt` inside the Docker container. That said, if we had ever tried to promote Docker images before promoting to PyPI, we would have promoted bad binaries, and it's pretty much a coincidence that this hasn't happened so far.
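A minimal sketch of the kind of check that would catch this disagreement: parse the text printed by `pip list` and compare selected packages against the versions the image tag is supposed to contain. The function names `parse_pip_list` and `find_mismatches` are hypothetical, and the expected version is passed in by hand here rather than read from `version.txt`.

```python
def parse_pip_list(output):
    """Parse the two-column text printed by `pip list` into {package: version}."""
    versions = {}
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header, and the dashed rule.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        versions[parts[0]] = parts[1]
    return versions


def find_mismatches(installed, expected):
    """Return {package: (installed_version, expected_version)} for each disagreement.

    A package missing from `installed` is reported with None as its version.
    """
    return {
        name: (installed.get(name), want)
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    # Excerpt of the output reported for pytorch/torchserve:0.5.3-gpu.
    sample = """
Package              Version
-------------------- ------------
torch                1.11.0+cu102
torchserve           0.6.0
"""
    installed = parse_pip_list(sample)
    print(find_mismatches(installed, {"torchserve": "0.5.3"}))
```

Running a check like this as a final build step, and failing the build on a non-empty result, is one possible design; it is an assumption here and not something the thread says was adopted.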

klae01 commented 2 years ago

Why not copy the files to the Docker /tmp folder and then run `python -m pip install .` in place of `python -m pip install --no-cache-dir torchserve`? In particular, since BRANCH_NAME is not passed to the Dockerfile, it seems fine to build directly from the code in the repository. https://github.com/klae01/serve-dockerfile-update/tree/feature/pip_install_from_repo

`pip list` result obtained by changing only 6 lines in release_0.5.3:

Package              Version
-------------------- --------------
captum               0.5.0
certifi              2022.6.15
charset-normalizer   2.1.0
cycler               0.11.0
enum-compat          0.0.3
fonttools            4.34.4
future               0.18.2
idna                 3.3
kiwisolver           1.4.4
matplotlib           3.5.2
numpy                1.23.1
packaging            21.3
Pillow               9.2.0
pip                  22.1.2
pkg_resources        0.0.0
psutil               5.9.1
pyparsing            3.0.9
python-dateutil      2.8.2
requests             2.28.1
setuptools           63.2.0
six                  1.16.0
torch                1.12.0+cpu
torch-model-archiver 0.6.0
torchserve           0.5.3b20220718
torchtext            0.13.0
torchvision          0.13.0+cpu
tqdm                 4.64.0
typing_extensions    4.3.0
urllib3              1.26.10
wheel                0.37.1

msaroufim commented 2 years ago

Updated the 0.5.3 images just now