pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

UnboundLocalError: local variable 'model_snapshot_path' referenced before assignment #3335

Closed johnathanchiu closed 1 month ago

johnathanchiu commented 1 month ago

🐛 Describe the bug

The changes made in this commit are causing the following issue.

Error logs

Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/venv/lib/python3.9/site-packages/ts/llm_launcher.py", line 286, in <module>
    main(args)
  File "/home/venv/lib/python3.9/site-packages/ts/llm_launcher.py", line 174, in main
    with create_mar_file(args, model_snapshot_path):
UnboundLocalError: local variable 'model_snapshot_path' referenced before assignment
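For context, the failure mode is the standard Python pattern where a local variable is assigned only on one branch and then read unconditionally. A minimal self-contained illustration (not TorchServe's actual code; the engine names are just placeholders):

```python
def main(engine: str) -> None:
    # The variable is bound only when this branch is taken.
    if engine == "trt_llm":
        model_snapshot_path = "/data/model"
    # For any other engine (e.g. "vllm") the name was never bound,
    # so reading it here raises UnboundLocalError.
    print(model_snapshot_path)

try:
    main("vllm")
except UnboundLocalError as e:
    print(f"caught: {e}")
```

Python decides at compile time that `model_snapshot_path` is a local of `main` (because it is assigned somewhere in the function body), so the read fails with UnboundLocalError rather than falling back to a global lookup.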

Installation instructions

docker build . -f docker/Dockerfile.vllm -t ts/vllm
docker run --rm -ti --shm-size 1g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -e VLLM_WORKER_MULTIPROC_METHOD=spawn -e enforce-eager=True -p 8080:8080 -v data:/data ts/vllm --model_id meta-llama/Meta-Llama-3.1-70B-Instruct --disable_token_auth

Model Packaging

I cloned the torchserve repo as is.

config.properties

No response

Versions

------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch: 

**Warning: torchserve not installed ..
**Warning: torch-model-archiver not installed ..

Python version: 3.12 (64-bit runtime)
Python executable: /home/paperspace/miniconda3/bin/python

Versions of relevant python libraries:
requests==2.32.3
wheel==0.43.0
**Warning: torch not present ..
**Warning: torchtext not present ..
**Warning: torchvision not present ..
**Warning: torchaudio not present ..

Java Version:

OS: Ubuntu 22.04.3 LTS
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: N/A
CMake version: version 3.28.20231121-g773fd7e

Environment:
library_path (LD_/DYLD_): /usr/local/cuda/lib64

Repro instructions

docker build . -f docker/Dockerfile.vllm -t ts/vllm
docker run --rm -ti --shm-size 1g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -e VLLM_WORKER_MULTIPROC_METHOD=spawn -e enforce-eager=True -p 8080:8080 -v data:/data ts/vllm --model_id meta-llama/Meta-Llama-3.1-70B-Instruct --disable_token_auth

Possible Solution

model_snapshot_path is only assigned on some engine branches, so it is unbound when the vllm runtime reaches create_mar_file; it should also be set for the vllm runtime: https://github.com/pytorch/serve/blob/6bdb1baf7254c2bfc40a6669b0d922c08ba5f37a/ts/llm_launcher.py#L171-L174