pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

NotImplementedError: Cannot copy out of meta tensor; no data! + Models not generating output text #3167

Open bjorquera1 opened 4 weeks ago

bjorquera1 commented 4 weeks ago

🐛 Describe the bug

When starting a server for text generation with the torchserve --ncs --start command (using, for example, the mistralai/Mistral-7B-Instruct-v0.2 model), I get the following stack trace right before the PipeStageExecutor stages are instantiated, even though the checkpoint shards have been loaded properly.

2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - Traceback (most recent call last):
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/internal.py", line 207, in _run_function
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -     result = python_udf.func(*python_udf.args, **python_udf.kwargs)
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/rref_proxy.py", line 11, in _local_invoke
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -     return getattr(rref.local_value(), func_name)(*args, **kwargs)
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/pippy/PipelineDriver.py", line 282, in create_stage_executor
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -     mod=mod or Pipe.materialize_stage(mod_name),  # type: ignore[attr-defined]
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/pippy/IR.py", line 1105, in materialize_stage
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -     submodule.to(device)
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -     return self._apply(convert)
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 853, in _apply
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -     self._buffers[key] = fn(buf)
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1166, in convert
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG -     raise NotImplementedError(
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

Looking online for similar instances of this error, it seems it can be raised when more GPU memory is needed, but that does not appear to be my case, since none of my GPUs goes above 50% utilization at any point. In other, less closely related instances, the same error also appears when trying to copy a meta (data-less) tensor from one device to another (e.g. see https://discuss.pytorch.org/t/how-to-convert-a-meta-tensor-to-normal-tensor/172136). So I have tried to follow the advice in the error message and use torch.nn.Module.to_empty() instead of torch.nn.Module.to(): around line 1104 of pippy/IR.py I modified the code as follows:

...
1104             try:
1105                 submodule.to(device)
1106             except NotImplementedError as e:
1107                 if str(e) == "Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.":
1108                     submodule.to_empty(device=device)
1109                 else:
1110                     raise
1111             except Exception:
...

With this change I am able to bring up the server, but when making an inference request such as curl -v "http://localhost:8080/predictions/mistral" -T sample_text.txt, the generated text contains no new tokens; it is identical to the input. I have verified that the input_ids are properly formatted, that the tokenizer encodes the input correctly, that the model path is found, and that the output is decoded properly. So I suspect this may have something to do with the added to_empty() call. Curiously, with the meta-llama/Llama-2-7b-hf model I do get some additional output, but it is just random tokens that make no sense, some of them not even in English.
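If I understand torch.nn.Module.to_empty() correctly, it only allocates uninitialized storage on the target device and does not copy any data, so the checkpoint weights would still have to be loaded into the materialized submodule afterwards, which could explain the nonsense output. A minimal sketch outside TorchServe that shows this (illustrative only, the weights file path below is hypothetical):

import torch
import torch.nn as nn

# Build a module on the meta device: parameters have shape/dtype but no data.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

# to_empty() allocates real but uninitialized memory on the target device.
layer = layer.to_empty(device="cpu")
print(layer.weight)  # arbitrary values, not trained weights

# The real weights would still need to be restored explicitly, e.g.:
# layer.load_state_dict(torch.load("layer_weights.pt"), assign=True)  # hypothetical file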

Error logs

Same stack trace as shown in the bug description above.

Installation instructions

Using a p3.8xlarge machine from AWS EC2, which has 4 Tesla V100 GPUs with a total of 64GB of GPU memory, the AMI with Nvidia drivers ami-00b65ebfde51e11fb, and a 120GB disk. TorchServe was installed directly on the machine by cloning the repo and installing the required packages, and the mistralai/Mistral-7B-Instruct-v0.2 model was downloaded via python ../utils/Download_model.py --model_name mistralai/Mistral-7B-Instruct-v0.2.

Model Packaging

Standard packaging from quickstart in https://github.com/pytorch/serve/blob/master/examples/large_models/Huggingface_pippy/Readme.md

config.properties

No response

Versions


Environment headers

Torchserve branch:

torchserve==0.11.0
torch-model-archiver==0.11.0

Python version: 3.10 (64-bit runtime)
Python executable: /opt/conda/bin/python

Versions of relevant python libraries:
captum==0.6.0
intel-extension-for-pytorch==2.3.0
numpy==1.24.3
nvgpu==0.10.0
pillow==10.3.0
psutil==5.9.8
pygit2==1.13.3
pylint==3.0.3
pytest==7.3.1
pytest-cov==4.1.0
pytest-mock==3.14.0
pytest-timeout==2.3.1
requests==2.32.0
requests-toolbelt==1.0.0
torch==2.3.0+cu121
torch-model-archiver==0.11.0
torch-workflow-archiver==0.2.13
torchaudio==2.3.0+cu121
torchpippy==0.1.1
torchserve==0.11.0
torchtext==0.18.0
torchvision==0.18.0+cu121
transformers==4.41.1
wheel==0.42.0

Java Version:

OS: Ubuntu 20.04.6 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: N/A
CMake version: version 3.16.3

Is CUDA available: Yes
CUDA runtime version: 12.1.105
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
Nvidia driver version: 535.54.03
cuDNN version: None

Environment:
library_path (LD_/DYLD_): /opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib

Repro instructions

From the /serve/examples/large_models/Huggingface_pippy directory I run torch-model-archiver --model-name mistral_model --version 1.0 --handler pippy_handler.py -r requirements.txt --config-file model-config.yaml --archive-format tgz, with the following model-config.yaml (which is similar to the one in the quickstart):

#frontend settings
minWorkers: 1
maxWorkers: 1
maxBatchDelay: 200
responseTimeout: 300
parallelType: "pp"
deviceType: "gpu"
torchrun:
    nproc-per-node: 4

#backend settings
pippy:
    rpc_timeout: 1800
    model_type: "HF"
    chunks: 1
    input_names: ["input_ids"]
    num_worker_threads: 256

handler:
    model_path: "/home/ubuntu/serve/examples/large_models/Huggingface_pippy/model/models--mistralai--Mistral-7B-Instruct-v0.2/snapshots/41b61a33a2483885c981aa79e0df6b32407ed873"
    index_filename: 'pytorch_model.bin.index.json'
    max_length: 80
    max_new_tokens: 100
    manual_seed: 41
    dtype: fp16

Once the tar.gz file is created, I move it to model_store/ and start the server by running torchserve --ncs --start --model-store model_store --models mistral=mistral_model.tar.gz.
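To help narrow this down, the same snapshot can also be loaded directly with transformers (bypassing TorchServe/pippy) to check whether the checkpoint itself generates sensible text. A minimal sketch, assuming transformers==4.41.1 and accelerate are available and using the model_path from model-config.yaml above:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same local snapshot path as model_path in model-config.yaml.
model_path = "/home/ubuntu/serve/examples/large_models/Huggingface_pippy/model/models--mistralai--Mistral-7B-Instruct-v0.2/snapshots/41b61a33a2483885c981aa79e0df6b32407ed873"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Hello, my name is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))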

Possible Solution

No response