Closed paulovasconcellos-hotmart closed 1 year ago
Hi @paulovasconcellos-hotmart, thanks for reporting this. It's very weird since I can get correct results on my T4 GPU. I used the code you provided, but added gpu_memory_utilization=0.95 when initializing LLM.
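For reference, the change described above might look like the following sketch. The model name is an assumption (the reporter's script is not shown in this thread) and `build_llm` is a hypothetical helper; only `gpu_memory_utilization=0.95` is taken from the reply.

```python
# Keyword arguments for vllm.LLM. Only gpu_memory_utilization=0.95 comes from
# the reply above; the model name is an assumed placeholder.
LLM_KWARGS = {
    "model": "mosaicml/mpt-7b",      # assumption: the reporter's model id is not shown
    "gpu_memory_utilization": 0.95,  # fraction of GPU memory vLLM is allowed to reserve
}


def build_llm():
    """Hypothetical helper: create the engine with the settings above."""
    # Imported lazily so the settings can be inspected on a machine without vLLM.
    from vllm import LLM

    return LLM(**LLM_KWARGS)


# Usage (requires a GPU and an installed vLLM):
# llm = build_llm()
```

Keeping the settings in a dictionary makes it easy to try other values of `gpu_memory_utilization` if 0.95 does not fit the card in use.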
It worked, @WoosukKwon! Thank you very much for the quick reply.
Hello @paulovasconcellos-hotmart, how did you solve this problem? I constantly get this error with the Baichuan model. I found that it is caused by the single_query_cached_kv_attention method in vllm\model_executor\layers\attention.py: after this method is called, the hidden output has some rows of "nan".
This is my code:
from vllm import LLM, SamplingParams
#from vllm.transformers_utils.configs.baichuan import BaiChuanConfig
prompts = [
"Hello, my name is",
"The president of the United States is",
"The capital of France is",
"The future of AI is",
]
sampling_params = SamplingParams(temperature=1, top_p=0.95)
llm = LLM(
    model="/home/jovyan/notebook-models-datavol-1/chatllama/llms/Baichuan/Baichuan-7b",
    trust_remote_code=True,
    dtype='float16',
    gpu_memory_utilization=0.85,
    tokenizer_mode="slow"
)
#llm = LLM(model="/home/jovyan/notebook-models-datavol-1/chatllama/llms/lmsyslongchat-13b-16k", trust_remote_code=True, gpu_memory_utilization=0.85)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
And this is my Python environment:
accelerate 0.21.0
aiofiles 23.1.0
aiohttp 3.8.5
aiosignal 1.3.1
altair 5.0.1
annotated-types 0.5.0
anyio 3.7.1
appdirs 1.4.4
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
asttokens 2.2.1
async-lru 2.0.3
async-timeout 4.0.2
attrs 23.1.0
Babel 2.12.1
backcall 0.2.0
beautifulsoup4 4.12.2
bleach 6.0.0
blinker 1.6.2
boltons 23.0.0
brotlipy 0.7.0
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 2.0.4
click 8.1.6
cmake 3.27.0
comm 0.1.3
conda 23.3.1
conda-content-trust 0.1.3
conda-package-handling 2.0.2
conda_package_streaming 0.7.0
contourpy 1.1.0
cryptography 39.0.1
cycler 0.11.0
datasets 2.14.0
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.7
distlib 0.3.7
docker-pycreds 0.4.0
editables 0.5
exceptiongroup 1.1.2
executing 1.2.0
fastapi 0.100.0
fastjsonschema 2.18.0
ffmpy 0.3.1
filelock 3.12.2
Flask 2.3.2
fonttools 4.41.1
fqdn 1.5.1
frozenlist 1.4.0
fsspec 2023.6.0
gitdb 4.0.10
GitPython 3.1.32
gradio 3.35.2
gradio_client 0.2.10
grpcio 1.56.2
h11 0.14.0
hatchling 1.18.0
httpcore 0.17.3
httpx 0.24.1
huggingface-hub 0.16.4
idna 3.4
ipykernel 6.24.0
ipython 8.14.0
ipython-genutils 0.2.0
ipywidgets 8.0.7
isoduration 20.11.0
itsdangerous 2.1.2
jedi 0.18.2
jieba 0.42.1
Jinja2 3.1.2
joblib 1.3.1
json5 0.9.14
jsonpatch 1.32
jsonpointer 2.1
jsonschema 4.18.4
jsonschema-specifications 2023.7.1
jupyter 1.0.0
jupyter_client 8.3.0
jupyter-console 6.6.3
jupyter_core 5.3.1
jupyter-events 0.6.3
jupyter-lsp 2.2.0
jupyter_server 2.7.0
jupyter_server_terminals 0.4.4
jupyterlab 4.0.3
jupyterlab-pygments 0.2.2
jupyterlab_server 2.24.0
jupyterlab-widgets 3.0.8
kiwisolver 1.4.4
linkify-it-py 2.0.2
lit 16.0.6
markdown-it-py 2.2.0
markdown2 2.4.10
MarkupSafe 2.1.3
matplotlib 3.7.2
matplotlib-inline 0.1.6
mdit-py-plugins 0.3.3
mdurl 0.1.2
mistune 3.0.1
mpmath 1.3.0
msgpack 1.0.5
multidict 6.0.4
multiprocess 0.70.15
mypy-extensions 1.0.0
nbclient 0.8.0
nbconvert 7.7.2
nbformat 5.9.1
nest-asyncio 1.5.6
networkx 3.1
nh3 0.2.14
ninja 1.11.1
nltk 3.8.1
notebook 7.0.0
notebook_shim 0.2.3
numpy 1.25.1
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
orjson 3.9.2
overrides 7.3.1
packaging 23.0
pandas 2.0.3
pandocfilters 1.5.0
parso 0.8.3
pathspec 0.11.1
pathtools 0.1.2
peft 0.4.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 10.0.0
pip 23.0.1
platformdirs 3.9.1
pluggy 1.0.0
prometheus-client 0.17.1
prompt-toolkit 3.0.39
protobuf 4.23.4
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 12.0.1
pycosat 0.6.4
pycparser 2.21
pydantic 1.10.12
pydantic_core 2.3.0
pydub 0.25.1
Pygments 2.15.1
pyOpenSSL 23.0.0
pyparsing 3.0.9
pyre-extensions 0.0.29
PySocks 1.7.1
python-dateutil 2.8.2
python-json-logger 2.0.7
python-multipart 0.0.6
pytz 2023.3
PyYAML 6.0.1
pyzmq 25.1.0
qtconsole 5.4.3
QtPy 2.3.1
ray 2.6.1
referencing 0.30.0
regex 2023.6.3
requests 2.28.1
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.4.2
rouge-chinese 1.0.3
rpds-py 0.9.2
ruamel.yaml 0.17.21
ruamel.yaml.clib 0.2.6
safetensors 0.3.1
semantic-version 2.10.0
Send2Trash 1.8.2
sentencepiece 0.1.99
sentry-sdk 1.28.1
setproctitle 1.3.2
setuptools 65.6.3
shortuuid 1.0.11
six 1.16.0
smmap 5.0.0
sniffio 1.3.0
soupsieve 2.4.1
stack-data 0.6.2
starlette 0.27.0
svgwrite 1.4.3
sympy 1.12
terminado 0.17.1
tinycss2 1.2.1
tokenizers 0.13.3
tomli 2.0.1
toolz 0.12.0
torch 2.0.1
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
transformers 4.31.0
triton 2.0.0
trl 0.4.7
trove-classifiers 2023.7.6
typing_extensions 4.7.1
typing-inspect 0.9.0
tzdata 2023.3
uc-micro-py 1.0.2
uri-template 1.3.0
urllib3 1.26.15
uvicorn 0.23.1
virtualenv 20.24.2
vllm 0.1.2 /home/jovyan/notebook-models-datavol-1/feng/OpenSource/vllm
wandb 0.15.7
wavedrom 2.0.3.post3
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.6.1
websockets 11.0.3
Werkzeug 2.3.6
wheel 0.38.4
widgetsnbextension 4.0.8
xformers 0.0.20
xxhash 3.2.0
yarl 1.9.2
zstandard 0.19.0
And my GPU info:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08 Driver Version: 510.73.08 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GRID V100S-32Q On | 00000000:02:01.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Hi everyone. I'm trying to use the new MPT-7b support included in vllm. I'm running on SageMaker Studio, on a g4dn.2xlarge instance. However, I'm getting the following error:
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
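The message means that the probabilities handed to the sampler contained at least one invalid value. As a rough illustration of that condition, the following plain-Python sketch reports which rows of a batch are affected. `invalid_rows` is a hypothetical helper written for this example and is not part of vLLM or PyTorch.

```python
import math


def invalid_rows(probs):
    """Return the indices of rows that hold inf, nan, or a negative value."""
    bad = []
    for i, row in enumerate(probs):
        if any(math.isnan(p) or math.isinf(p) or p < 0 for p in row):
            bad.append(i)
    return bad


probs = [
    [0.7, 0.2, 0.1],           # valid distribution
    [float("nan"), 0.5, 0.5],  # contains nan
    [0.5, float("inf"), 0.0],  # contains inf
    [1.2, -0.2, 0.0],          # contains a negative element
]
print(invalid_rows(probs))  # [1, 2, 3]
```

Running a check like this on intermediate outputs can help narrow down which layer first produces the bad values.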
My code
This is my environment: