vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 when running mpt-7b #363

Closed: paulovasconcellos-hotmart closed this issue 1 year ago.

paulovasconcellos-hotmart commented 1 year ago

Hi everyone. I'm trying to use the newly added MPT-7B support in vLLM. I'm running on SageMaker Studio on a g4dn.2xlarge instance, but I'm getting the following error:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
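That message is raised by PyTorch's `torch.multinomial`, which refuses to sample from a probability tensor containing `nan`, `inf`, or negative entries. We are assuming, based on the wording of the error, that vLLM's sampler draws tokens with `torch.multinomial`. The minimal sketch below reproduces the error on CPU and adds a cheap pre-check; `is_valid_distribution` is a hypothetical helper, not part of vLLM or PyTorch.

```python
import torch


def is_valid_distribution(probs: torch.Tensor) -> bool:
    """Hypothetical pre-check: True only if no entry is nan, inf, or negative."""
    return bool(torch.isfinite(probs).all() and (probs >= 0).all())


# Assumption: vLLM's sampler draws the next token with torch.multinomial, as this error suggests.
good = torch.tensor([[0.7, 0.2, 0.1]])
bad = torch.tensor([[0.7, float("nan"), 0.1]])  # a single nan, e.g. from a broken forward pass

# Sampling from a clean distribution returns one token index per row.
sample = torch.multinomial(good, num_samples=1)
print(sample.shape)  # torch.Size([1, 1])

# The same call on a row containing nan raises the RuntimeError reported in this issue.
error_message = None
try:
    torch.multinomial(bad, num_samples=1)
except RuntimeError as err:
    error_message = str(err)
print(error_message)

print(is_valid_distribution(good), is_valid_distribution(bad))  # True False
```

The sampler only reports the symptom; the `nan` itself is produced earlier, in the model's forward pass.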

My code:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="mosaicml/mpt-7b", dtype='float16')

outputs = llm.generate(prompts, sampling_params)  # error happens here

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
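Both reports in this thread load the model with `dtype='float16'`. The T4 in a g4dn instance (compute capability 7.5) and the V100 (7.0) do not support bfloat16 natively, so float16 is the practical choice there, but float16 can only represent magnitudes up to 65504. The sketch below is a generic illustration, not a diagnosis of this issue (the thread never establishes the root cause): it shows how one overflowed activation becomes `inf`, and how a softmax then turns that `inf` into `nan` everywhere, which is exactly the kind of input `torch.multinomial` rejects.

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max
print(FP16_MAX)  # 65504.0

# An activation past the float16 range overflows to inf when cast down.
activations = torch.tensor([70000.0, 1.0, 2.0]).to(torch.float16)
print(activations)

# Softmax subtracts the row max for stability; inf - inf is nan, so every entry becomes nan.
probs = torch.softmax(activations.float(), dim=-1)
print(probs)

# A nan-filled distribution is what the sampler's multinomial call would then reject.
print(torch.isnan(probs).all().item())  # True
```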

This is my environment:

accelerate @ file:///home/conda/feedstock_root/build_artifacts/accelerate_1683553934867/work
aiofiles==23.1.0
aiohttp==3.8.4
aiosignal==1.3.1
altair==5.0.1
anyio==3.7.0
apex @ file:///apex
appdirs==1.4.4
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1670263926556/work
async-timeout==4.0.2
attrs==22.2.0
awscli @ file:///home/conda/feedstock_root/build_artifacts/awscli_1683792289807/work
backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work
backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1618230623929/work
bcrypt==4.0.1
blis @ file:///home/conda/feedstock_root/build_artifacts/cython-blis_1668499088869/work
bokeh @ file:///home/conda/feedstock_root/build_artifacts/bokeh_1683730530224/work
boto3 @ file:///home/conda/feedstock_root/build_artifacts/boto3_1683763173043/work
botocore @ file:///home/conda/feedstock_root/build_artifacts/botocore_1683758921974/work
brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1666764671472/work
cached-property @ file:///home/conda/feedstock_root/build_artifacts/cached_property_1615209429212/work
catalogue @ file:///home/conda/feedstock_root/build_artifacts/catalogue_1666891892909/work
certifi==2023.5.7
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1671179353105/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1678108872112/work
click @ file:///home/conda/feedstock_root/build_artifacts/click_1666798198223/work
cloudpickle @ file:///home/conda/feedstock_root/build_artifacts/cloudpickle_1674202310934/work
cmake==3.26.3
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1602866480661/work
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1679481329611/work
commonmark==0.9.1
conda==23.1.0
conda-content-trust @ file:///home/conda/feedstock_root/build_artifacts/conda-content-trust_1621370699668/work
conda-package-handling @ file:///home/conda/feedstock_root/build_artifacts/conda-package-handling_1669907009957/work
conda_package_streaming @ file:///home/conda/feedstock_root/build_artifacts/conda-package-streaming_1669733752472/work
confection @ file:///home/conda/feedstock_root/build_artifacts/confection_1673621475775/work
contextlib2==21.6.0
contourpy @ file:///home/conda/feedstock_root/build_artifacts/contourpy_1673633665736/work
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography-split_1679811212387/work
cycler @ file:///home/conda/feedstock_root/build_artifacts/cycler_1635519461629/work
cymem @ file:///home/conda/feedstock_root/build_artifacts/cymem_1666909672496/work
Cython @ file:///home/conda/feedstock_root/build_artifacts/cython_1680712295460/work
debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1680755465990/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
deepspeed @ https://aws-deepspeed-zero-2d-binaries.s3.us-west-2.amazonaws.com/r2.0.0/20230407-184728/1ea3d4b6aa41fe66277daacbb78b3743a310d85a/deepspeed-0.6.1%2B1ea3d4b-py3-none-any.whl#sha256=f59834b5a39738f4f180757dbe8550dfd8fbcc97cd863bc0ee362d1ab81e3873
dgl==1.1.0+cu118
dill==0.3.6
docker-pycreds==0.4.0
docutils @ file:///home/conda/feedstock_root/build_artifacts/docutils_1667993608396/work
einops==0.6.1
exceptiongroup==1.1.2
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1667317341051/work
fastai @ file:///home/jhoward/mambaforge/conda-bld/fastai_1680035345463/work
fastapi==0.99.1
fastcore @ file:///home/jhoward/mambaforge/conda-bld/fastcore_1680034914245/work
fastdownload @ file:///home/jhoward/mambaforge/conda-bld/fastdownload_1657219113869/work
fastprogress @ file:///home/jhoward/mambaforge/conda-bld/fastprogress_1658473398631/work
ffmpy==0.3.0
filelock @ file:///home/conda/feedstock_root/build_artifacts/filelock_1681839547898/work
flash-attn==0.2.8
fonttools @ file:///home/conda/feedstock_root/build_artifacts/fonttools_1683740454859/work
frozenlist==1.3.3
fschat==0.2.17
fsspec==2023.5.0
future @ file:///home/conda/feedstock_root/build_artifacts/future_1673596611778/work
gevent==22.10.2
gitdb==4.0.10
GitPython==3.1.31
gmpy2 @ file:///home/conda/feedstock_root/build_artifacts/gmpy2_1666808654411/work
google-pasta==0.2.0
gradio==3.35.2
gradio_client==0.2.7
greenlet==2.0.2
grpcio==1.51.3
h11==0.14.0
h5py @ file:///home/conda/feedstock_root/build_artifacts/h5py_1675704794369/work
hjson==3.1.0
horovod==0.26.1
httpcore==0.17.2
httpx==0.24.1
huggingface-hub==0.15.1
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work
imageio @ file:///home/conda/feedstock_root/build_artifacts/imageio_1683031833737/work
importlib-metadata==4.13.0
inotify-simple==1.2.1
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1683553336538/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1683225895562/work
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1669134318875/work
Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1654302431367/work
jmespath @ file:///home/conda/feedstock_root/build_artifacts/jmespath_1655568249366/work
joblib @ file:///home/conda/feedstock_root/build_artifacts/joblib_1663332044897/work
jsonpatch==1.32
jsonpointer==2.3
jsonschema==4.17.3
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1681432441054/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1678994169527/work
kiwisolver @ file:///home/conda/feedstock_root/build_artifacts/kiwisolver_1666805701884/work
langcodes @ file:///home/conda/feedstock_root/build_artifacts/langcodes_1636741340529/work
libmambapy @ file:///home/conda/feedstock_root/build_artifacts/mamba-split_1680002410624/work/libmambapy
linkify-it-py==2.0.2
lit==16.0.3
llvmlite==0.39.1
mamba @ file:///home/conda/feedstock_root/build_artifacts/mamba-split_1680002410624/work/mamba
markdown-it-py==2.2.0
markdown2==2.4.9
MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1674135787083/work
matplotlib @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-suite_1678135565516/work
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work
mdit-py-plugins==0.3.3
mdurl==0.1.2
mpi4py @ file:///home/conda/feedstock_root/build_artifacts/mpi4py_1667459939419/work
mpmath @ file:///home/conda/feedstock_root/build_artifacts/mpmath_1678228039184/work
msgpack==1.0.5
multidict==6.0.4
multiprocess==0.70.14
munkres==1.1.4
murmurhash @ file:///home/conda/feedstock_root/build_artifacts/murmurhash_1666946151787/work
mypy-extensions==1.0.0
nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1664684991461/work
networkx @ file:///home/conda/feedstock_root/build_artifacts/networkx_1680692919326/work
nh3==0.2.13
ninja==1.11.1
numba @ file:///home/conda/feedstock_root/build_artifacts/numba_1680825379968/work
numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1668919096861/work
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
opencv-python==4.7.0
orjson==3.9.1
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1681337016113/work
pandas @ file:///home/conda/feedstock_root/build_artifacts/pandas_1683493925851/work
paramiko==3.1.0
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work
pathos==0.3.0
pathtools==0.1.2
pathy @ file:///home/conda/feedstock_root/build_artifacts/pathy_1670689864140/work
patsy @ file:///home/conda/feedstock_root/build_artifacts/patsy_1665356157073/work
peft==0.3.0
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1667297516076/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
Pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1675487172403/work
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1682644429438/work
plotly @ file:///home/conda/feedstock_root/build_artifacts/plotly_1680731398751/work
pluggy @ file:///home/conda/feedstock_root/build_artifacts/pluggy_1667232663820/work
ply==3.11
pooch @ file:///home/conda/feedstock_root/build_artifacts/pooch_1679580333621/work
pox==0.3.2
ppft==1.7.6.6
preshed @ file:///home/conda/feedstock_root/build_artifacts/preshed_1666991224827/work
prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1677600924538/work
protobuf==3.20.3
protobuf3-to-dict==0.1.5
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1681775027942/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work
py-cpuinfo==9.0.0
pyarrow==12.0.0
pyasn1==0.4.8
pybind11 @ file:///home/conda/feedstock_root/build_artifacts/pybind11-split_1679012409253/work
pybind11-global @ file:///home/conda/feedstock_root/build_artifacts/pybind11-split_1679012409253/work
pycosat @ file:///home/conda/feedstock_root/build_artifacts/pycosat_1666836542287/work
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work
pydantic @ file:///home/conda/feedstock_root/build_artifacts/pydantic_1679565261911/work
pydub==0.25.1
pyfunctional==1.4.3
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1681904169130/work
pyinstrument==3.4.2
pyinstrument-cext==0.2.4
PyNaCl==1.5.0
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1680037383858/work
pyparsing @ file:///home/conda/feedstock_root/build_artifacts/pyparsing_1652235407899/work
PyQt5==5.15.7
PyQt5-sip==12.11.0
pyre-extensions==0.0.29
pyrsistent==0.19.3
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
python-multipart==0.0.6
pytz @ file:///home/conda/feedstock_root/build_artifacts/pytz_1680088766131/work
PyYAML @ file:///home/conda/feedstock_root/build_artifacts/pyyaml_1668001474078/work
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1679316826707/work
ray==2.5.1
regex==2023.6.3
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1680286922386/work
retrying==1.3.4
rich @ file:///home/conda/feedstock_root/build_artifacts/rich_1664752510089/work
rsa @ file:///home/conda/feedstock_root/build_artifacts/rsa_1614171254180/work
ruamel.yaml @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml_1678272977710/work
ruamel.yaml.clib @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml.clib_1670412719074/work
s3fs==0.4.2
s3transfer @ file:///home/conda/feedstock_root/build_artifacts/s3transfer_1683241957497/work
sagemaker==2.154.0
sagemaker-experiments==0.1.43
sagemaker-pytorch-training==2.8.0
sagemaker-training==4.5.0
schema==0.7.5
scikit-learn @ file:///home/conda/feedstock_root/build_artifacts/scikit-learn_1679675836718/work
scipy @ file:///home/conda/feedstock_root/build_artifacts/scipy_1683719288579/work/dist/scipy-1.10.1-cp310-cp310-linux_x86_64.whl#sha256=eeee39d8a01a8072da1efa959a1490fe1e94114fa147125c39cf3c438f69ca54
seaborn @ file:///home/conda/feedstock_root/build_artifacts/seaborn-split_1672497695270/work
semantic-version==2.10.0
sentencepiece==0.1.99
sentry-sdk==1.27.0
setproctitle==1.3.2
shap @ file:///home/conda/feedstock_root/build_artifacts/shap_1655716950751/work
shellingham @ file:///home/conda/feedstock_root/build_artifacts/shellingham_1676292972954/work
shortuuid==1.0.11
sip @ file:///home/conda/feedstock_root/build_artifacts/sip_1681995008230/work
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
slicer @ file:///home/conda/feedstock_root/build_artifacts/slicer_1608146800664/work
smart-open @ file:///home/conda/feedstock_root/build_artifacts/smart_open_1630238320325/work
smclarify==0.5
smdebug @ file:///tmp/sagemaker-debugger
smdebug-rulesconfig==1.0.1
smdistributed-dataparallel @ https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.0.0/cu118/2023-03-20/smdistributed_dataparallel-1.8.0-cp310-cp310-linux_x86_64.whl#sha256=4952b8de26aaa2ed51b8e668f68be4abd0bf1b35378e979561d872acba31ecd3
smdistributed-modelparallel @ https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-2.0.0/build-artifacts/2023-04-14-20-14/smdistributed_modelparallel-1.15.0-cp310-cp310-linux_x86_64.whl#sha256=5a772776a6a280581e452208c62d8ca20b0f6d4d2c59ec294f5402dc5b89b1f1
smmap==5.0.0
sniffio==1.3.0
spacy @ file:///home/conda/feedstock_root/build_artifacts/spacy_1681807679135/work
spacy-legacy @ file:///home/conda/feedstock_root/build_artifacts/spacy-legacy_1674550301837/work
spacy-loggers @ file:///home/conda/feedstock_root/build_artifacts/spacy-loggers_1672303484730/work
srsly @ file:///home/conda/feedstock_root/build_artifacts/srsly_1677657434449/work
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette==0.27.0
statsmodels @ file:///home/conda/feedstock_root/build_artifacts/statsmodels_1683305553485/work
svgwrite==1.4.3
sympy @ file:///home/conda/feedstock_root/build_artifacts/sympy_1679342590084/work
tabulate==0.9.0
tblib==1.7.0
tenacity @ file:///home/conda/feedstock_root/build_artifacts/tenacity_1677600641219/work
thinc @ file:///home/conda/feedstock_root/build_artifacts/thinc_1683130983739/work
threadpoolctl @ file:///home/conda/feedstock_root/build_artifacts/threadpoolctl_1643647933166/work
tiktoken==0.4.0
tokenizers==0.13.3
toml @ file:///home/conda/feedstock_root/build_artifacts/toml_1604308577558/work
tomli @ file:///home/conda/feedstock_root/build_artifacts/tomli_1644342247877/work
toolz @ file:///home/conda/feedstock_root/build_artifacts/toolz_1657485559105/work
torch==2.0.1
torchaudio==2.0.1
torchdata @ file:///opt/conda/conda-bld/torchdata_1679615656247/work
torchnet==0.0.4
torchtext==0.15.1
torchvision==0.15.1
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1681817446549/work
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1677948868469/work
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1675110562325/work
transformers==4.28.1
triton==2.0.0
typer @ file:///home/conda/feedstock_root/build_artifacts/typer_1667832226065/work
typing-inspect==0.9.0
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1678559861143/work
tzdata @ file:///home/conda/feedstock_root/build_artifacts/python-tzdata_1680081134351/work
uc-micro-py==1.0.2
unicodedata2 @ file:///home/conda/feedstock_root/build_artifacts/unicodedata2_1667239886688/work
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1678635778344/work
uvicorn==0.22.0
visdom==0.2.4
vllm @ git+https://github.com/vllm-project/vllm.git@98fe8cb5420c28fa8dcc3110b6c898848dd57e45
wandb==0.15.4
wasabi @ file:///home/conda/feedstock_root/build_artifacts/wasabi_1673945962927/work
wavedrom==2.0.3.post3
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1673864653149/work
websocket-client==1.5.1
websockets==11.0.3
Werkzeug==2.3.4
xformers @ git+https://github.com/facebookresearch/xformers@1f449ef81680707d38e1739c627b5bffee7732c6
xyzservices @ file:///home/conda/feedstock_root/build_artifacts/xyzservices_1676835466992/work
yarl==1.9.2
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1677313463193/work
zope.event==4.6
zope.interface==6.0
zstandard==0.19.0
WoosukKwon commented 1 year ago

Hi @paulovasconcellos-hotmart, thanks for reporting this. This is strange, since I get correct results on my T4 GPU. I used the code you provided, but added `gpu_memory_utilization=0.95` when initializing `LLM`.
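For readers wondering what `gpu_memory_utilization` controls: as far as we can tell from early vLLM (around v0.1.x, the versions used in this thread), the engine measures peak memory during a profiling forward pass, then converts `total_gpu_memory * gpu_memory_utilization - peak_memory` into a number of fixed-size KV-cache blocks. The pure-Python sketch below reproduces that arithmetic under that assumption; the function names are ours, and the memory figures in the example are hypothetical, not measurements from a T4. It only shows that raising the fraction above the default 0.9 enlarges the KV cache; the thread does not explain why that made the error go away.

```python
GiB = 1024 ** 3
BLOCK_SIZE = 16  # tokens per KV-cache block (vLLM's default at the time, as we understand it)


def cache_block_bytes(block_size: int, num_heads: int, head_size: int,
                      num_layers: int, dtype_bytes: int) -> int:
    """Bytes for one KV-cache block: keys plus values, for every layer."""
    key_block = block_size * num_heads * head_size
    value_block = key_block  # values have the same shape as keys
    return dtype_bytes * num_layers * (key_block + value_block)


def num_gpu_blocks(total_bytes: int, gpu_memory_utilization: float,
                   peak_bytes: int, block_bytes: int) -> int:
    """KV-cache blocks that fit in the budget left after the profiling run (assumed formula)."""
    budget = total_bytes * gpu_memory_utilization - peak_bytes
    return max(int(budget // block_bytes), 0)  # a negative budget means no blocks


# MPT-7B in float16: 32 layers, 32 heads, head size 4096 / 32 = 128, 2 bytes per value.
block = cache_block_bytes(BLOCK_SIZE, num_heads=32, head_size=128, num_layers=32, dtype_bytes=2)
print(block // 2**20, "MiB per block")  # 8 MiB per block

# Hypothetical figures for illustration only: 15 GiB visible, 13 GiB peak during profiling.
for fraction in (0.90, 0.95):
    blocks = num_gpu_blocks(15 * GiB, fraction, 13 * GiB, block)
    print(fraction, blocks, "blocks =", blocks * BLOCK_SIZE, "cacheable tokens")
```

With these made-up numbers, moving from 0.90 to 0.95 takes the cache from 64 to 160 blocks, because the extra fraction is applied to the whole GPU while the peak stays fixed.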

paulovasconcellos-hotmart commented 1 year ago

It worked, @WoosukKwon! Thank you very much for the quick reply.

jinfengfeng commented 1 year ago

Hello @paulovasconcellos-hotmart, how did you solve this problem? I keep getting this error with the Baichuan model. I traced it to the `single_query_cached_kv_attention` method in `vllm/model_executor/layers/attention.py`: after this method is called, some rows of the hidden output are `nan`.
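Locating the first layer whose output goes non-finite, as done here for the attention layer, can be automated with standard PyTorch forward hooks. The sketch below is a generic debugging helper, not a vLLM feature: `find_nonfinite_modules` and the toy `Fp16RoundTrip` layer are hypothetical names, and the toy model stands in for a real network. As we understand it, vLLM's paged-attention kernels are custom CUDA ops, so hooks only see the `nn.Module` wrappers around them. The helper records, in execution order, every submodule whose output contains `inf` or `nan`; the first entry is where to start looking.

```python
import torch
from torch import nn


def find_nonfinite_modules(model: nn.Module, *inputs: torch.Tensor) -> list:
    """Run one forward pass and list submodules whose output has inf or nan, in call order."""
    flagged = []
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            outputs = output if isinstance(output, tuple) else (output,)
            if any(isinstance(t, torch.Tensor) and not torch.isfinite(t).all() for t in outputs):
                flagged.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module; its children are what we want to inspect
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for handle in handles:
            handle.remove()  # leave the model exactly as we found it
    return flagged


class Fp16RoundTrip(nn.Module):
    """Hypothetical toy layer: scales up and casts through float16, overflowing large values."""

    def forward(self, x):
        return (x * 1e6).to(torch.float16).float()


toy = nn.Sequential(nn.Identity(), Fp16RoundTrip(), nn.Identity())
print(find_nonfinite_modules(toy, torch.full((2, 4), 10.0)))  # ['1', '2']
print(find_nonfinite_modules(toy, torch.zeros(2, 4)))         # []
```

Every module downstream of the first bad one is flagged too, since `inf` and `nan` propagate, which is why only the first entry is informative.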

This is my code:

```python
from vllm import LLM, SamplingParams
# from vllm.transformers_utils.configs.baichuan import BaiChuanConfig

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = SamplingParams(temperature=1, top_p=0.95)

llm = LLM(
    model="/home/jovyan/notebook-models-datavol-1/chatllama/llms/Baichuan/Baichuan-7b",
    trust_remote_code=True,
    dtype='float16',
    gpu_memory_utilization=0.85,
    tokenizer_mode="slow",
)
# llm = LLM(model="/home/jovyan/notebook-models-datavol-1/chatllama/llms/lmsyslongchat-13b-16k", trust_remote_code=True, gpu_memory_utilization=0.85)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

And this is my Python environment:

accelerate                0.21.0
aiofiles                  23.1.0
aiohttp                   3.8.5
aiosignal                 1.3.1
altair                    5.0.1
annotated-types           0.5.0
anyio                     3.7.1
appdirs                   1.4.4
argon2-cffi               21.3.0
argon2-cffi-bindings      21.2.0
arrow                     1.2.3
asttokens                 2.2.1
async-lru                 2.0.3
async-timeout             4.0.2
attrs                     23.1.0
Babel                     2.12.1
backcall                  0.2.0
beautifulsoup4            4.12.2
bleach                    6.0.0
blinker                   1.6.2
boltons                   23.0.0
brotlipy                  0.7.0
certifi                   2022.12.7
cffi                      1.15.1
charset-normalizer        2.0.4
click                     8.1.6
cmake                     3.27.0
comm                      0.1.3
conda                     23.3.1
conda-content-trust       0.1.3
conda-package-handling    2.0.2
conda_package_streaming   0.7.0
contourpy                 1.1.0
cryptography              39.0.1
cycler                    0.11.0
datasets                  2.14.0
debugpy                   1.6.7
decorator                 5.1.1
defusedxml                0.7.1
dill                      0.3.7
distlib                   0.3.7
docker-pycreds            0.4.0
editables                 0.5
exceptiongroup            1.1.2
executing                 1.2.0
fastapi                   0.100.0
fastjsonschema            2.18.0
ffmpy                     0.3.1
filelock                  3.12.2
Flask                     2.3.2
fonttools                 4.41.1
fqdn                      1.5.1
frozenlist                1.4.0
fsspec                    2023.6.0
gitdb                     4.0.10
GitPython                 3.1.32
gradio                    3.35.2
gradio_client             0.2.10
grpcio                    1.56.2
h11                       0.14.0
hatchling                 1.18.0
httpcore                  0.17.3
httpx                     0.24.1
huggingface-hub           0.16.4
idna                      3.4
ipykernel                 6.24.0
ipython                   8.14.0
ipython-genutils          0.2.0
ipywidgets                8.0.7
isoduration               20.11.0
itsdangerous              2.1.2
jedi                      0.18.2
jieba                     0.42.1
Jinja2                    3.1.2
joblib                    1.3.1
json5                     0.9.14
jsonpatch                 1.32
jsonpointer               2.1
jsonschema                4.18.4
jsonschema-specifications 2023.7.1
jupyter                   1.0.0
jupyter_client            8.3.0
jupyter-console           6.6.3
jupyter_core              5.3.1
jupyter-events            0.6.3
jupyter-lsp               2.2.0
jupyter_server            2.7.0
jupyter_server_terminals  0.4.4
jupyterlab                4.0.3
jupyterlab-pygments       0.2.2
jupyterlab_server         2.24.0
jupyterlab-widgets        3.0.8
kiwisolver                1.4.4
linkify-it-py             2.0.2
lit                       16.0.6
markdown-it-py            2.2.0
markdown2                 2.4.10
MarkupSafe                2.1.3
matplotlib                3.7.2
matplotlib-inline         0.1.6
mdit-py-plugins           0.3.3
mdurl                     0.1.2
mistune                   3.0.1
mpmath                    1.3.0
msgpack                   1.0.5
multidict                 6.0.4
multiprocess              0.70.15
mypy-extensions           1.0.0
nbclient                  0.8.0
nbconvert                 7.7.2
nbformat                  5.9.1
nest-asyncio              1.5.6
networkx                  3.1
nh3                       0.2.14
ninja                     1.11.1
nltk                      3.8.1
notebook                  7.0.0
notebook_shim             0.2.3
numpy                     1.25.1
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-cupti-cu11    11.7.101
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
nvidia-cufft-cu11         10.9.0.58
nvidia-curand-cu11        10.2.10.91
nvidia-cusolver-cu11      11.4.0.1
nvidia-cusparse-cu11      11.7.4.91
nvidia-nccl-cu11          2.14.3
nvidia-nvtx-cu11          11.7.91
orjson                    3.9.2
overrides                 7.3.1
packaging                 23.0
pandas                    2.0.3
pandocfilters             1.5.0
parso                     0.8.3
pathspec                  0.11.1
pathtools                 0.1.2
peft                      0.4.0
pexpect                   4.8.0
pickleshare               0.7.5
Pillow                    10.0.0
pip                       23.0.1
platformdirs              3.9.1
pluggy                    1.0.0
prometheus-client         0.17.1
prompt-toolkit            3.0.39
protobuf                  4.23.4
psutil                    5.9.5
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   12.0.1
pycosat                   0.6.4
pycparser                 2.21
pydantic                  1.10.12
pydantic_core             2.3.0
pydub                     0.25.1
Pygments                  2.15.1
pyOpenSSL                 23.0.0
pyparsing                 3.0.9
pyre-extensions           0.0.29
PySocks                   1.7.1
python-dateutil           2.8.2
python-json-logger        2.0.7
python-multipart          0.0.6
pytz                      2023.3
PyYAML                    6.0.1
pyzmq                     25.1.0
qtconsole                 5.4.3
QtPy                      2.3.1
ray                       2.6.1
referencing               0.30.0
regex                     2023.6.3
requests                  2.28.1
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.4.2
rouge-chinese             1.0.3
rpds-py                   0.9.2
ruamel.yaml               0.17.21
ruamel.yaml.clib          0.2.6
safetensors               0.3.1
semantic-version          2.10.0
Send2Trash                1.8.2
sentencepiece             0.1.99
sentry-sdk                1.28.1
setproctitle              1.3.2
setuptools                65.6.3
shortuuid                 1.0.11
six                       1.16.0
smmap                     5.0.0
sniffio                   1.3.0
soupsieve                 2.4.1
stack-data                0.6.2
starlette                 0.27.0
svgwrite                  1.4.3
sympy                     1.12
terminado                 0.17.1
tinycss2                  1.2.1
tokenizers                0.13.3
tomli                     2.0.1
toolz                     0.12.0
torch                     2.0.1
tornado                   6.3.2
tqdm                      4.65.0
traitlets                 5.9.0
transformers              4.31.0
triton                    2.0.0
trl                       0.4.7
trove-classifiers         2023.7.6
typing_extensions         4.7.1
typing-inspect            0.9.0
tzdata                    2023.3
uc-micro-py               1.0.2
uri-template              1.3.0
urllib3                   1.26.15
uvicorn                   0.23.1
virtualenv                20.24.2
vllm                      0.1.2       /home/jovyan/notebook-models-datavol-1/feng/OpenSource/vllm
wandb                     0.15.7
wavedrom                  2.0.3.post3
wcwidth                   0.2.6
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.6.1
websockets                11.0.3
Werkzeug                  2.3.6
wheel                     0.38.4
widgetsnbextension        4.0.8
xformers                  0.0.20
xxhash                    3.2.0
yarl                      1.9.2
zstandard                 0.19.0

And my GPU info:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100S-32Q      On   | 00000000:02:01.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      0MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+