mistralai / mistral-inference

Official inference library for Mistral models
https://mistral.ai/
Apache License 2.0

[BUG] Transformer.from_folder() does not load the model on multiple GPUs #197

Open: Cerrix opened this issue 4 months ago

Cerrix commented 4 months ago

Python -VV

Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]

Pip Freeze

absl-py @ file:///home/conda/feedstock_root/build_artifacts/absl-py_1705494584803/work
accelerate==0.32.1
aiobotocore @ file:///home/conda/feedstock_root/build_artifacts/aiobotocore_1719300089447/work
aiohttp @ file:///home/conda/feedstock_root/build_artifacts/aiohttp_1713964853148/work
aioitertools @ file:///home/conda/feedstock_root/build_artifacts/aioitertools_1663521246073/work
aiosignal @ file:///home/conda/feedstock_root/build_artifacts/aiosignal_1667935791922/work
aiosqlite @ file:///home/conda/feedstock_root/build_artifacts/aiosqlite_1682491975081/work
altair @ file:///home/conda/feedstock_root/build_artifacts/altair-split_1711824856061/work
amazon-q-developer-jupyterlab-ext @ file:///home/conda/feedstock_root/build_artifacts/amazon-q-developer-jupyterlab-ext_1718654116006/work
amazon-sagemaker-sql-editor @ file:///home/conda/feedstock_root/build_artifacts/amazon_sagemaker_sql_editor_1718780416454/work
amazon-sagemaker-sql-execution @ file:///home/conda/feedstock_root/build_artifacts/amazon-sagemaker-sql-execution_1713045672980/work
amazon-sagemaker-sql-magic @ file:///home/conda/feedstock_root/build_artifacts/amazon-sagemaker-sql-magic_1718780032766/work
amazon_sagemaker_jupyter_ai_q_developer @ file:///home/conda/feedstock_root/build_artifacts/amazon-sagemaker-jupyter-ai-q-developer_1718767262338/work
amazon_sagemaker_jupyter_scheduler @ file:///home/conda/feedstock_root/build_artifacts/amazon-sagemaker-jupyter-scheduler_1718040900290/work
annotated-types==0.7.0
ansi2html @ file:///home/conda/feedstock_root/build_artifacts/ansi2html_1703532389991/work
ansicolors @ file:///home/conda/feedstock_root/build_artifacts/ansicolors_1661653730566/work
antlr4-python3-runtime @ file:///home/conda/feedstock_root/build_artifacts/antlr-python-runtime-meta_1638309185939/work
anyio @ file:///home/conda/feedstock_root/build_artifacts/anyio_1717693030552/work
archspec @ file:///home/conda/feedstock_root/build_artifacts/archspec_1708969572489/work
argon2-cffi @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi_1692818318753/work
argon2-cffi-bindings @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi-bindings_1695386546427/work
arrow @ file:///home/conda/feedstock_root/build_artifacts/arrow_1696128962909/work
asn1crypto @ file:///home/conda/feedstock_root/build_artifacts/asn1crypto_1647369152656/work
astroid @ file:///home/conda/feedstock_root/build_artifacts/astroid_1716193690567/work
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
astunparse @ file:///home/conda/feedstock_root/build_artifacts/astunparse_1610696312422/work
async-lru @ file:///home/conda/feedstock_root/build_artifacts/async-lru_1690563019058/work
async-timeout @ file:///home/conda/feedstock_root/build_artifacts/async-timeout_1691763562544/work
attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1704011227531/work
autogluon @ file:///home/conda/feedstock_root/build_artifacts/autogluon_1714773683092/work/autogluon
autogluon.common @ file:///home/conda/feedstock_root/build_artifacts/autogluon.common_1714694238466/work/common
autogluon.core @ file:///home/conda/feedstock_root/build_artifacts/autogluon.core_1714752967231/work/core
autogluon.features @ file:///home/conda/feedstock_root/build_artifacts/autogluon.features_1714697569882/work/features
autogluon.multimodal @ file:///home/conda/feedstock_root/build_artifacts/autogluon.multimodal_1714763188950/work/multimodal
autogluon.tabular @ file:///home/conda/feedstock_root/build_artifacts/autogluon.tabular_1714759714895/work/tabular
autogluon.timeseries @ file:///home/conda/feedstock_root/build_artifacts/autogluon.timeseries_1714764748604/work/timeseries
autopep8 @ file:///home/conda/feedstock_root/build_artifacts/autopep8_1693061251004/work
autovizwidget @ file:///home/conda/feedstock_root/build_artifacts/autovizwidget_1694633627542/work
aws-embedded-metrics @ file:///home/conda/feedstock_root/build_artifacts/aws-embedded-metrics_1696355834319/work
aws-glue-sessions @ file:///home/conda/feedstock_root/build_artifacts/aws-glue-sessions_1716530167930/work
Babel @ file:///home/conda/feedstock_root/build_artifacts/babel_1702422572539/work
bcrypt @ file:///home/conda/feedstock_root/build_artifacts/bcrypt_1715971615809/work
beautifulsoup4 @ file:///home/conda/feedstock_root/build_artifacts/beautifulsoup4_1705564648255/work
binaryornot==0.4.4
bleach @ file:///home/conda/feedstock_root/build_artifacts/bleach_1696630167146/work
blinker @ file:///home/conda/feedstock_root/build_artifacts/blinker_1715091184126/work
blis @ file:///home/conda/feedstock_root/build_artifacts/cython-blis_1696148805003/work
boltons @ file:///home/conda/feedstock_root/build_artifacts/boltons_1711936407380/work
boto3 @ file:///home/conda/feedstock_root/build_artifacts/boto3_1718954179215/work
botocore @ file:///home/conda/feedstock_root/build_artifacts/botocore_1718937901137/work
Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1687884021435/work
cached-property @ file:///home/conda/feedstock_root/build_artifacts/cached_property_1615209429212/work
cachetools @ file:///home/conda/feedstock_root/build_artifacts/cachetools_1708987703938/work
catalogue @ file:///home/conda/feedstock_root/build_artifacts/catalogue_1695626339626/work
catboost @ https://pypi.org/packages/cp310/c/catboost/catboost-1.2.5-cp310-cp310-manylinux2014_x86_64.whl#sha256=a92da61e95919b03d611045f0f3373799deec6f8192d7d1211714a21ff0da65e
certifi @ file:///home/conda/feedstock_root/build_artifacts/certifi_1718025014955/work/certifi
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1696001684923/work
chardet @ file:///home/conda/feedstock_root/build_artifacts/chardet_1695468598188/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1698833585322/work
click @ file:///home/conda/feedstock_root/build_artifacts/click_1692311806742/work
cloudpathlib @ file:///home/conda/feedstock_root/build_artifacts/cloudpathlib-meta_1697837790453/work
cloudpickle @ file:///home/conda/feedstock_root/build_artifacts/cloudpickle_1674202310934/work
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1666700638685/work
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
conda @ file:///home/conda/feedstock_root/build_artifacts/conda_1701731572133/work
conda-libmamba-solver @ file:///home/conda/feedstock_root/build_artifacts/conda-libmamba-solver_1706566000184/work/src
conda-package-handling @ file:///home/conda/feedstock_root/build_artifacts/conda-package-handling_1717678605937/work
conda_package_streaming @ file:///home/conda/feedstock_root/build_artifacts/conda-package-streaming_1717678526951/work
confection @ file:///home/conda/feedstock_root/build_artifacts/confection_1701179074719/work
contextlib2 @ file:///home/conda/feedstock_root/build_artifacts/contextlib2_1624848568296/work
contourpy @ file:///home/conda/feedstock_root/build_artifacts/contourpy_1712429905637/work
cookiecutter @ file:///home/conda/feedstock_root/build_artifacts/cookiecutter_1708608886262/work
croniter @ file:///home/conda/feedstock_root/build_artifacts/croniter_1686929181238/work
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography-split_1717559422169/work
cycler @ file:///home/conda/feedstock_root/build_artifacts/cycler_1696677705766/work
cymem @ file:///home/conda/feedstock_root/build_artifacts/cymem_1695443485440/work
cytoolz @ file:///home/conda/feedstock_root/build_artifacts/cytoolz_1706897049115/work
dash @ file:///home/conda/feedstock_root/build_artifacts/dash_1718251139044/work
dask @ file:///home/conda/feedstock_root/build_artifacts/dask-core_1718917774609/work
dataclasses-json @ file:///home/conda/feedstock_root/build_artifacts/dataclasses-json_1717969336599/work
datasets @ file:///home/conda/feedstock_root/build_artifacts/datasets_1717425882640/work
debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1707444420542/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
deepmerge @ file:///home/conda/feedstock_root/build_artifacts/deepmerge_1702941685750/work
defusedxml @ file:///home/conda/feedstock_root/build_artifacts/defusedxml_1615232257335/work
dill @ file:///home/conda/feedstock_root/build_artifacts/dill_1706434688412/work
diskcache @ file:///home/conda/feedstock_root/build_artifacts/diskcache_1693471346238/work
distributed @ file:///home/conda/feedstock_root/build_artifacts/distributed_1718922226295/work
distro @ file:///home/conda/feedstock_root/build_artifacts/distro_1704321475663/work
docker @ file:///home/conda/feedstock_root/build_artifacts/docker-py_1716508870406/work
docstring-to-markdown @ file:///home/conda/feedstock_root/build_artifacts/docstring-to-markdown_1708563025188/work
docstring_parser==0.16
entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1643888246732/work
evaluate @ file:///home/conda/feedstock_root/build_artifacts/evaluate_1697442180934/work
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1704921103267/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1698579936712/work
faiss==1.7.4
fastai @ file:///home/conda/feedstock_root/build_artifacts/fastai_1714277957289/work
fastapi @ file:///home/conda/feedstock_root/build_artifacts/fastapi_1714446803450/work
fastcore @ file:///home/conda/feedstock_root/build_artifacts/fastcore_1719152646370/work
fastdownload @ file:///home/conda/feedstock_root/build_artifacts/fastdownload_1675782504990/work
fastjsonschema @ file:///home/conda/feedstock_root/build_artifacts/python-fastjsonschema_1718477020893/work/dist
fastprogress @ file:///home/conda/feedstock_root/build_artifacts/fastprogress_1658690818839/work
filelock @ file:///home/conda/feedstock_root/build_artifacts/filelock_1719088281970/work
fire==0.6.0
flake8 @ file:///home/conda/feedstock_root/build_artifacts/flake8_1704483779980/work
Flask @ file:///home/conda/feedstock_root/build_artifacts/flask_1712667726126/work
flatbuffers @ file:///home/conda/feedstock_root/build_artifacts/python-flatbuffers_1711466727397/work
fonttools @ file:///home/conda/feedstock_root/build_artifacts/fonttools_1717209196293/work
fqdn @ file:///home/conda/feedstock_root/build_artifacts/fqdn_1638810296540/work/dist
frozenlist @ file:///home/conda/feedstock_root/build_artifacts/frozenlist_1702645481127/work
fsspec @ file:///home/conda/feedstock_root/build_artifacts/fsspec_1686342280219/work
future @ file:///home/conda/feedstock_root/build_artifacts/future_1708610096684/work
gast @ file:///home/conda/feedstock_root/build_artifacts/gast_1688368721366/work
gdown @ file:///home/conda/feedstock_root/build_artifacts/gdown_1715510831822/work
gitdb @ file:///home/conda/feedstock_root/build_artifacts/gitdb_1697791558612/work
GitPython @ file:///home/conda/feedstock_root/build_artifacts/gitpython_1711991025291/work
gluonts @ file:///home/conda/feedstock_root/build_artifacts/gluonts_1697634602503/work
gmpy2 @ file:///home/conda/feedstock_root/build_artifacts/gmpy2_1715527283764/work
google-auth @ file:///home/conda/feedstock_root/build_artifacts/google-auth_1717749192251/work
google-auth-oauthlib @ file:///home/conda/feedstock_root/build_artifacts/google-auth-oauthlib_1688235217226/work
google-pasta==0.2.0
graphviz @ file:///home/conda/feedstock_root/build_artifacts/python-graphviz_1711016462626/work
greenlet @ file:///home/conda/feedstock_root/build_artifacts/greenlet_1703201576006/work
grpcio @ file:///home/conda/feedstock_root/build_artifacts/grpc-split_1690942284331/work
gssapi @ file:///home/conda/feedstock_root/build_artifacts/python-gssapi_1697143962561/work
h11 @ file:///home/conda/feedstock_root/build_artifacts/h11_1664132893548/work
h2 @ file:///home/conda/feedstock_root/build_artifacts/h2_1634280454336/work
h5py @ file:///home/conda/feedstock_root/build_artifacts/h5py_1717664841189/work
hdijupyterutils @ file:///home/conda/feedstock_root/build_artifacts/hdijupyterutils_1694633580855/work
hpack==4.0.0
httpcore @ file:///home/conda/feedstock_root/build_artifacts/httpcore_1711596990900/work
httpx @ file:///home/conda/feedstock_root/build_artifacts/httpx_1708530890843/work
huggingface-hub==0.24.0
hyperframe @ file:///home/conda/feedstock_root/build_artifacts/hyperframe_1619110129307/work
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1713279365350/work
imagecodecs-lite @ file:///home/conda/feedstock_root/build_artifacts/imagecodecs-lite_1716011160511/work
imageio @ file:///home/conda/feedstock_root/build_artifacts/imageio_1719234999090/work
importlib-metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1701625711742/work
importlib_resources @ file:///home/conda/feedstock_root/build_artifacts/importlib_resources_1711040877059/work
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1717717528849/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1717182742060/work
ipywidgets @ file:///home/conda/feedstock_root/build_artifacts/ipywidgets_1716897651763/work
isoduration @ file:///home/conda/feedstock_root/build_artifacts/isoduration_1638811571363/work/dist
isort @ file:///home/conda/feedstock_root/build_artifacts/isort_1702518492027/work
itsdangerous @ file:///home/conda/feedstock_root/build_artifacts/itsdangerous_1713372668944/work
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1715127149914/work
jmespath @ file:///home/conda/feedstock_root/build_artifacts/jmespath_1655568249366/work
joblib @ file:///home/conda/feedstock_root/build_artifacts/joblib_1714665484399/work
json5 @ file:///home/conda/feedstock_root/build_artifacts/json5_1712986206667/work
jsonpatch @ file:///home/conda/feedstock_root/build_artifacts/jsonpatch_1695536281965/work
jsonpath-ng @ file:///home/conda/feedstock_root/build_artifacts/jsonpath-ng_1705008192957/work
jsonpointer @ file:///home/conda/feedstock_root/build_artifacts/jsonpointer_1718283388110/work
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter @ file:///home/conda/feedstock_root/build_artifacts/jupyter_1696255489086/work
jupyter-console @ file:///home/conda/feedstock_root/build_artifacts/jupyter_console_1678118109161/work
jupyter-dash @ file:///home/conda/feedstock_root/build_artifacts/jupyter-dash_1648919001274/work
jupyter-events @ file:///home/conda/feedstock_root/build_artifacts/jupyter_events_1690301630599/work
jupyter-lsp @ file:///home/conda/feedstock_root/build_artifacts/jupyter-lsp-meta_1712707420468/work/jupyter-lsp
jupyter-server-mathjax @ file:///home/conda/feedstock_root/build_artifacts/jupyter-server-mathjax_1672324512570/work
jupyter-ydoc @ file:///home/conda/feedstock_root/build_artifacts/jupyter_ydoc_1695982286089/work/dist
jupyter_ai @ file:///home/conda/feedstock_root/build_artifacts/jupyter-ai_1718999803192/work
jupyter_ai_magics @ file:///home/conda/feedstock_root/build_artifacts/jupyter-ai-magics_1718990665983/work
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1716472197302/work
jupyter_collaboration @ file:///home/conda/feedstock_root/build_artifacts/jupyter-collaboration_1691763294057/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1710257277185/work
jupyter_scheduler @ file:///home/conda/feedstock_root/build_artifacts/jupyter_scheduler_1717458331320/work
jupyter_server @ file:///home/conda/feedstock_root/build_artifacts/jupyter_server_1699289262408/work
jupyter_server_fileid @ file:///home/conda/feedstock_root/build_artifacts/jupyter_server_fileid_1714390608391/work
jupyter_server_proxy @ file:///home/conda/feedstock_root/build_artifacts/jupyter-server-proxy_1718130470839/work
jupyter_server_terminals @ file:///home/conda/feedstock_root/build_artifacts/jupyter_server_terminals_1710262634903/work
jupyterlab @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_1712586972478/work
jupyterlab-lsp @ file:///home/conda/feedstock_root/build_artifacts/jupyter-lsp-meta_1707098417706/work/jupyterlab-lsp
jupyterlab_git @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab-git_1707314297225/work
jupyterlab_nvdashboard==0.11.0
jupyterlab_pygments @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_pygments_1707149102966/work
jupyterlab_server @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_server_1690205927615/work
jupyterlab_widgets @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_widgets_1716891641122/work
keras @ file:///home/conda/feedstock_root/build_artifacts/keras_1698427100715/work/keras-2.14.0-py3-none-any.whl#sha256=d7429d1d2131cc7eb1f2ea2ec330227c7d9d38dab3dfdf2e78defee4ecc43fcd
kiwisolver @ file:///home/conda/feedstock_root/build_artifacts/kiwisolver_1695379902431/work
krb5 @ file:///home/conda/feedstock_root/build_artifacts/pykrb5_1708557570437/work
langchain @ file:///home/conda/feedstock_root/build_artifacts/langchain_1708677218603/work
langchain-aws @ file:///home/conda/feedstock_root/build_artifacts/langchain-aws_1716564756966/work
langchain-community @ file:///home/conda/feedstock_root/build_artifacts/langchain-community_1715223770788/work
langchain-core @ file:///home/conda/feedstock_root/build_artifacts/langchain-core_1715060411785/work
langcodes @ file:///home/conda/feedstock_root/build_artifacts/langcodes_1714235526219/work
langsmith @ file:///home/conda/feedstock_root/build_artifacts/langsmith_1719286093638/work
language_data @ file:///home/conda/feedstock_root/build_artifacts/language-data_1714193818885/work
libmambapy @ file:///home/conda/feedstock_root/build_artifacts/mamba-split_1711394305528/work/libmambapy
lightgbm @ file:///home/conda/feedstock_root/build_artifacts/lightgbm_1674563383654/work
lightning-utilities @ file:///home/conda/feedstock_root/build_artifacts/lightning-utilities_1711597355069/work
llvmlite==0.43.0
locket @ file:///home/conda/feedstock_root/build_artifacts/locket_1650660393415/work
lxml @ file:///home/conda/feedstock_root/build_artifacts/lxml_1704590401168/work
marisa-trie @ file:///home/conda/feedstock_root/build_artifacts/marisa-trie_1706566476731/work
Markdown @ file:///home/conda/feedstock_root/build_artifacts/markdown_1710435156458/work
markdown-it-py @ file:///home/conda/feedstock_root/build_artifacts/markdown-it-py_1686175045316/work
MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1706899921127/work
marshmallow @ file:///home/conda/feedstock_root/build_artifacts/marshmallow_1717665261177/work
matplotlib @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-suite_1715976200404/work
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work
mccabe @ file:///home/conda/feedstock_root/build_artifacts/mccabe_1643049622439/work
mdurl @ file:///home/conda/feedstock_root/build_artifacts/mdurl_1704317613764/work
menuinst @ file:///home/conda/feedstock_root/build_artifacts/menuinst_1718088294643/work
mistral_common==1.3.1
mistral_inference==1.3.0
mistune @ file:///home/conda/feedstock_root/build_artifacts/mistune_1698947099619/work
ml-dtypes @ file:///home/conda/feedstock_root/build_artifacts/ml_dtypes_1695280938812/work
mlforecast @ file:///home/conda/feedstock_root/build_artifacts/mlforecast_1684814678333/work
mock @ file:///home/conda/feedstock_root/build_artifacts/mock_1689092066756/work
model-index @ file:///home/conda/feedstock_root/build_artifacts/model-index_1674171417239/work
mpmath @ file:///home/conda/feedstock_root/build_artifacts/mpmath_1678228039184/work
msgpack @ file:///home/conda/feedstock_root/build_artifacts/msgpack-python_1715670632672/work
multidict @ file:///home/conda/feedstock_root/build_artifacts/multidict_1707040698785/work
multiprocess @ file:///home/conda/feedstock_root/build_artifacts/multiprocess_1706514640841/work
munkres==1.1.4
murmurhash @ file:///home/conda/feedstock_root/build_artifacts/murmurhash_1695449783955/work
mypy-extensions @ file:///home/conda/feedstock_root/build_artifacts/mypy_extensions_1675543315189/work
nbclient @ file:///home/conda/feedstock_root/build_artifacts/nbclient_1710317608672/work
nbconvert @ file:///home/conda/feedstock_root/build_artifacts/nbconvert-meta_1718135430380/work
nbdime @ file:///home/conda/feedstock_root/build_artifacts/nbdime_1700575643650/work
nbformat @ file:///home/conda/feedstock_root/build_artifacts/nbformat_1712238998817/work
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
networkx @ file:///home/conda/feedstock_root/build_artifacts/networkx_1712540363324/work
nlpaug @ file:///home/conda/feedstock_root/build_artifacts/nlpaug_1675088251829/work
nltk @ file:///home/conda/feedstock_root/build_artifacts/nltk_1672696305909/work
nose @ file:///home/conda/feedstock_root/build_artifacts/nose_1602434998960/work
notebook @ file:///home/conda/feedstock_root/build_artifacts/notebook_1713397707292/work
notebook_shim @ file:///home/conda/feedstock_root/build_artifacts/notebook-shim_1707957777232/work
nptyping @ file:///home/conda/feedstock_root/build_artifacts/nptyping_1668652041379/work
numba @ file:///home/conda/feedstock_root/build_artifacts/numba_1718888013454/work
numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1707225380409/work/dist/numpy-1.26.4-cp310-cp310-linux_x86_64.whl#sha256=51131fd8fc130cd168aecaf1bc0ea85f92e8ffebf211772ceb16ac2e7f10d7ca
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.82
nvidia-nvtx-cu12==12.1.105
oauthlib @ file:///home/conda/feedstock_root/build_artifacts/oauthlib_1666056362788/work
omegaconf @ file:///home/conda/feedstock_root/build_artifacts/omegaconf_1669157093953/work
openmim @ file:///home/conda/feedstock_root/build_artifacts/openmim_1679406745319/work
opt-einsum @ file:///home/conda/feedstock_root/build_artifacts/opt_einsum_1696448916724/work
ordered-set @ file:///home/conda/feedstock_root/build_artifacts/ordered-set_1643221357603/work
orjson @ file:///home/conda/feedstock_root/build_artifacts/orjson_1718073284999/work/target/wheels/orjson-3.10.4-cp310-cp310-linux_x86_64.whl#sha256=378a4c2ffd007c04573a2e6ae4e5718ef07b12724ff7309ca3b2ef0d27edbf06
overrides @ file:///home/conda/feedstock_root/build_artifacts/overrides_1706394519472/work
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1696202382185/work
pandas @ file:///home/conda/feedstock_root/build_artifacts/pandas_1702057131119/work
pandocfilters @ file:///home/conda/feedstock_root/build_artifacts/pandocfilters_1631603243851/work
papermill @ file:///home/conda/feedstock_root/build_artifacts/papermill_1714214592023/work
paramiko @ file:///home/conda/feedstock_root/build_artifacts/paramiko_1703015906107/work
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work
partd @ file:///home/conda/feedstock_root/build_artifacts/partd_1715026491486/work
pathos @ file:///home/conda/feedstock_root/build_artifacts/pathos_1706533117008/work
patsy @ file:///home/conda/feedstock_root/build_artifacts/patsy_1704469236901/work
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1718833719602/work
pkgutil_resolve_name @ file:///home/conda/feedstock_root/build_artifacts/pkgutil-resolve-name_1694617248815/work
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1715777629804/work
plotly @ file:///home/conda/feedstock_root/build_artifacts/plotly_1714829923649/work
pluggy @ file:///home/conda/feedstock_root/build_artifacts/pluggy_1713667077545/work
ply @ file:///home/conda/feedstock_root/build_artifacts/ply_1712242996588/work
pox @ file:///home/conda/feedstock_root/build_artifacts/pox_1706431181924/work
ppft @ file:///home/conda/feedstock_root/build_artifacts/ppft_1706409481851/work
preshed @ file:///home/conda/feedstock_root/build_artifacts/preshed_1695644760607/work
prometheus_client @ file:///home/conda/feedstock_root/build_artifacts/prometheus_client_1707932675456/work
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1718047967974/work
protobuf==4.21.12
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1705722392846/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work
pure-sasl @ file:///home/conda/feedstock_root/build_artifacts/pure-sasl_1631890804823/work
pyarrow==12.0.1
pyarrow-hotfix @ file:///home/conda/feedstock_root/build_artifacts/pyarrow-hotfix_1700596371886/work
pyasn1 @ file:///home/conda/feedstock_root/build_artifacts/pyasn1_1713209357222/work
pyasn1_modules @ file:///home/conda/feedstock_root/build_artifacts/pyasn1-modules_1713209683338/work
PyAthena @ file:///home/conda/feedstock_root/build_artifacts/pyathena_1716887874522/work
pycodestyle @ file:///home/conda/feedstock_root/build_artifacts/pycodestyle_1697202867721/work
pycosat @ file:///home/conda/feedstock_root/build_artifacts/pycosat_1696355758174/work
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1711811537435/work
pydantic==2.6.1
pydantic_core==2.16.2
pydocstyle @ file:///home/conda/feedstock_root/build_artifacts/pydocstyle_1673997487070/work
pyflakes @ file:///home/conda/feedstock_root/build_artifacts/pyflakes_1704424584912/work
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work
PyHive @ file:///home/conda/feedstock_root/build_artifacts/pyhive_1692318104998/work
PyJWT @ file:///home/conda/feedstock_root/build_artifacts/pyjwt_1706895065046/work
pylint @ file:///home/conda/feedstock_root/build_artifacts/pylint_1717705591781/work
PyNaCl @ file:///home/conda/feedstock_root/build_artifacts/pynacl_1695544850803/work
pynvml==11.5.3
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1706660063483/work
pyparsing @ file:///home/conda/feedstock_root/build_artifacts/pyparsing_1709721012883/work
PyQt5==5.15.9
PyQt5-sip==12.12.2
pyrsistent @ file:///home/conda/feedstock_root/build_artifacts/pyrsistent_1698753827123/work
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work
pyspnego @ file:///home/conda/feedstock_root/build_artifacts/pyspnego_1718198288313/work
pytesseract @ file:///home/conda/feedstock_root/build_artifacts/pytesseract_1647306555263/work
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1709299778482/work
python-json-logger @ file:///home/conda/feedstock_root/build_artifacts/python-json-logger_1677079630776/work
python-lsp-jsonrpc @ file:///home/conda/feedstock_root/build_artifacts/python-lsp-jsonrpc_1695528365348/work
python-lsp-server @ file:///home/conda/feedstock_root/build_artifacts/python-lsp-server-meta_1711734797703/work
python-slugify @ file:///home/conda/feedstock_root/build_artifacts/python-slugify-split_1707425621764/work
pytoolconfig @ file:///home/conda/feedstock_root/build_artifacts/pytoolconfig_1675124745143/work
pytorch-lightning @ file:///home/conda/feedstock_root/build_artifacts/pytorch-lightning_1694753701789/work
pytorch-metric-learning @ file:///home/conda/feedstock_root/build_artifacts/pytorch-metric-learning_1674962728780/work
pytz @ file:///home/conda/feedstock_root/build_artifacts/pytz_1680088766131/work
pyu2f @ file:///home/conda/feedstock_root/build_artifacts/pyu2f_1604248910016/work
PyWavelets @ file:///home/conda/feedstock_root/build_artifacts/pywavelets_1695567558330/work
PyYAML @ file:///home/conda/feedstock_root/build_artifacts/pyyaml_1695373428874/work
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1715024398995/work
qtconsole @ file:///home/conda/feedstock_root/build_artifacts/qtconsole-base_1714942934316/work
QtPy @ file:///home/conda/feedstock_root/build_artifacts/qtpy_1698112029416/work
redshift_connector @ file:///home/conda/feedstock_root/build_artifacts/redshift_connector_1718898916124/work
referencing==0.35.1
regex @ file:///home/conda/feedstock_root/build_artifacts/regex_1715828395057/work
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1717057054362/work
requests-kerberos @ file:///home/conda/feedstock_root/build_artifacts/requests-kerberos_1708339520234/work
requests-oauthlib @ file:///home/conda/feedstock_root/build_artifacts/requests-oauthlib_1711290127547/work
responses @ file:///home/conda/feedstock_root/build_artifacts/responses_1643839609465/work
retrying==1.3.3
rfc3339-validator @ file:///home/conda/feedstock_root/build_artifacts/rfc3339-validator_1638811747357/work
rfc3986-validator @ file:///home/conda/feedstock_root/build_artifacts/rfc3986-validator_1598024191506/work
rich @ file:///home/conda/feedstock_root/build_artifacts/rich-split_1709150387247/work/dist
rope @ file:///home/conda/feedstock_root/build_artifacts/rope_1711296293824/work
rpds-py==0.19.0
rsa @ file:///home/conda/feedstock_root/build_artifacts/rsa_1658328885051/work
ruamel.yaml @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml_1707298115475/work
ruamel.yaml.clib @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml.clib_1707314473442/work
s3transfer @ file:///home/conda/feedstock_root/build_artifacts/s3transfer_1719300139436/work
safetensors @ file:///home/conda/feedstock_root/build_artifacts/safetensors_1713253444440/work
sagemaker @ file:///home/conda/feedstock_root/build_artifacts/sagemaker-python-sdk_1718975582923/work
sagemaker-headless-execution-driver @ file:///home/conda/feedstock_root/build_artifacts/sagemaker-headless-execution-driver_1701485690453/work
sagemaker-jupyterlab-emr-extension @ file:///home/conda/feedstock_root/build_artifacts/sagemaker-jupyterlab-emr-extension_1699459875830/work
sagemaker-jupyterlab-extension @ file:///home/conda/feedstock_root/build_artifacts/sagemaker-jupyterlab-extension_1713553599180/work
sagemaker-jupyterlab-extension-common @ file:///home/conda/feedstock_root/build_artifacts/sagemaker-jupyterlab-extension-common_1718653645629/work
sagemaker-kernel-wrapper @ file:///home/conda/feedstock_root/build_artifacts/sagemaker-kernel-wrapper_1697451623569/work
sagemaker-studio-analytics-extension @ file:///home/conda/feedstock_root/build_artifacts/sagemaker-studio-analytics-extension_1697836878634/work
sagemaker-studio-sparkmagic-lib @ file:///home/conda/feedstock_root/build_artifacts/sagemaker-studio-sparkmagic-lib_1695149984387/work
sasl==0.3.1
schema @ file:///home/conda/feedstock_root/build_artifacts/schema_1714829163029/work
scikit-image @ file:///home/conda/feedstock_root/build_artifacts/scikit-image_1667117143644/work
scikit-learn @ file:///home/conda/feedstock_root/build_artifacts/scikit-learn_1715869161825/work
SciPy @ file:///home/conda/feedstock_root/build_artifacts/scipy-split_1700812469549/work/dist/scipy-1.11.4-cp310-cp310-linux_x86_64.whl#sha256=136e231ccb8768e60c17ed60f2c2423262d3dfd8136f373e715db9dd77617e41
scramp @ file:///home/conda/feedstock_root/build_artifacts/scramp_1667411948349/work
Send2Trash @ file:///home/conda/feedstock_root/build_artifacts/send2trash_1712584999685/work
sentencepiece==0.2.0
seqeval @ file:///home/conda/feedstock_root/build_artifacts/seqeval_1607636513878/work
shellingham @ file:///home/conda/feedstock_root/build_artifacts/shellingham_1698144360966/work
simpervisor @ file:///home/conda/feedstock_root/build_artifacts/simpervisor_1684441099342/work
simple_parsing==0.1.5
sip @ file:///home/conda/feedstock_root/build_artifacts/sip_1697300428978/work
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
smart-open @ file:///home/conda/feedstock_root/build_artifacts/smart_open_split_1694066705667/work/dist
smdebug-rulesconfig @ file:///home/conda/feedstock_root/build_artifacts/smdebug-rulesconfig_1619805148199/work
smmap @ file:///home/conda/feedstock_root/build_artifacts/smmap_1634310307496/work
sniffio @ file:///home/conda/feedstock_root/build_artifacts/sniffio_1708952932303/work
snowballstemmer @ file:///home/conda/feedstock_root/build_artifacts/snowballstemmer_1637143057757/work
snowflake-connector-python @ file:///home/conda/feedstock_root/build_artifacts/snowflake-connector-python_1719328315873/work
sortedcontainers @ file:///home/conda/feedstock_root/build_artifacts/sortedcontainers_1621217038088/work
soupsieve @ file:///home/conda/feedstock_root/build_artifacts/soupsieve_1693929250441/work
spacy @ file:///home/conda/feedstock_root/build_artifacts/spacy_1717591992814/work
spacy-legacy @ file:///home/conda/feedstock_root/build_artifacts/spacy-legacy_1674550301837/work
spacy-loggers @ file:///home/conda/feedstock_root/build_artifacts/spacy-loggers_1694527114282/work
sparkmagic @ file:///home/conda/feedstock_root/build_artifacts/sparkmagic_1694633601704/work/sparkmagic
SQLAlchemy==2.0.30
sqlparse @ file:///home/conda/feedstock_root/build_artifacts/sqlparse_1715013953229/work
srsly @ file:///home/conda/feedstock_root/build_artifacts/srsly_1695653949688/work
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette @ file:///home/conda/feedstock_root/build_artifacts/starlette-recipe_1709667058396/work
statsforecast @ file:///home/conda/feedstock_root/build_artifacts/statsforecast_1669882640898/work
statsmodels @ file:///home/conda/feedstock_root/build_artifacts/statsmodels_1715941214543/work
supervisor==4.2.5
sympy @ file:///home/conda/feedstock_root/build_artifacts/sympy_1718625539893/work
tabulate @ file:///home/conda/feedstock_root/build_artifacts/tabulate_1665138452165/work
tblib @ file:///home/conda/feedstock_root/build_artifacts/tblib_1694702375735/work
tenacity @ file:///home/conda/feedstock_root/build_artifacts/tenacity_1719315008981/work
tensorboard @ file:///home/conda/feedstock_root/build_artifacts/tensorboard_1695917943728/work/tensorboard-2.14.1-py3-none-any.whl#sha256=3db108fb58f023b6439880e177743c5f1e703e9eeb5fb7d597871f949f85fd58
tensorboard-data-server @ file:///home/conda/feedstock_root/build_artifacts/tensorboard-data-server_1695425366946/work/tensorboard_data_server-0.7.0-py3-none-manylinux2014_x86_64.whl#sha256=aa1f69b2111bb4309cc6277ac277c89a9f67d074aa666b96eebe7401a359e1d5
tensorflow @ file:///home/conda/feedstock_root/build_artifacts/tensorflow-split_1699620173861/work/tensorflow_pkg/tensorflow-2.14.0-cp310-cp310-linux_x86_64.whl#sha256=a015dab8043172172ca4a754467c15209dcc02b420221cfcbc32ffa7e0c47fbd
tensorflow-estimator @ file:///home/conda/feedstock_root/build_artifacts/tensorflow-split_1699620173861/work/tensorflow-estimator/wheel_dir/tensorflow_estimator-2.14.0-py2.py3-none-any.whl#sha256=f05e70bcc48452fdf9e473666558a4d4cf92833fa4f1411e17806326b41e6fb1
termcolor @ file:///home/conda/feedstock_root/build_artifacts/termcolor_1704357939450/work
terminado @ file:///home/conda/feedstock_root/build_artifacts/terminado_1710262609923/work
text-unidecode @ file:///home/conda/feedstock_root/build_artifacts/text-unidecode_1694707102786/work
thinc @ file:///home/conda/feedstock_root/build_artifacts/thinc_1715461684054/work
threadpoolctl @ file:///home/conda/feedstock_root/build_artifacts/threadpoolctl_1714400101435/work
thrift @ file:///home/conda/feedstock_root/build_artifacts/thrift_1711156094832/work/lib/py
thrift-sasl @ file:///home/conda/feedstock_root/build_artifacts/thrift_sasl_1664049052220/work
tifffile @ file:///home/conda/feedstock_root/build_artifacts/tifffile_1591280222285/work
tiktoken==0.7.0
timm @ file:///home/conda/feedstock_root/build_artifacts/timm_1708532402622/work
tinycss2 @ file:///home/conda/feedstock_root/build_artifacts/tinycss2_1713974937325/work
tokenizers @ file:///home/conda/feedstock_root/build_artifacts/tokenizers_1713402723682/work/bindings/python
toml @ file:///home/conda/feedstock_root/build_artifacts/toml_1604308577558/work
tomli @ file:///home/conda/feedstock_root/build_artifacts/tomli_1644342247877/work
tomlkit @ file:///home/conda/feedstock_root/build_artifacts/tomlkit_1715185399719/work
toolz @ file:///home/conda/feedstock_root/build_artifacts/toolz_1706112571092/work
torch==2.3.1
torchmetrics @ file:///home/conda/feedstock_root/build_artifacts/torchmetrics_1691516815780/work
torchvision @ file:///home/conda/feedstock_root/build_artifacts/torchvision-split_1699863652134/work
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1717722796999/work
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1714854870413/work
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work
transformers @ git+https://github.com/huggingface/transformers.git@22f888b3fab3d914882b8f44896a5658712f535c
triton==2.3.1
truststore @ file:///home/conda/feedstock_root/build_artifacts/truststore_1694154605758/work
typer @ file:///home/conda/feedstock_root/build_artifacts/typer_1711217621866/work
types-python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/types-python-dateutil_1710589910274/work
typing-inspect @ file:///home/conda/feedstock_root/build_artifacts/typing_inspect_1685820062773/work
typing-utils @ file:///home/conda/feedstock_root/build_artifacts/typing_utils_1622899189314/work
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
typish @ file:///home/conda/feedstock_root/build_artifacts/typish_1628254486386/work
tzdata @ file:///home/conda/feedstock_root/build_artifacts/python-tzdata_1707747584337/work
ujson @ file:///home/conda/feedstock_root/build_artifacts/ujson_1715783105793/work
unicodedata2 @ file:///home/conda/feedstock_root/build_artifacts/unicodedata2_1695847980273/work
uri-template @ file:///home/conda/feedstock_root/build_artifacts/uri-template_1688655812972/work/dist
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1718728347128/work
uvicorn @ file:///home/conda/feedstock_root/build_artifacts/uvicorn-split_1717404975401/work
wasabi @ file:///home/conda/feedstock_root/build_artifacts/wasabi_1715409644734/work
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
weasel @ file:///home/conda/feedstock_root/build_artifacts/weasel_1699295455892/work
webcolors @ file:///home/conda/feedstock_root/build_artifacts/webcolors_1717667289718/work
webencodings @ file:///home/conda/feedstock_root/build_artifacts/webencodings_1694681268211/work
websocket-client @ file:///home/conda/feedstock_root/build_artifacts/websocket-client_1713923384721/work
Werkzeug @ file:///home/conda/feedstock_root/build_artifacts/werkzeug_1715000201436/work
whatthepatch @ file:///home/conda/feedstock_root/build_artifacts/whatthepatch_1683396758362/work
widgetsnbextension @ file:///home/conda/feedstock_root/build_artifacts/widgetsnbextension_1716891659446/work
window_ops @ file:///home/conda/feedstock_root/build_artifacts/window-ops_1709587127407/work
wrapt @ file:///home/conda/feedstock_root/build_artifacts/wrapt_1666806031361/work
xformers==0.0.27
xgboost==1.7.6
xxhash @ file:///home/conda/feedstock_root/build_artifacts/python-xxhash_1696486308932/work
y-py @ file:///home/conda/feedstock_root/build_artifacts/y-py_1696495053386/work
yapf @ file:///home/conda/feedstock_root/build_artifacts/yapf_1690387939953/work
yarl @ file:///home/conda/feedstock_root/build_artifacts/yarl_1705508292061/work
ypy-websocket @ file:///home/conda/feedstock_root/build_artifacts/ypy-websocket_1696470545029/work
zict @ file:///home/conda/feedstock_root/build_artifacts/zict_1681770155528/work
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1718013267051/work
zstandard==0.22.0

Reproduction Steps

Running the following code with a model such as Nemo Instruct (which cannot fit on a single GPU):

from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# model_folder_name is the path to the folder holding the model weights and tekken.json
tokenizer = MistralTokenizer.from_file(f"{model_folder_name}/tekken.json")
model = Transformer.from_folder(model_folder_name)

leads to the following error: "OutOfMemoryError: CUDA out of memory. Tried to allocate 140.00 MiB. GPU"

This happens because, as the attached screenshot shows, the model is loaded onto a single GPU.

image

I looked for a parameter in the Transformer Python module, but I don't see anything that enables multi-GPU inference.

Thank you so much

Expected Behavior

I would expect the model to be loaded onto multiple GPUs automatically, as in the screenshot:

image

Additional Context

No response

Suggested Solutions

I would recommend adding a parameter similar to the device_map parameter of the Hugging Face Transformers library (https://huggingface.co/docs/transformers/main_classes/pipelines), or distributing the model across the available GPUs automatically.
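
To make the suggestion concrete, here is a rough sketch of what a device_map-style assignment could compute: a mapping from layer index to GPU, with contiguous blocks of layers per device. The function name split_layers and the "cuda:N" labels are hypothetical and are not part of mistral-inference; this only illustrates the idea and does not move any weights.

```python
def split_layers(n_layers, n_devices):
    """Assign contiguous blocks of layers to devices as evenly as possible.

    Hypothetical helper for illustration only; not a mistral-inference API.
    """
    if n_layers < 0 or n_devices < 1:
        raise ValueError("n_layers must be >= 0 and n_devices must be >= 1")

    # Every device gets `base` layers; the first `extra` devices get one more.
    base, extra = divmod(n_layers, n_devices)

    device_map = {}
    start = 0
    for device in range(n_devices):
        size = base + (1 if device < extra else 0)
        for layer in range(start, start + size):
            device_map[layer] = f"cuda:{device}"
        start += size
    return device_map


if __name__ == "__main__":
    # Example: 5 layers over 2 GPUs -> 3 layers on cuda:0, 2 on cuda:1.
    print(split_layers(5, 2))
```

A real implementation would also have to place the embeddings and output head and move activations between devices at each block boundary, which this sketch leaves out.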

phenylalanin91 commented 3 months ago

Same here with 2x A100 80GB: image

import torch

# tokenizer and model are loaded as in the original report; prompt holds the input text
model.eval()
completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])

# Encode the chat request into token ids
tokens = tokenizer.encode_chat_completion(completion_request).tokens

# Generate without tracking gradients
with torch.no_grad():
    out_tokens, _ = generate([tokens], model, max_tokens=1024*2, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
answer = tokenizer.decode(out_tokens[0])

where 'prompt' contains roughly 50k tokens, resulting in: image

Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux

accelerate==0.32.1
adal==1.2.7
aiobotocore==2.13.1
aiohttp==3.9.5
aioitertools==0.11.0
aiosignal==1.3.1
annotated-types==0.7.0
argcomplete==3.4.0
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
attrs==23.2.0
azure-ai-ml==1.18.0
azure-common==1.1.28
azure-core==1.30.2
azure-graphrbac==0.61.1
azure-identity==1.17.1
azure-mgmt-authorization==4.0.0
azure-mgmt-containerregistry==10.3.0
azure-mgmt-core==1.4.0
azure-mgmt-keyvault==10.3.1
azure-mgmt-network==25.4.0
azure-mgmt-resource==23.1.1
azure-mgmt-storage==21.2.1
azure-storage-blob==12.21.0
azure-storage-file-datalake==12.16.0
azure-storage-file-share==12.17.0
azureml-core==1.56.0
azureml-dataprep==5.1.6
azureml-dataprep-native==41.0.0
azureml-dataprep-rslex==2.22.2
azureml-dataset-runtime==1.56.0
azureml-defaults==1.56.0.post1
azureml-inference-server-http==1.2.2
backports.tempfile==1.0
backports.weakref==1.0.post1
bcrypt==4.2.0
bitsandbytes==0.43.1
blinker==1.8.2
botocore==1.34.131
cachetools==5.4.0
certifi==2024.7.4
cffi==1.16.0
charset-normalizer==3.3.2
ciso8601==2.3.1
click==8.1.7
cloudpickle==2.2.1
colorama==0.4.6
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
contextlib2==21.6.0
contourpy==1.2.1
cryptography==43.0.0
cycler==0.12.1
darwin-rests==7.1.2+moreni
datasets==2.20.0
debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1719378645730/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
devolab2igam==0.0.101
dill==0.3.8
docker==7.1.0
docstring_parser==0.16
docx2txt==0.8
ecb-certifi==4.3.0+moreni
einops==0.8.0
et-xmlfile==1.1.0
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1720869315914/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1698579936712/work
fasteners==0.19
filelock==3.15.4
fire==0.6.0
flash_attn==2.6.1
Flask==2.3.2
Flask-Cors==3.0.10
fonttools==4.53.1
frozenlist==1.4.1
fsspec==2024.6.1
fusepy==3.0.1
google-api-core==2.19.1
google-auth==2.32.0
googleapis-common-protos==1.63.2
gssapi==1.8.3
gunicorn==22.0.0
huggingface-hub==0.24.0
humanfriendly==10.0
idna==3.7
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1719361860083/work
inference-schema==1.7.2
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1719845459717/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1719582526268/work
ipywidgets @ file:///home/conda/feedstock_root/build_artifacts/ipywidgets_1716897651763/work
isodate==0.6.1
itsdangerous==2.2.0
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
jeepney==0.8.0
Jinja2==3.1.4
jmespath==1.0.1
joblib==1.4.2
jsonpickle==3.2.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1716472197302/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1710257359434/work
jupyterlab_widgets @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_widgets_1716891641122/work
kiwisolver==1.4.5
knack==0.11.0
krb5==0.6.0
lxml==5.2.2
MarkupSafe==2.1.5
marshmallow==3.21.3
matplotlib==3.9.1
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work
mistral_common==1.3.3
mistral_finetune==0.0.0
mistral_inference==1.3.1
mpmath==1.3.0
msal==1.30.0
msal-extensions==1.2.0
msrest==0.7.1
msrestazure==0.6.4
multidict==6.0.5
multiprocess==0.70.16
ndg-httpsclient==0.5.1
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
networkx==3.3
ninja==1.11.1.1
numpy==1.23.5
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.82
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
opencensus==0.11.4
opencensus-context==0.1.3
opencensus-ext-azure==1.1.13
opencensus-ext-logging==0.1.1
openpyxl==3.1.5
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1718189413536/work
pandas==2.1.4
paramiko==3.4.0
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work
pathspec==0.12.1
patsy==0.5.6
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pillow==10.4.0
pkginfo==1.11.1
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1715777629804/work
plotly==5.22.0
plotly-express==0.4.1
portalocker==2.10.1
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1718047967974/work
proto-plus==1.24.0
protobuf==5.27.2
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1719274586160/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1721585709575/work
pyarrow==17.0.0
pyarrow-hotfix==0.6
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycparser==2.22
pydantic==2.8.2
pydantic-settings==2.3.4
pydantic_core==2.20.1
pydash==8.0.3
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work
PyJWT==2.8.0
PyNaCl==1.5.0
pynvml==11.5.3
pyodbc==5.0.0
pyOpenSSL==24.2.1
pyparsing==3.1.2
PySocks==1.7.1
pyspnego==0.11.0
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1709299778482/work
python-docx==1.1.2
python-dotenv==1.0.1
python-gitlab==4.8.0
python-slugify==8.0.4
pytz==2024.1
PyYAML==6.0.1
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1715024370414/work
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
requests-kerberos==0.15.0
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rpds-py==0.19.0
rsa==4.9
s3fs==2024.6.1
safetensors==0.4.3
scikit-learn==1.5.1
scipy @ file:///home/conda/feedstock_root/build_artifacts/scipy-split_1720323007424/work/dist/scipy-1.14.0-cp311-cp311-linux_x86_64.whl#sha256=1555805d3d22eadcd79d8bbf4de2865c7ad881feceb57d3c2d91ec2469d4acf7
SecretStorage==3.3.3
sentencepiece==0.2.0
simple_parsing==0.1.5
simply-rest==4.0
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
statsmodels==0.14.2
strictyaml==1.7.3
sympy==1.13.1
tabulate==0.9.0
tenacity==8.5.0
termcolor==2.4.0
text-unidecode==1.3
threadpoolctl==3.5.0
tiktoken==0.7.0
tokenizers==0.19.1
torch==2.3.1
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1717722848697/work
tqdm==4.66.4
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work
transformers==4.42.4
triton==2.3.1
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
tzdata==2024.1
tzlocal==5.2
urllib3==2.2.2
vl_connect==0.1.41
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
Werkzeug==3.0.3
widgetsnbextension @ file:///home/conda/feedstock_root/build_artifacts/widgetsnbextension_1716891659446/work
wrapt==1.16.0
xformers==0.0.27
xxhash==3.4.1
yarl==1.9.4
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1718013267051/work

ShadyPi commented 3 months ago

Same issue on 4 NVIDIA A10 GPUs: only one A10 is used and the remaining GPUs stay empty, yet the "Out of Memory" error still occurs!

abhishekdhankar95 commented 1 month ago

You might want to try the vLLM library. I used it to deploy the Mistral Nemo model in a multi-node, multi-GPU setting. Reference: https://docs.mistral.ai/deployment/self-deployment/vllm/

I could be wrong, but I think the vLLM library also has a CPU-offload capability for single-GPU setups.

It's slower than mistral-inference for obvious reasons, but it's better than nothing.

patrickvonplaten commented 3 weeks ago

Hey @Cerrix, you need to load the model with pipeline parallelism enabled, e.g. see:

https://github.com/mistralai/mistral-inference?tab=readme-ov-file#cli - specifically:

torchrun --nproc-per-node 2 (your script)

Also make sure to define pipeline parallelism as shown here: https://github.com/mistralai/mistral-inference/blob/fffa5dac372280e5810d8008e54f70b1a5c40bde/src/mistral_inference/main.py#L124
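
A minimal sketch of what this could look like in a user script, modelled on the pattern in the linked main.py. It assumes that Transformer.from_folder accepts a num_pipeline_ranks argument, so check that name against your installed version. The helper names is_torchrun and load_model are illustrative, and setting the CUDA device from the global rank assumes a single-node launch.

```python
import os
from pathlib import Path

# Environment variables that torchrun sets for every worker process.
TORCHRUN_VARS = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")


def is_torchrun(env=None):
    """Return True when the process appears to have been started by torchrun."""
    env = os.environ if env is None else env
    return all(var in env for var in TORCHRUN_VARS)


def load_model(model_folder_name):
    """Load the model, using one pipeline rank per torchrun process."""
    # Imported lazily so is_torchrun() works without the GPU stack installed.
    import torch
    from mistral_inference.transformer import Transformer

    if is_torchrun():
        # One process per GPU: join the process group and pin this
        # process to the GPU matching its rank (single-node assumption).
        torch.distributed.init_process_group()
        torch.cuda.set_device(torch.distributed.get_rank())
        num_pipeline_ranks = torch.distributed.get_world_size()
    else:
        # Plain `python script.py`: everything stays on one GPU.
        num_pipeline_ranks = 1

    # Assumption: from_folder takes num_pipeline_ranks, as main.py suggests.
    return Transformer.from_folder(
        Path(model_folder_name), num_pipeline_ranks=num_pipeline_ranks
    )
```

Launched with the torchrun command above, each process would then hold only part of the model. Started with plain python, the sketch falls back to a single rank and will hit the same out-of-memory error if the model does not fit on one GPU.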