triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Build via Docker is much bigger than the image in NGC #394

Open ZJU-lishuang opened 2 months ago

ZJU-lishuang commented 2 months ago

I built the TensorRT-LLM Backend via Docker.

But the resulting Docker image is much bigger than the one on NGC.

How can I decrease the size?

(screenshot attached)

This is the pip list output:

Package                  Version
------------------------ -------------------
absl-py                  2.1.0
accelerate               0.25.0
aiohttp                  3.9.3
aiosignal                1.3.1
async-timeout            4.0.3
attrs                    23.2.0
bandit                   1.7.7
blinker                  1.4
Brotli                   1.1.0
build                    1.2.1
certifi                  2024.2.2
cfgv                     3.4.0
charset-normalizer       3.3.2
click                    8.1.7
cloudpickle              3.0.0
colored                  2.2.4
coloredlogs              15.0.1
coverage                 7.4.4
cryptography             3.4.8
cuda-python              12.4.0
cutlass_library          3.4.1
datasets                 2.18.0
dbus-python              1.2.18
diffusers                0.15.0
dill                     0.3.8
distlib                  0.3.8
distro                   1.7.0
einops                   0.7.0
evaluate                 0.4.1
exceptiongroup           1.2.0
execnet                  2.1.0
filelock                 3.13.3
fire                     0.6.0
flatbuffers              24.3.25
frozenlist               1.4.1
fsspec                   2024.2.0
gevent                   24.2.1
geventhttpclient         2.0.2
graphviz                 0.20.3
greenlet                 3.0.3
grpcio                   1.62.1
h5py                     3.10.0
httplib2                 0.20.2
huggingface-hub          0.22.2
humanfriendly            10.0
identify                 2.5.35
idna                     3.6
importlib-metadata       4.6.4
iniconfig                2.0.0
janus                    1.0.0
jeepney                  0.7.1
Jinja2                   3.1.3
joblib                   1.3.2
keyring                  23.5.0
lark                     1.1.9
launchpadlib             1.10.16
lazr.restfulclient       0.14.4
lazr.uri                 1.0.6
markdown-it-py           3.0.0
MarkupSafe               2.1.5
mdurl                    0.1.2
more-itertools           8.10.0
mpi4py                   3.1.5
mpmath                   1.3.0
multidict                6.0.5
multiprocess             0.70.16
mypy                     1.9.0
mypy-extensions          1.0.0
networkx                 3.3
ninja                    1.11.1.1
nltk                     3.8.1
nodeenv                  1.8.0
numpy                    1.26.3
nvidia-ammo              0.7.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.19.3
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.1.105
oauthlib                 3.2.0
onnx                     1.16.0
onnx-graphsurgeon        0.3.27
onnxruntime              1.16.3
optimum                  1.18.0
packaging                24.0
pandas                   2.2.1
parameterized            0.9.0
pbr                      6.0.0
pillow                   10.3.0
pip                      23.3.2
platformdirs             4.2.0
pluggy                   1.4.0
polygraphy               0.49.0
pre-commit               3.7.0
protobuf                 4.25.3
psutil                   5.9.8
PuLP                     2.8.0
py                       1.11.0
pyarrow                  15.0.2
pyarrow-hotfix           0.6
pybind11                 2.12.0
pybind11-stubgen         2.5.1
Pygments                 2.17.2
PyGObject                3.42.1
PyJWT                    2.3.0
pynvml                   11.5.0
pyparsing                2.4.7
pyproject_hooks          1.0.0
pytest                   8.1.1
pytest-cov               5.0.0
pytest-forked            1.6.0
pytest-xdist             3.5.0
python-apt               2.4.0+ubuntu2
python-dateutil          2.9.0.post0
python-rapidjson         1.16
pytz                     2024.1
PyYAML                   6.0.1
regex                    2023.12.25
requests                 2.31.0
responses                0.18.0
rich                     13.7.1
rouge_score              0.1.2
safetensors              0.4.2
scipy                    1.13.0
SecretStorage            3.3.1
sentencepiece            0.2.0
setuptools               69.0.3
six                      1.16.0
stevedore                5.2.0
StrEnum                  0.4.15
sympy                    1.12
tabulate                 0.9.0
tensorrt                 9.3.0.post12.dev1
tensorrt_llm             0.9.0.dev2024040200
termcolor                2.4.0
tokenizers               0.15.2
tomli                    2.0.1
torch                    2.2.1
tqdm                     4.66.2
transformers             4.38.2
triton                   2.2.0
tritonclient             2.44.0
typing_extensions        4.8.0
tzdata                   2024.1
urllib3                  2.2.1
virtualenv               20.25.1
wadllib                  1.3.6
wheel                    0.42.0
xxhash                   3.4.1
yarl                     1.9.4
zipp                     1.0.0
zope.event               5.0
zope.interface           6.2
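
To see where the space actually goes, per-layer sizes can be inspected with standard Docker commands (a generic sketch; the image name and tag below are placeholders for whatever the build produced):

# List local images with their total sizes
docker image ls
# Show the size of each layer to find what dominates the image
docker history --human triton_trt_llm:latest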
NikolaBorisov commented 2 months ago

We have the same issue. A 54 GB Docker container is not great.

fedem96 commented 2 months ago

I'm also having the same issue.

kelkarn commented 1 month ago

I am also seeing this same issue. @byshiue and/or @schetlur-nv - any updates on why this happens?

schetlur-nv commented 1 month ago

Can you try the instructions in https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#option-1-build-via-the-buildpy-script-in-server-repo? We are trying to bring the two builds closer together, but right now they differ quite a bit. Using build.py should result in a smaller image because it pulls in fewer dependencies.
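
For reference, the linked instructions boil down to running the server repo's build.py with the TensorRT-LLM backend enabled. A simplified sketch (abridged; the base image name and repo tags are placeholders, and the full flag list is in the README):

# Placeholder tags - substitute the versions matching your release
BASE_CONTAINER_IMAGE_NAME=nvcr.io/nvidia/tritonserver:24.04-py3-min
TENSORRTLLM_BACKEND_REPO_TAG=rel
PYTHON_BACKEND_REPO_TAG=r24.04

git clone https://github.com/triton-inference-server/server.git
cd server
./build.py -v --no-container-interactive --enable-gpu \
           --endpoint=http --endpoint=grpc \
           --image=base,${BASE_CONTAINER_IMAGE_NAME} \
           --backend=tensorrtllm:${TENSORRTLLM_BACKEND_REPO_TAG} \
           --backend=python:${PYTHON_BACKEND_REPO_TAG}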

kelkarn commented 1 month ago

@schetlur-nv - when building with build.py, how do I specify that I want to use the main branch of the TRT-LLM repo? All it lets me do is set a flag, TENSORRTLLM_BACKEND_REPO_TAG=rel; does rel mean main here?
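
If the tag is simply passed through to git when the backend repo is cloned (my assumption, not verified against build.py), then pointing the build at main would look like:

# Assumption: the tag is used as a git ref, so a branch name should work here
TENSORRTLLM_BACKEND_REPO_TAG=main
./build.py -v --enable-gpu --backend=tensorrtllm:${TENSORRTLLM_BACKEND_REPO_TAG}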