All comments have been addressed, and the commit history has been cleaned up.
```
2024-09-27 13:32:48,339 INFO: Started server process [1]
2024-09-27 13:32:48,339 INFO: Waiting for application startup.
2024-09-27 13:32:55,774 - app.pipelines.llm - INFO - Local model path: /models/models--meta-llama--Meta-Llama-3.1-8B-Instruct
2024-09-27 13:32:55,774 - app.pipelines.llm - INFO - Directory contents: ['snapshots', 'refs', 'blobs']
2024-09-27 13:32:55,774 - app.pipelines.llm - INFO - Using fp16/bf16 precision
2024-09-27 13:32:55,798 - app.pipelines.llm - INFO - Max memory configuration: {0: '23GiB', 1: '23GiB', 'cpu': '26GiB'}
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 6.13it/s]
2024-09-27 13:33:04,805 - app.pipelines.llm - INFO - Model loaded and distributed. Device map: {'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 0, 'model.layers.12': 0, 'model.layers.13': 0, 'model.layers.14': 1, 'model.layers.15': 1, 'model.layers.16': 1, 'model.layers.17': 1, 'model.layers.18': 1, 'model.layers.19': 1, 'model.layers.20': 1, 'model.layers.21': 1, 'model.layers.22': 1, 'model.layers.23': 1, 'model.layers.24': 1, 'model.layers.25': 1, 'model.layers.26': 1, 'model.layers.27': 1, 'model.layers.28': 1, 'model.layers.29': 1, 'model.layers.30': 1, 'model.layers.31': 1, 'model.norm': 1, 'model.rotary_emb': 1, 'lm_head': 1}
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/pydantic/_internal/_fields.py:160: UserWarning: Field "model_id" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
2024-09-27 13:33:04,869 - app.main - INFO - Started up with pipeline LLMPipeline(model_id=meta-llama/Meta-Llama-3.1-8B-Instruct)
2024-09-27 13:33:04,869 INFO: Application startup complete.
2024-09-27 13:33:04,870 INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
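For context, the "Max memory configuration" and "Device map" lines above come from sharding the model across the two GPUs. A minimal sketch of how such a load is typically done with Hugging Face Transformers/Accelerate (the model ID and memory limits are taken from the logs; the pipeline's actual code may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Cap per-device usage so the checkpoint shards get split across both
# GPUs, with spillover to CPU RAM (values taken from the logs above).
max_memory = {0: "23GiB", 1: "23GiB", "cpu": "26GiB"}

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # the "fp16/bf16 precision" in the logs
    device_map="auto",           # let Accelerate place layers per max_memory
    max_memory=max_memory,
)

# The resulting placement is what the "Device map" log line prints.
print(model.hf_device_map)
```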
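The Pydantic `UserWarning` in the logs is a naming clash, not a bug: any field starting with `model_` collides with Pydantic v2's protected `model_` namespace. The fix is the one the warning itself suggests; a sketch (`LLMParams` is a hypothetical name, the repo's schema class may differ):

```python
from pydantic import BaseModel, ConfigDict

class LLMParams(BaseModel):
    # Clearing the protected namespaces lets fields like "model_id"
    # coexist with Pydantic's built-in "model_" methods, silencing
    # the UserWarning seen at startup.
    model_config = ConfigDict(protected_namespaces=())

    model_id: str = "meta-llama/Meta-Llama-3.1-8B-Instruct"
```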
@rickstaa I have reviewed this and confirmed it works. The code needed to be rebased with the new codegen updates from recent SDK releases. @kyriediculous can update this PR, or we can move to the other PR.
Some brief research showed that there are other implementations for serving LLM pipelines, which was also briefly discussed with @kyriediculous. We settled on researching and testing alternative implementations if the need arises from user feedback. The LLM SPE will continue to support and enhance this pipeline to suit the network's requirements as the network evolves.
Notes from review/testing:
There were only a couple of small changes I made in addition to those needed to rebase this PR:
- Moved `check_torch_cuda.py` to the dev folder, since it only provides a helper to check the CUDA version.
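For reference, such a helper typically just queries PyTorch's CUDA metadata. Its exact contents aren't shown in this PR; a minimal sketch might be:

```python
# dev/check_torch_cuda.py -- minimal sketch; the actual helper may differ.
import torch

print(f"PyTorch version:  {torch.__version__}")
print(f"Built with CUDA:  {torch.version.cuda}")
print(f"CUDA available:   {torch.cuda.is_available()}")
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```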