triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

gpt_model_path with Triton's S3 based model repository support #181

Open sacdroid opened 11 months ago

sacdroid commented 11 months ago

I am trying to externalize model artifacts to S3 using Triton's cloud storage support for model repositories. I was able to get this working for the pre/postprocessing tokenizer model instances by using

parameters {
  key: "tokenizer_dir"
  value: {
    string_value: "$$TRITON_MODEL_DIRECTORY/1"
  }
}

and then replacing the placeholder in model.py.
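For reference, a minimal sketch of how that substitution can be done inside model.py, assuming the standard Triton Python backend initialize() interface. The manual string replacement mirrors what is described above and is only illustrative, not the backend's built-in behavior:

import json


class TritonPythonModel:
    def initialize(self, args):
        model_config = json.loads(args["model_config"])

        # Value written in config.pbtxt, e.g. "$$TRITON_MODEL_DIRECTORY/1".
        tokenizer_dir = model_config["parameters"]["tokenizer_dir"]["string_value"]

        # args["model_repository"] points at the local copy of this model's
        # directory; when the repository lives in S3, this is the directory
        # Triton downloaded the artifacts into.
        local_model_dir = args["model_repository"]

        # Manually expand the placeholder (illustrative only).
        self.tokenizer_dir = tokenizer_dir.replace(
            "$$TRITON_MODEL_DIRECTORY", local_model_dir)

    def execute(self, requests):
        # Tokenization logic elided; see the preprocessing model.py in this
        # repository for the actual implementation.
        ...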

How can I achieve the same for gpt_model_path? I looked at the code, and it does not seem to support a dynamic path today. Do you have any alternatives that do not require me to include the model artifacts in the Docker container or use an external mount?
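In other words, the requested behavior would be something like the following in the tensorrt_llm model's config.pbtxt, with the placeholder resolving to wherever Triton caches the S3 copy of the model (a sketch of the desired configuration only; the backend does not currently expand it):

parameters {
  key: "gpt_model_path"
  value: {
    string_value: "$$TRITON_MODEL_DIRECTORY/1"
  }
}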

krishung5 commented 10 months ago

I think TRITON_MODEL_DIRECTORY is only supported in the Triton Python backend at the moment. It is up to each backend to dynamically set the path when using S3. CC @Tabrizian for any possible correction.

shixianc commented 9 months ago

Do we have any updates on this feature? External repository loading is quite critical for larger models, since Docker layer size is limited to 52 GiB (at least on AWS), so we cannot build the model into the image. Requesting that gpt_model_path be resolved dynamically based on where the repository is cached.

rahchuenmonroe commented 3 weeks ago

Any updates on this?