triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

T5: Triton Model Repository (containing model weights and configuration) on S3 doesn't work as expected #66

Open dhaval24 opened 1 year ago

dhaval24 commented 1 year ago

Description

It appears that Triton Server with the FasterTransformer backend does not work as expected when loading a model repository (containing both configuration and model weights) from S3.

Release: v1.2
GPU: V100
Command used to invoke Triton Server: 
`CUDA_VISIBLE_DEVICES=0 /opt/tritonserver/bin/tritonserver --model-repository s3://*/users/dhaval-doshi/t5-3b/triton-model-store/t5/ --log-info true`

The invocation fails with the below error:

```
[ERROR] Can't load '/tmp/folderXaegJB/1/t5/config.ini'
terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR]  Assertion fail: /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/triton_backend/t5/T5TritonModel.cc:91
```

The model repository structure on S3 is as follows:

```
s3://*/triton-model-store/t5/fastertransformer/config.pbtxt
s3://*/triton-model-store/t5/fastertransformer/1/<weight files and config.ini>
```

The above structure is in line with how the model repository is created in this repo for T5, under the path all_models/t5/fastertransformer/…
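
For comparison, the working local layout from this repo looks roughly like this (a sketch; the exact weight file names depend on the conversion script):

```shell
$ tree all_models/t5
all_models/t5
└── fastertransformer
    ├── config.pbtxt
    └── 1
        ├── config.ini
        └── <weight files produced by the 1-gpu conversion>
```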

Details:

It looks like when you start Triton Server with an S3 path as the model repository, it downloads the contents into the Docker container at startup, into a temp folder:

/tmp/folderXaegJB/

```
$ ls /tmp/folderXaegJB/
1  config.pbtxt
```

These are the contents of the S3 model repository directory s3://*/triton-model-store/t5/fastertransformer.

However, when Triton tries to construct model_checkpoint_path to pass to FT for loading T5, using the line of code below:

https://github.com/triton-inference-server/fastertransformer_backend/blob/225b57898b830a13b5634ee10b812c96bad802b0/src/libfastertransformer.cc#L265

it constructs the path below, which of course doesn't exist:

`/tmp/folderXaegJB/1/t5/config.ini`

Hence there is an inconsistency between how the model repository is expected to be structured and how it is downloaded and resolved from S3.
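
Concretely, with the paths from the run above (contents of `1/` taken from the S3 layout):

```shell
# What FT tries to open (does not exist):
ls /tmp/folderXaegJB/1/t5/config.ini   # No such file or directory

# What the S3 download actually produced:
ls /tmp/folderXaegJB/1/                # config.ini plus the weight files
```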

I cannot explicitly pass model_checkpoint_path because Triton downloads everything from S3 into a temp folder, and I don't know beforehand which temp folder it will be.
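
For illustration, pinning the path in config.pbtxt would look something like the excerpt below (assuming the model_checkpoint_path parameter used by the sample configs in this repo), but the /tmp/folderXXXXXX prefix is only generated at startup:

```
parameters {
  key: "model_checkpoint_path"
  value: {
    string_value: "/tmp/folderXaegJB/1/"   # not knowable ahead of time
  }
}
```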

Note: it also appears that the FasterTransformer backend's model repository structure differs from the model repository guidance provided here: https://github.com/triton-inference-server/server/blob/e9ef15b0fc06d45ceca28861c98b31d0e7f9ee79/docs/user_guide/model_repository.md

The FasterTransformer backend and the ensemble models also expect you to put files under fastertransformer/1/<weights> and fastertransformer/config.pbtxt.

Please help investigate this issue.


### Reproduced Steps

```shell
1. Upload the model weights and model repository to an S3 bucket. You can copy the model repository from this repo and upload the 1-gpu weights (after running the conversion script) inside the 1/ folder.
2. Run Triton with the command shown in the description above.
```
byshiue commented 1 year ago

The FT backend only supports a local directory now. It cannot load the S3 folder directly.

dhaval24 commented 1 year ago

I see, is there a plan to support S3 folders directly? I was under the impression that this is already supported.

byshiue commented 1 year ago

We will consider it. Thank you for the suggestion.

dhaval24 commented 1 year ago

Thank you. So for now the suggestion is to download the assets from S3 to the local container via a shell script (something like the sketch below)? NVIDIA solutions architects told me this was supported, hence my impression.
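
Something along these lines is what I have in mind (a sketch; the bucket path and local directory are placeholders, and it assumes the AWS CLI is available inside the container):

```shell
#!/bin/bash
# Workaround sketch: copy the model repository from S3 to a fixed local path,
# then point Triton at that local directory so FT resolves a predictable
# model_checkpoint_path.
set -euo pipefail

S3_REPO="s3://<bucket>/users/dhaval-doshi/t5-3b/triton-model-store/t5"   # placeholder
LOCAL_REPO="/workspace/triton-model-store/t5"                            # placeholder

mkdir -p "${LOCAL_REPO}"
aws s3 sync "${S3_REPO}" "${LOCAL_REPO}"

CUDA_VISIBLE_DEVICES=0 /opt/tritonserver/bin/tritonserver \
    --model-repository "${LOCAL_REPO}" \
    --log-info true
```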

byshiue commented 1 year ago

We found that we don't need to modify anything to support loading the model from S3. You can refer to the document https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/t5_guide.md#loading-model-by-s3.
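
Roughly, the invocation there follows Triton's standard S3 model-repository support, with credentials taken from the usual AWS environment variables (bucket and region below are placeholders):

```shell
# Credentials/region are read from the standard AWS environment variables
# (or ~/.aws/credentials); the bucket and region here are placeholders.
export AWS_ACCESS_KEY_ID=<key id>
export AWS_SECRET_ACCESS_KEY=<secret>
export AWS_DEFAULT_REGION=<region>

CUDA_VISIBLE_DEVICES=0 /opt/tritonserver/bin/tritonserver \
    --model-repository s3://<bucket>/triton-model-store/t5 \
    --log-info true
```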