dhaval24 opened this issue 2 years ago
The FT backend only supports local directories right now. It cannot load an S3 folder directly.
I see, is there a plan to support S3 folders directly? I was under the impression that this is already supported.
We will consider it. Thank you for the suggestion.
Thank you. So for now, the suggestion is to download the assets from S3 to the local container via a shell script? NVIDIA solutions architects told me this was supported, hence my impression.
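For reference, a minimal sketch of that workaround, assuming the `aws` CLI is available in the container (bucket name and paths are placeholders):

```bash
# Copy the model repository from S3 into the container, then start
# Triton against the local copy.
aws s3 cp s3://<bucket>/triton-model-store/t5 /workspace/model-store/t5 --recursive
tritonserver --model-repository=/workspace/model-store/t5
```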
We found that we don't need to modify anything to support loading models from S3. You can refer to the document: https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/t5_guide.md#loading-model-by-s3.
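The approach in that guide boils down to pointing Triton directly at the S3 prefix; a minimal sketch (bucket, prefix, and credentials are placeholders):

```bash
# Credentials are read from the standard AWS environment variables.
export AWS_ACCESS_KEY_ID=<access_key>
export AWS_SECRET_ACCESS_KEY=<secret_key>
export AWS_DEFAULT_REGION=<region>

# Point --model-repository directly at the S3 prefix; Triton downloads
# the repository contents into a local temp directory at startup.
tritonserver --model-repository=s3://<bucket>/triton-model-store/t5
```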
Description
The model repository structure on S3 is as follows:
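A sketch of that layout, inferred from the paths referenced later in this issue (bucket name and exact weight file names are placeholders):

```
s3://<bucket>/triton-model-store/
└── t5/
    └── fastertransformer/
        ├── config.pbtxt
        └── 1/
            └── t5/
                ├── config.ini
                └── <converted weight files>
```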
The above structure is in line with how the model repository is created in this repo for T5, under this path:
all_models/t5/fastertransformer/…
Details:
It looks like when you start Triton server with an S3 path for the model repository, it downloads the contents into the Docker container at startup, into a temp folder:
/tmp/folderXaegJB/
This is basically the contents of the S3 model repository directory s3://*/triton-model-store/t5/fastertransformer.
However, when Triton tries to construct `model_checkpoint_path` to pass to FT for loading T5, it uses the following line of code: https://github.com/triton-inference-server/fastertransformer_backend/blob/225b57898b830a13b5634ee10b812c96bad802b0/src/libfastertransformer.cc#L265
It ends up constructing the path below, which of course doesn't exist:
/tmp/folderXaegJB/1/t5/config.ini
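For illustration, a hedged sketch of the path construction that line effectively performs (not the backend's verbatim code; function and variable names here are made up):

```cpp
#include <string>

// Illustrative only: joins the (possibly temp) repository path, the model
// version, and the model name from config, yielding e.g.
// "/tmp/folderXaegJB/1/t5/config.ini".
std::string CheckpointConfigPath(
    const std::string& repo_path,    // "/tmp/folderXaegJB" after the S3 download
    const std::string& version,      // "1"
    const std::string& model_name)   // "t5"
{
  return repo_path + "/" + version + "/" + model_name + "/config.ini";
}
```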
Hence there is an inconsistency between how the model repository is expected to be structured and how it is downloaded and resolved from S3.
I cannot explicitly pass `model_checkpoint_path`, because Triton downloads everything from S3 into a temp folder, and I don't know beforehand which temp folder it will be.
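For context, a sketch of how that parameter would normally be set in `config.pbtxt` (the syntax here is a rough sketch; the path value is exactly what cannot be known ahead of time):

```
parameters {
  key: "model_checkpoint_path"
  value: {
    # The local path is a randomly named temp directory when the
    # repository is pulled from S3, so it cannot be hard-coded here.
    string_value: "/tmp/folderXaegJB/1/t5"
  }
}
```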
Note: It also appears that the FasterTransformer backend's model repository structure differs from the model repository guidance provided here: https://github.com/triton-inference-server/server/blob/e9ef15b0fc06d45ceca28861c98b31d0e7f9ee79/docs/user_guide/model_repository.md
The FasterTransformer backend and ensemble models also expect you to put files under `fastertransformer/weights` and `fastertransformer/config.pbtxt`.
Please help investigate this issue.