philschmid / amazon-sagemaker-flan-t5-xxl

Example how to deploy FLAN-T5-XXL on Amazon SageMaker
MIT License

Curious- direct deploy XXL via model id with custom code? #2

Closed by ctandrewtran 1 year ago

ctandrewtran commented 1 year ago

Hello Phil-

Big fan of your work. Been following this tutorial, and have a quick question.

I've noticed there are two ways to deploy, depending on the use case:

  1. Pass in the task + model ID and deploy to SageMaker (for XXL, this requires the LMI container). This does not allow for custom inference code. It's a quick process since you don't need to tar anything or have the model on hand (a rough sketch of this option follows the list).
  2. Tar the model + custom inference code, upload it to S3, then deploy by pointing the endpoint toward this artifact. This allows for custom inference code. It's a slower process since you need the model on hand and also have to tar it.
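
For context, option 1 typically looks roughly like this (just a sketch; the instance type and framework versions below are placeholders I picked, not something from this repo):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# model id + task are passed as environment variables, no model artifact needed
hub = {
    "HF_MODEL_ID": "google/flan-t5-xxl",
    "HF_TASK": "text2text-generation",
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",  # placeholder versions
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # placeholder instance type
)
```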

Is it possible to deploy to SageMaker by passing the task + model ID and still provide custom inference code?

philschmid commented 1 year ago

> Is it possible to deploy to SageMaker by passing the task + model ID and still provide custom inference code?

Indirectly. The TASK and MODEL_ID "parameters" are passed in as environment variables, so when you use an inference.py you can access them through os.environ.get(); the model is then loaded from the Hugging Face Hub and not from S3.
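
In an inference.py that would look something like this (a rough sketch; HF_MODEL_ID and HF_TASK are the environment variables set on the container, and the fallback values here are only illustrative):

```python
# inference.py (sketch)
import os
from transformers import pipeline

def model_fn(model_dir):
    # model_dir is ignored here; the model is pulled from the Hugging Face Hub
    model_id = os.environ.get("HF_MODEL_ID", "google/flan-t5-xxl")
    task = os.environ.get("HF_TASK", "text2text-generation")
    # device_map="auto" shards the model across available GPUs (needs accelerate)
    return pipeline(task, model=model_id, device_map="auto")

def predict_fn(data, pipe):
    return pipe(data["inputs"], **data.get("parameters", {}))
```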

ctandrewtran commented 1 year ago

> Is it possible to deploy to SageMaker by passing the task + model ID and still provide custom inference code?
>
> Indirectly. The TASK and MODEL_ID "parameters" are passed in as environment variables, so when you use an inference.py you can access them through os.environ.get(); the model is then loaded from the Hugging Face Hub and not from S3.

Thank you for the response!

To double-check my understanding, I would then do the following:
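
Roughly something like this inside inference.py (just a sketch of what I mean):

```python
import os
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# read the model id from the environment instead of hard-coding it
model_id = os.environ.get("HF_MODEL_ID")  # e.g. "google/flan-t5-xxl"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
```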

Is there any difference between the above and simply hard-coding AutoTokenizer.from_pretrained("google/flan-t5-xxl")?

philschmid commented 1 year ago

Your assumption is correct.

ctandrewtran commented 1 year ago

Thank you!