philschmid / amazon-sagemaker-flan-t5-xxl

Example how to deploy FLAN-T5-XXL on Amazon SageMaker
MIT License

Curious- direct deploy XXL via model id with custom code? #2

Closed by ctandrewtran 1 year ago

ctandrewtran commented 1 year ago

Hello Phil-

Big fan of your work. Been following this tutorial, and have a quick question.

I've noticed there are two ways to deploy, depending on the use case:

  1. Pass in the task + model ID and deploy to SageMaker (for XXL, this requires the LMI container). This does not allow for custom inference code. It's a quick process since you don't need to tar anything or have the model on hand (a rough sketch of this option follows the list).
  2. Tar the model + custom inference code, upload it to S3, then deploy by pointing the endpoint toward this artifact. This allows for custom inference code. It's a slower process since you need the model on hand and also have to tar it.
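
For context, option 1 typically looks roughly like this (just a sketch; the instance type and framework versions below are placeholders I picked, not something from this repo):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# model id + task are passed as environment variables, no model artifact needed
hub = {
    "HF_MODEL_ID": "google/flan-t5-xxl",
    "HF_TASK": "text2text-generation",
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",  # placeholder versions
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # placeholder instance type
)
```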

Is it possible to deploy to SageMaker by passing the task + model ID and still provide custom inference code?

philschmid commented 1 year ago

> Is it possible to deploy to SageMaker by passing the task + model ID and still provide custom inference code?

Indirectly. The TASK and MODEL_ID "parameters" are passed in as environment variables, so when you use an inference.py you can access them through os.environ.get(); the model is then loaded from the Hugging Face Hub and not from S3.
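
In an inference.py that would look something like this (a rough sketch; HF_MODEL_ID and HF_TASK are the environment variables set on the container, and the fallback values here are only illustrative):

```python
# inference.py (sketch)
import os
from transformers import pipeline

def model_fn(model_dir):
    # model_dir is ignored here; the model is pulled from the Hugging Face Hub
    model_id = os.environ.get("HF_MODEL_ID", "google/flan-t5-xxl")
    task = os.environ.get("HF_TASK", "text2text-generation")
    # device_map="auto" shards the model across available GPUs (needs accelerate)
    return pipeline(task, model=model_id, device_map="auto")

def predict_fn(data, pipe):
    return pipe(data["inputs"], **data.get("parameters", {}))
```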

ctandrewtran commented 1 year ago

> Is it possible to deploy to SageMaker by passing the task + model ID and still provide custom inference code?
>
> Indirectly. The TASK and MODEL_ID "parameters" are passed in as environment variables, so when you use an inference.py you can access them through os.environ.get(); the model is then loaded from the Hugging Face Hub and not from S3.

Thank you for the response!

To double-check my understanding, I would then do the following:
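
Roughly something like this inside inference.py (just a sketch of what I mean):

```python
import os
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# read the model id from the environment instead of hard-coding it
model_id = os.environ.get("HF_MODEL_ID")  # e.g. "google/flan-t5-xxl"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
```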

Is there any difference between the above and simply hard-coding AutoTokenizer.from_pretrained("google/flan-t5-xxl")?

philschmid commented 1 year ago

Your assumption is correct.

ctandrewtran commented 1 year ago

Thank you!