
huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.

Apache License 2.0


Missing parameters: optimum-cli vs Python Wrapper #326

Status: Open. samir-souza opened this issue 8 months ago.

samir-souza commented 8 months ago

When compiling models, `optimum-cli` supports many input parameters that the Python wrappers do not. For instance:

With `optimum-cli`, you can pass parameters such as `--disable-validation`:

```bash
optimum-cli export neuron --model bert-base-uncased --sequence_length 64 --batch_size 1 --disable-validation out/
```

However, the Python wrapper does not support this parameter (or several others):

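```python
from optimum.neuron import NeuronModelForSequenceClassification

# Static input shapes required for Neuron compilation
input_shapes = {"batch_size": 1, "sequence_length": 64}

# disable_validation is accepted here but has no effect: it is silently ignored
model = NeuronModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", export=True, disable_validation=True, **input_shapes
)
```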

`disable_validation` is silently ignored, and the model is loaded anyway.

Could you fix this and also double-check whether all parameters are supported by the wrapper, please?

michaelbenayoun commented 8 months ago

@JingyaHuang any idea why?

JingyaHuang commented 8 months ago

@michaelbenayoun I added `disable_validation` to the CLI because some users didn't have access to inf2.8xlarge (they only had inf2.xlarge and ran out of memory) and wanted to compile on a CPU-only instance. The modeling APIs, however, are designed to run inference (although they also allow export via the class), so I assumed that anyone using them has access to an inf2 instance. That's why it was not added to the class.

I believe this should be the only argument supported by the CLI but not by the modeling API (apart from `atol`, since we don't run validation in the modeling API). Is there a particular use case that needs these arguments in the modeling API? @samir-souza

samir-souza commented 8 months ago

@JingyaHuang customers using Inferentia1 split deployment into two steps: (1) compilation and (2) execution. They compile their models on CPU (C5 instances), where there is no need to validate the result (if they try to validate, it fails and breaks the solution). They do this in a SageMaker job before deploying the model to an inf1 instance. That's why it is important to have `disable_validation`, and eventually other features, in the API. For now, because of this limitation, they launch an `optimum-cli` process from Python, but this is not ideal.
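For reference, the subprocess workaround described above might look roughly like the sketch below. It assumes `optimum-cli` is on the `PATH` of the CPU instance and accepts the same flags as the command earlier in this issue; the helper names `build_export_command` and `export_on_cpu` are hypothetical and not part of optimum-neuron.

```python
import shlex
import subprocess
from typing import List


def build_export_command(
    model: str,
    output_dir: str,
    batch_size: int = 1,
    sequence_length: int = 64,
    disable_validation: bool = True,
) -> List[str]:
    """Hypothetical helper: build the optimum-cli argument list from the issue's example."""
    cmd = [
        "optimum-cli", "export", "neuron",
        "--model", model,
        "--sequence_length", str(sequence_length),
        "--batch_size", str(batch_size),
    ]
    if disable_validation:
        # Skip post-export validation, which fails on CPU-only instances
        cmd.append("--disable-validation")
    # The output directory is passed as the final positional argument
    cmd.append(output_dir)
    return cmd


def export_on_cpu(model: str, output_dir: str, **kwargs) -> None:
    """Hypothetical helper: run the export as a child process."""
    cmd = build_export_command(model, output_dir, **kwargs)
    print("Running:", shlex.join(cmd))
    # check=True raises CalledProcessError if the export exits with an error
    subprocess.run(cmd, check=True)
```

In a SageMaker job this would be called as `export_on_cpu("bert-base-uncased", "out/")`. Passing an argument list rather than a shell string avoids quoting issues, but the export still runs outside the Python process, which is the limitation described above.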

JingyaHuang commented 8 months ago

I see, thanks for the explanation, @samir-souza. So far, the modeling API assumes that a compiled model is loaded as soon as compilation completes: functions like `save_pretrained` won't work unless `_from_pretrained` and `__init__` have been called. I will look into how to support `disable_validation` in the modeling class; some refactoring might be needed (I am currently focused on supporting other tasks, so I need to find bandwidth for this).

Also, @philschmid, you are more familiar with the SageMaker workflow: do you use the modeling API for export on non-Inferentia instances?