Open MaxS3552284 opened 3 weeks ago
Okay, I figured it out.
First, I tested every model and fine-tuning parameter that had 4096 as its value, which were quite a few, since everything is a multiple of 512. Changing those didn't do anything, so that was a bust. But it told me the limit most likely lives in the deployment container rather than the model itself, so I at least had a hint. After some lengthy Googling, that hint turned into a jackpot :)
So, for anyone with similar problems, here is how you do it: instead of using the deployment code listed on the Hugging Face page of the Mistral-7B-Instruct model, I used the code from this notebook: https://github.com/aws-samples/Mistral-7B-Instruct-fine-tune-and-deploy-on-SageMaker/blob/main/Deploy_Mistral_7B_on_Amazon_SageMaker_with_vLLM.ipynb
Basically: deploy with the vLLM-based container from that notebook instead of the stock Hugging Face deployment snippet.
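The core of that approach is a `serving.properties` file for the DJL LMI (vLLM) container. Here's a minimal sketch; the exact keys and values below are assumptions based on the DJL LMI container conventions, not copied verbatim from the linked notebook, so check the notebook for the authoritative version:

```python
# Sketch of a serving.properties for the DJL LMI (vLLM) container.
# NOTE: values here are assumptions (single GPU, fp16, 32k context);
# tune them to your instance and model.
serving_properties = "\n".join([
    "engine=Python",
    "option.model_id=mistralai/Mistral-7B-Instruct-v0.3",  # or your S3 model path
    "option.rolling_batch=vllm",        # serve with vLLM continuous batching
    "option.max_model_len=32768",       # raise the context window past 4096
    "option.tensor_parallel_degree=1",
    "option.dtype=fp16",
])

with open("serving.properties", "w") as f:
    f.write(serving_properties)
```

The file is then packaged and pointed at by the SageMaker model, as the notebook shows.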
Alternatively, I also found a link (https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/sagemaker-tgi-custom/example_usage.ipynb) describing how to modify the Hugging Face container environment, which probably also does the trick, but I haven't gotten that container to run yet. Still, one working solution is enough for me, so... meh~ ¯\\\_(ツ)\_/¯
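For that second (TGI) route, the idea is to override the container's token limits via environment variables when creating the model. A minimal sketch, assuming the standard TGI settings (`MAX_INPUT_LENGTH`, `MAX_TOTAL_TOKENS`, `MAX_BATCH_PREFILL_TOKENS`); the concrete values and the instance type are illustrative assumptions, not taken from the notebook:

```python
# Environment overrides for the Hugging Face TGI container.
# The variable names are standard TGI settings; the values below are
# assumptions for an 8k context on a single GPU -- tune them to your setup.
tgi_env = {
    "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.3",
    "MAX_INPUT_LENGTH": "8191",          # max prompt tokens
    "MAX_TOTAL_TOKENS": "8192",          # prompt tokens + generated tokens
    "MAX_BATCH_PREFILL_TOKENS": "8192",
}

# This dict would then be passed to sagemaker.huggingface.HuggingFaceModel, e.g.:
# model = HuggingFaceModel(env=tgi_env, role=role, image_uri=tgi_image_uri)
# predictor = model.deploy(initial_instance_count=1,
#                          instance_type="ml.g5.2xlarge")

# Sanity check: TGI requires MAX_INPUT_LENGTH < MAX_TOTAL_TOKENS.
assert int(tgi_env["MAX_INPUT_LENGTH"]) < int(tgi_env["MAX_TOTAL_TOKENS"])
```

The actual `HuggingFaceModel` call is commented out since it needs AWS credentials and a role to run.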
Python -VV
Pip Freeze
Reproduction Steps
Recently I fine-tuned a Mistral 7B Instruct v0.3 model and deployed it on an AWS SageMaker endpoint, but I got errors like this during inference in the SageMaker Studio notebook:
`Received client error (422) from primary with message "{"error":"Input validation error: inputs tokens + max_new_tokens must be <= 4096. Given: 877 inputs tokens and 4096 max_new_tokens","error_type":"validation"}"`
This means I am limited to 4096 tokens, but the maximum context lengths should be:
- Mistral 7B Instruct v0.1: 8192 tokens
- Mistral 7B Instruct v0.2 / v0.3: 32k tokens
The input parameters were: `"parameters": {"max_new_tokens": 4096, "do_sample": True}`
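Note that the validation rule in the error is `prompt tokens + max_new_tokens <= 4096`, so this request (877 + 4096 = 4973) can never pass regardless of the prompt. A client-side workaround (hypothetical helper, not from the original post) is to size `max_new_tokens` from the remaining budget:

```python
def allowed_max_new_tokens(prompt_tokens: int, max_total_tokens: int = 4096) -> int:
    """Largest max_new_tokens satisfying: prompt_tokens + max_new_tokens <= max_total_tokens."""
    return max(0, max_total_tokens - prompt_tokens)

# With the 877-token prompt from the error message:
parameters = {
    "max_new_tokens": allowed_max_new_tokens(877),  # 4096 - 877 = 3219
    "do_sample": True,
}
```

This only sidesteps the validation error, of course; raising the container's actual limit requires the deployment-side fixes above.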
I also hosted the base models from Hugging Face on SageMaker endpoints, and they all seem to be limited to 4096 tokens.
Does anyone know how to fix this?
Expected Behavior
During inference the token limit should be far higher than 4k. Below 4k, inference works as intended.
Additional Context
I got the code for deployment on AWS Sagemaker from here: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
Suggested Solutions
No response