Attempting to test with llama3, we see errors from the BAM service indicating that 'max_new_tokens' must be <= 2048, yet the model documentation states a token limit of 8196 and we are specifying '4096' as our default for 'max_new_tokens'.
llama-3-70b-instruct
model_id: meta-llama/llama-3-70b-instruct
token limit: 8196
types: Instruct
languages: English
Meta Llama 3 is a family of auto-regressive language models, pretrained on over 15 trillion tokens of data from publicly available sources. These models use an optimized transformer architecture. Helpfulness and safety were prioritized in developing these models.
Failed to handle request to https://bam-api.res.ibm.com/v2/text/chat_stream?version=2024-01-10.
{
  "error": "Bad Request",
  "extensions": {
    "code": "INVALID_INPUT",
    "state": {
      "errors": [
        {
          "message": "property 'max_new_tokens' must be <= 2048",
          "instancePath": "body.parameters.max_new_tokens",
          "params": {
            "errorType": "maximum"
          }
        }
      ]
    }
  },
  "message": "property 'max_new_tokens' must be <= 2048.",
  "status_code": 400
}
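For reference, a minimal sketch reproducing the failing request. Only the endpoint URL and the 'parameters.max_new_tokens' field (from the error's instancePath) come from the log above; the remaining body fields, the auth header, and the BAM_API_KEY variable are assumptions:

```python
import os
import requests

# Sketch of the failing call; body fields other than
# 'parameters.max_new_tokens' are assumed, not confirmed by the log.
url = "https://bam-api.res.ibm.com/v2/text/chat_stream?version=2024-01-10"
headers = {"Authorization": f"Bearer {os.environ['BAM_API_KEY']}"}
body = {
    "model_id": "meta-llama/llama-3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "parameters": {
        "max_new_tokens": 4096,  # rejected: BAM enforces <= 2048 here
    },
}

resp = requests.post(url, headers=headers, json=body, stream=True)
print(resp.status_code)  # 400, with the INVALID_INPUT payload shown above
```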
Model documentation: https://bam.res.ibm.com/docs/models#meta-llama-llama-3-70b-instruct
Workaround: cap 'max_new_tokens' in kai/config.toml (sketched below).
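A minimal sketch of that workaround, assuming kai's config uses a [models]/[models.args] layout; the section and key names here are illustrative, and only the 2048 cap comes from the error above:

```toml
# kai/config.toml -- sketch; section and key names are assumptions
[models]
provider = "ChatIBMGenAI"

[models.args]
model_id = "meta-llama/llama-3-70b-instruct"
# BAM rejects anything above 2048 on this endpoint, so cap the default
parameters.max_new_tokens = 2048
```

This trades away the documented 8196-token limit of the model itself, but keeps requests within what the BAM chat_stream endpoint currently accepts.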