Attempting to test with llama3, we see errors from the BAM service indicating that 'max_new_tokens' must be <= 2048, yet the model documentation states a token limit of 8196 and we are specifying '4096' as our default for 'max_new_tokens'.
llama-3-70b-instruct
model_id: meta-llama/llama-3-70b-instruct
token limit: 8196
types: Instruct
languages: English
Meta Llama 3 is a family of auto-regressive language models, pretrained on over 15 trillion tokens of data from publicly available sources. These models use an optimized transformer architecture. Helpfulness and safety were prioritized in developing these models.
Failed to handle request to https://bam-api.res.ibm.com/v2/text/chat_stream?version=2024-01-10.
{
  "error": "Bad Request",
  "extensions": {
    "code": "INVALID_INPUT",
    "state": {
      "errors": [
        {
          "message": "property 'max_new_tokens' must be <= 2048",
          "instancePath": "body.parameters.max_new_tokens",
          "params": {
            "errorType": "maximum"
          }
        }
      ]
    }
  },
  "message": "property 'max_new_tokens' must be <= 2048.",
  "status_code": 400
}
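For reference, a minimal sketch reproducing the failing request. Only the endpoint URL and the 'parameters.max_new_tokens' field (from the error's instancePath) come from the log above; the remaining body fields, the auth header, and the BAM_API_KEY variable are assumptions:

```python
import os
import requests

# Sketch of the failing call; body fields other than
# 'parameters.max_new_tokens' are assumed, not confirmed by the log.
url = "https://bam-api.res.ibm.com/v2/text/chat_stream?version=2024-01-10"
headers = {"Authorization": f"Bearer {os.environ['BAM_API_KEY']}"}
body = {
    "model_id": "meta-llama/llama-3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "parameters": {
        "max_new_tokens": 4096,  # rejected: BAM enforces <= 2048 here
    },
}

resp = requests.post(url, headers=headers, json=body, stream=True)
print(resp.status_code)  # 400, with the INVALID_INPUT payload shown above
```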
Model documentation: https://bam.res.ibm.com/docs/models#meta-llama-llama-3-70b-instruct
Workaround: cap 'max_new_tokens' in kai/config.toml (sketched below).
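A minimal sketch of that workaround, assuming kai's config uses a [models]/[models.args] layout; the section and key names here are illustrative, and only the 2048 cap comes from the error above:

```toml
# kai/config.toml -- sketch; section and key names are assumptions
[models]
provider = "ChatIBMGenAI"

[models.args]
model_id = "meta-llama/llama-3-70b-instruct"
# BAM rejects anything above 2048 on this endpoint, so cap the default
parameters.max_new_tokens = 2048
```

This trades away the documented 8196-token limit of the model itself, but keeps requests within what the BAM chat_stream endpoint currently accepts.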