konveyor-ecosystem / kai

Konveyor AI - static code analysis driven migration to new targets via Generative AI

BAM service with llama3 - "message": "property 'max_new_tokens' must be <= 2048" #172

Open

jwmatthews commented 2 months ago

Attempting to test with llama3, I'm seeing errors from the BAM service indicating it will only accept 'max_new_tokens' <= 2048, yet the model documentation lists a token limit of 8196 and we specify 4096 as the default for 'max_new_tokens'. Note the 8196 figure is the model's total context window; the 2048 cap looks like a BAM service-side limit on generated tokens rather than a model limit.

https://bam.res.ibm.com/docs/models#meta-llama-llama-3-70b-instruct

From the BAM model docs:

llama-3-70b-instruct
model_id: meta-llama/llama-3-70b-instruct
token limit: 8196
types: Instruct
languages: English

Meta Llama 3 is a family of auto-regressive language models, pretrained on over 15 trillion tokens of data from publicly available sources. These models use an optimized transformer architecture. Helpfulness and safety were prioritized in developing these models.

The error from the BAM service:
Failed to handle request to https://bam-api.res.ibm.com/v2/text/chat_stream?version=2024-01-10.
{
  "error": "Bad Request",
  "extensions": {
    "code": "INVALID_INPUT",
    "state": {
      "errors": [
        {
          "message": "property 'max_new_tokens' must be <= 2048",
          "instancePath": "body.parameters.max_new_tokens",
          "params": {
            "errorType": "maximum"
          }
        }
      ]
    }
  },
  "message": "property 'max_new_tokens' must be <= 2048.",
  "status_code": 400
}
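
For reference, a minimal repro sketch of the failing call using plain requests. The request body shape is inferred from the error's instancePath ('body.parameters.max_new_tokens'); the exact message format and auth header are assumptions, and BAM_API_KEY is a placeholder:

import os
import requests

# Endpoint taken from the traceback above.
url = "https://bam-api.res.ibm.com/v2/text/chat_stream?version=2024-01-10"

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['BAM_API_KEY']}"},
    json={
        "model_id": "meta-llama/llama-3-70b-instruct",
        "messages": [{"role": "user", "content": "hello"}],  # assumed shape
        # kai's default of 4096; the service rejects anything over 2048
        "parameters": {"max_new_tokens": 4096},
    },
)
print(resp.status_code)  # 400
print(resp.json())       # "property 'max_new_tokens' must be <= 2048"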

Workaround in kai/config.toml

provider = "IBMOpenSource"
args = { model_id = "meta-llama/llama-3-70b-instruct", max_new_tokens = 2048 }
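
Longer term, kai could clamp the configured value to the service's cap instead of letting the request fail. A minimal sketch, assuming a hand-maintained per-model cap table; the table and function names here are hypothetical, not kai's actual code:

BAM_MAX_NEW_TOKENS_CAP = {
    # Cap taken from the error above; other models could be added as found.
    "meta-llama/llama-3-70b-instruct": 2048,
}

def clamp_max_new_tokens(model_id: str, requested: int) -> int:
    """Return a max_new_tokens value the BAM service will accept."""
    cap = BAM_MAX_NEW_TOKENS_CAP.get(model_id)
    if cap is None:
        return requested  # no known cap for this model
    return min(requested, cap)

# clamp_max_new_tokens("meta-llama/llama-3-70b-instruct", 4096) -> 2048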