Open. mkbhanda opened this issue 2 weeks ago
@yao531441, please help check this issue. The maximum-output-tokens setting should not be related to reranking.
@mkbhanda @leslieluyu Can you provide more detailed steps to reproduce? Our tests that use Docker to start ChatQnA without reranking pass normally.
Priority: P2-High
OS type: Ubuntu
Hardware type: Xeon-SPR
Installation method: Deploy method
Running nodes: Single Node
What's the version? Development branch, post V1.0.
Description
The ChatQnA example uses the max_tokens parameter to control the number of LLM output tokens, but the value is not passed along when the re-ranker component is removed from the pipeline. The bug may be in the Mega service or in the parameter name used: the OpenAI API has moved from max_tokens to max_completion_tokens, and we may have migrated to the new name incompletely. We should also check GenAIComps.
This was noticed by @leslieluyu.
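A minimal sketch of the kind of normalization that may be missing. The function name and field handling below are assumptions for illustration only, not the actual Mega or GenAIComps code:

```python
# Hypothetical sketch -- NOT the actual Mega/GenAIComps implementation.
# Normalize the two OpenAI-style spellings of the output-token limit so the
# value survives no matter which pipeline stage builds the LLM request.
def normalize_llm_params(request: dict) -> dict:
    limit = request.get("max_completion_tokens", request.get("max_tokens"))
    normalized = dict(request)
    if limit is not None:
        # Forward both spellings so legacy and current consumers see the limit.
        normalized["max_tokens"] = limit
        normalized["max_completion_tokens"] = limit
    return normalized

print(normalize_llm_params({"messages": "hi", "max_tokens": 17}))
```

If only one spelling is forwarded, a pipeline variant (such as one without the reranker) that reads the other spelling would silently drop the limit, which matches the observed behavior.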
Reproduce steps
Run ChatQnA without the re-ranker, pass a small value for the maximum output tokens (e.g. max_tokens), and observe that the limit is not honored.
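A hedged reproduction sketch; the endpoint, port, and payload fields below are assumptions based on a typical single-node ChatQnA deployment and may need adjusting for your setup:

```python
import json
import urllib.request

# Assumed mega-service endpoint for a single-node ChatQnA deployment.
CHATQNA_URL = "http://localhost:8888/v1/chatqna"


def build_request(question: str, max_tokens: int) -> dict:
    """Build a ChatQnA request with a deliberately small output-token limit."""
    return {"messages": question, "max_tokens": max_tokens}


def send(payload: dict) -> bytes:
    """POST the payload to the mega-service. With the reranker removed,
    the response length reportedly ignores max_tokens."""
    req = urllib.request.Request(
        CHATQNA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Expect a short answer; the bug is that the output is not truncated.
    print(json.dumps(build_request("What is OPEA?", 17)))
```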
Raw log
No response