langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Yi34B's 8k context capability on dify doesn't work #2360

Closed PhenixZhang closed 7 months ago

PhenixZhang commented 8 months ago


Dify version

0.5.2

Cloud or Self Hosted

Self Hosted (Docker), Self Hosted (Source)

Steps to reproduce

The model I use is Yi-34B (8k context), served through an OpenAI-compatible API and wrapped by Dify. Through Dify it can only accept an input of <=2k tokens, yet the same OpenAI-compatible API on its own accepts 4k tokens. I would like to ask what the specific reason for this is.

Here is the 2k output: [screenshot]
Here is the 3k output: [screenshot]
Here is the 6k (openai-api) output: [screenshot]

I can confirm that the context-size parameter given to Dify is 8K.
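For reference, a rough sketch of how test prompts of these lengths can be built and measured outside Dify (tiktoken's cl100k_base encoding only approximates Yi-34B's own tokenizer, and the filler sentence is arbitrary):

```python
# Rough sketch: build test prompts of roughly 2k / 3k / 6k tokens so the same
# inputs can be sent both through Dify and directly to the OpenAI-compatible API.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
sentence = "The quick brown fox jumps over the lazy dog. "

def build_prompt(target_tokens: int) -> str:
    # Repeat a filler sentence until the encoded length reaches the target.
    text = ""
    while len(enc.encode(text)) < target_tokens:
        text += sentence
    return text

for target in (2000, 3000, 6000):
    prompt = build_prompt(target)
    print(target, "->", len(enc.encode(prompt)), "tokens")
```

Sending the same prompts through the Dify app and through the raw endpoint makes the 2k cutoff easy to compare.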

✔️ Expected Behavior

Yi-34B's full 8k context capability is usable through Dify.

❌ Actual Behavior

Yi-34B's 8k context capability cannot be used through Dify; inputs are limited to <=2k tokens.

dosubot[bot] commented 8 months ago

🤖

Hey there, @PhenixZhang! Great to see you diving into the depths of dify again. How's the coding journey treating you this time?

Based on the information you've provided, it seems like you're trying to utilize the full 8k context capability of the Yi34B model on dify, but are only able to receive a token length of <=2k. This could be due to the max_tokens parameter in the ernie-bot-8k.yaml configuration file. By default, this parameter is set to 1024, but it can be increased up to a maximum of 8000.

To utilize the full 8k context capability, you need to explicitly set the max_tokens parameter to a higher value when making the API call. Make sure you're not only setting the context_size to 8000 but also adjusting the max_tokens parameter accordingly in your request.
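As an illustration of that advice, here is a minimal sketch of a direct call against an OpenAI-compatible endpoint with max_tokens raised explicitly; the base URL, API key, and model name are placeholders rather than values taken from this issue:

```python
from openai import OpenAI

# Placeholder connection details for a self-hosted OpenAI-compatible server.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder endpoint
    api_key="EMPTY",                      # placeholder; many self-hosted servers ignore the key
)

response = client.chat.completions.create(
    model="Yi-34B",                                          # placeholder model name
    messages=[{"role": "user", "content": "Hello, world"}],
    max_tokens=4096,  # raised explicitly instead of relying on the default
)
print(response.choices[0].message.content)
```

If the same request succeeds here but fails when routed through Dify, the limit is being applied on the Dify side rather than by the model server.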


As for the specific reason why the Yi34B model is limited to a token length of <=2k in dify, it's not explicitly mentioned in the provided context. However, such limitations are often due to memory constraints and the computational resources required for processing larger sequences in the model.

Lastly, according to the specifications of the Yi34B model, it can handle a maximum token length of 192,000 tokens. This information is found under the parameter_rules section for the max_tokens parameter in the provided YAML file.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.




takatost commented 8 months ago

Have you tried the OpenAI compatible API model provider?

PhenixZhang commented 8 months ago

Have you tried the OpenAI compatible API model provider?

Yes, this is the config screenshot: [screenshot]

And using the OpenAI-compatible server directly, without Dify, works without any problem.

takatost commented 8 months ago

Can you try changing the max_tokens parameter to 4K or larger and see if it works?

PhenixZhang commented 8 months ago

Can you try changing the max_tokens parameter to 4K or larger and see if it works?

Of course. I set the maximum number of tokens to 8K from the beginning, in the following two places: [screenshot] [screenshot]

guchenhe commented 7 months ago

Hi @PhenixZhang, it seems that I'm not able to reproduce this issue on my end. Can you upload the backend logs? If you're OK with it and your endpoint is publicly accessible, you can also send your endpoint's information to guchenhe@gmail.com and I'll take a look.

github-actions[bot] commented 7 months ago

Closing because it is no longer active. If you have any questions, you can reopen it.

soulteary commented 6 months ago

Refer to https://github.com/langgenius/dify/pull/2865. Once that feature is released, the API can be exposed in the interface and the online model can be used directly.

Or configure the API of the internal testing platform.