langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Increase the limit of tokens for openai compatible llms #2383

Closed · myrulezzz closed this issue 5 months ago

myrulezzz commented 5 months ago

Self Checks

1. Is this request related to a challenge you're experiencing?

The token limit for OpenAI-compatible LLMs is 2000. If you use a tool with your custom LLM, there is a chance you won't get the full response. With Mixtral 8x7B, for example, the response is not fully generated because I hit the maximum of 2000 tokens.

2. Describe the feature you'd like to see

Add functionality for OpenAI-compatible API LLMs to increase the token limit to 4000-10000.

3. How will this feature improve your workflow or experience?

I will get a full response when I use a custom tool with an open-source LLM like Mixtral 8x7B.

4. Additional context or comments

The properties of OpenAI-compatible LLMs should include an option to raise max_tokens to a larger number, 4000-10000 I suppose.

5. Can you help us with this feature?

dosubot[bot] commented 5 months ago

🤖

Hey there, @myrulezzz! Great to see you diving into another interesting challenge with us. Hope you're doing awesome!

Thank you for your feature request. I understand that you would like to increase the token limit for OpenAI-compatible LLMs from 2000 to between 4000 and 10000 in the dify repository.

The token limit for the OpenAI compatible LLMs is not directly set in the provided code. However, it's important to note that the token limit is often a constraint set by the model itself or the API you're using to access the model. For instance, OpenAI's GPT-3 has a maximum token limit of 4096.

If you're using a different model or API that allows for a higher token limit, you would need to adjust the request you're sending to that API to allow for more tokens. This would typically be done in the _generate method where the request to the API is constructed. However, without more information about the specific API or model you're using, it's not possible to provide a more specific solution.
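For illustration, here is a minimal sketch of such a request against a generic OpenAI-compatible chat completions endpoint with a larger max_tokens value; the endpoint URL, model name, and the 8000-token cap are placeholders, and this is not dify's actual _generate implementation:

import os
import requests

# Illustrative only: a raw OpenAI-compatible chat completion request.
# The base URL, model name, and 8000-token cap below are placeholder values.
response = requests.post(
    "https://your-endpoint.example.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": "mixtral-8x7b-instruct",
        "messages": [{"role": "user", "content": "Summarize the document in detail."}],
        "max_tokens": 8000,  # must still fit inside the model's context window
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])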

If you're using a custom model with a higher token limit, you would need to ensure that your model is properly configured to handle the increased number of tokens. This could involve adjusting the model's architecture or training regimen to handle longer sequences of tokens.

In conclusion, increasing the token limit is not as simple as changing a value in the code. It involves considerations at the model level, the API level, and potentially the infrastructure level (as handling more tokens may require more computational resources).


The current method used in the dify codebase to set the token limit for OpenAI compatible LLMs is not explicitly mentioned in the provided context. However, the method get_num_tokens() is used to retrieve the number of tokens used in a given text. This method takes in the model, credentials, and texts as parameters. The number of tokens is then returned.

Here is a code snippet from the test suite that exercises the get_num_tokens() method:

import os

# Assumed import path for the dify test suite; adjust to the actual module location.
from core.model_runtime.model_providers.openai.text_embedding.text_embedding import OpenAITextEmbeddingModel


def test_get_num_tokens():
    model = OpenAITextEmbeddingModel()

    # Count tokens for two single-word inputs against the ada-002 embedding model.
    num_tokens = model.get_num_tokens(
        model='text-embedding-ada-002',
        credentials={
            'openai_api_key': os.environ.get('OPENAI_API_KEY'),
            'openai_api_base': 'https://api.openai.com'
        },
        texts=[
            "hello",
            "world"
        ]
    )

    # Each one-word text is a single token, so two texts yield two tokens in total.
    assert num_tokens == 2

Please note that this method is used for testing purposes. The actual implementation might be different and is not provided in the given context.

The current maximum token limit set for OpenAI-compatible LLMs in the dify codebase is 1024 tokens. This is indicated by the "model_max_length" field in the tokenizer configuration JSON file.
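As an aside, the model_max_length value mentioned above can be read directly from the tokenizer configuration; the file path in this sketch is a placeholder:

import json

# Placeholder path: point this at the tokenizer_config.json shipped with the model.
with open("tokenizer_config.json") as f:
    tokenizer_cfg = json.load(f)

# model_max_length is the tokenizer's declared maximum sequence length (1024 in the
# configuration referred to above); it is distinct from the provider's max_tokens setting.
print(tokenizer_cfg.get("model_max_length"))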

I hope this information is helpful. If you have any further questions or need more clarification, please don't hesitate to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

takatost commented 5 months ago

@guchenhe pls check this out

guchenhe commented 5 months ago
[screenshot: model configuration dialog showing context length and max output tokens]

You're free to modify both the context length and the max output tokens in the model config to suit your model. I tried Mixtral on together.ai's endpoint and had it return 6000+ tokens, no problem.
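If you want to sanity-check an endpoint outside of dify, you can call it directly with the openai Python client; the base URL, model id, and key name below are just the values I'd expect for together.ai, so adjust them to your setup:

import os
from openai import OpenAI  # openai>=1.0 client works with any OpenAI-compatible base_url

# Placeholder endpoint and model id; substitute your provider's values.
client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
)
response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Write a very long story."}],
    max_tokens=6000,
)
print(response.usage.completion_tokens)  # should be able to exceed 2000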

myrulezzz commented 5 months ago

Hi, I deployed the model with vLLM on RunPod, so LocalAI accepts only 2000.

takatost commented 5 months ago

Hi, I deployed the model with vLLM on RunPod, so LocalAI accepts only 2000.

Are you using the LocalAI provider configuration?

myrulezzz commented 5 months ago

Yes, I am. There is no other option for a vLLM endpoint.

takatost commented 5 months ago

Yes, I am. There is no other option for a vLLM endpoint.

Can we try using the OpenAI Compatible API to integrate your model?
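vLLM also exposes an OpenAI-compatible server itself, so the same client can be pointed straight at the RunPod deployment; the URL, model name, and limits in this sketch are assumptions for illustration:

import os
from openai import OpenAI

# Assumed setup: vLLM launched with its OpenAI-compatible server, e.g.
#   python -m vllm.entrypoints.openai.api_server \
#       --model mistralai/Mixtral-8x7B-Instruct-v0.1 --max-model-len 16384
# The RunPod proxy URL below is a placeholder.
client = OpenAI(
    base_url="https://<your-pod-id>-8000.proxy.runpod.net/v1",
    api_key=os.environ.get("VLLM_API_KEY", "EMPTY"),  # vLLM accepts any key unless --api-key is set
)
response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Explain the request in detail."}],
    max_tokens=4000,  # above the 2000 cap seen through the LocalAI provider
)
print(response.choices[0].message.content)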

github-actions[bot] commented 5 months ago

Closed because it is no longer active. If you have any questions, you can reopen it.