oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

meta-llama/Llama-2-70b-chat-hf doesn't work with text-generation-webui when max_new_tokens is set to 4096 #5399

Closed: majestichou closed this 8 months ago

majestichou commented 8 months ago

Describe the bug

I used the webui to load the meta-llama/Llama-2-70b-chat-hf model and set the Instruct template to llama-v2. Then I set the preset to Divine Intellect in the Generation tab and changed "max_new_tokens" to 4096. I chose the "chat-instruct" mode. Then I went to the chat tab and asked the model some questions. The answers were ridiculous: no matter what question I asked, the AI gave exactly the same reply. My conversation with the AI is as follows:

AI: How can I help you today?
You: what can you do?
AI: Hello! I'm here to help you with any questions or concerns you may have. How can I assist you today?
You: what is machine learning?
AI: Hello! I'm here to help you with any questions or concerns you may have. How can I assist you today?
You: show me a piece of code with python
AI: Hello! I'm here to help you with any questions or concerns you may have. How can I assist you today?

When I set "max_new_tokens" to 2000, everything was OK:

AI: How can I help you today?
You: what can you do?
AI: I can assist you in various ways, such as answering questions, providing recommendations, helping with decision-making, generating text, and even creating art or music. I can also learn from your interactions with me and adapt my responses based on your preferences and interests. Is there anything specific you'd like me to help you with today?
You: what is machine learning?
AI: Machine learning is a subfield of artificial intelligence that involves training algorithms to learn from data and improve their performance over time. It enables computers to recognize patterns, make predictions, and adapt to new situations without being explicitly programmed. Machine learning algorithms can be used for tasks such as image and speech recognition, natural language processing, recommendation systems, fraud detection, and many others.
In simple terms, machine learning is a way for computers to learn from experience and get better at doing something without being told exactly how to do it. Just like humans learn from trial and error, machines can too!
You: show me a piece of code with python
AI: Sure, here's an example of a simple machine learning algorithm written in Python using the scikit-learn library:

from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

# Load Boston Housing dataset
boston = load_boston()

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.3)

# Create a linear regression model
reg = LinearRegression()

# Train the model on the training data
reg.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = reg.predict(X_test)

# Evaluate the model's performance
r2 = reg.score(X_test, y_test)
print("R-squared value:", r2)

This code loads the Boston Housing dataset, splits it into training and testing sets, creates a linear regression model, trains it on the training data, makes predictions on the testing data, and evaluates the model's performance using the R-squared value.

Of course, this is just one example of a machine learning algorithm, and there are many other techniques and libraries available depending on the problem you're trying to solve.

I don't know whether the problem comes from meta-llama/Llama-2-70b-chat-hf or from text-generation-webui. Does meta-llama/Llama-2-70b-chat-hf limit max new tokens, or is there a bug in text-generation-webui?
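
For context, Llama-2 models have a 4096-token context window, and that single window has to hold both the prompt and the generated reply. A rough sketch of the arithmetic (assuming, as the replies below explain, that the UI reserves the full max_new_tokens out of that window):

# Llama-2's 4096-token context window is shared between prompt and output.
context_length = 4096

# With max_new_tokens = 4096, nothing is left for the prompt:
print(context_length - 4096)  # 0 tokens of prompt budget

# With max_new_tokens = 2000, the template and history still fit:
print(context_length - 2000)  # 2096 tokens of prompt budget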

Is there an existing issue for this?

Reproduction

  1. Run python server.py --listen --model meta-llama/Llama-2-70b-chat-hf
  2. Set the Instruct template to llama-v2
  3. Set the preset to Divine Intellect in the Generation tab
  4. Change "max_new_tokens" to 4096
  5. Choose the "chat-instruct" mode
  6. Start chatting

Screenshot

nothing

Logs

no error

System Info

CentOS 7, A100 40GB × 8

bartowski1182 commented 8 months ago

Yup, same issue I had here: https://github.com/oobabooga/text-generation-webui/issues/5374

The problem is that the webui truncates the prompt to make room for max_new_tokens, so it removes messages until it can't remove any more. I dislike this behaviour and find it extremely counterintuitive, but it is apparently expected/acceptable.
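
A minimal sketch of that truncation logic (illustrative only, not the webui's actual code; count_tokens stands in for whatever tokenizer the loaded backend provides):

def truncate_history(messages, count_tokens, context_length=4096, max_new_tokens=4096):
    # Reserve room for generation; whatever is left is the prompt budget.
    budget = context_length - max_new_tokens
    kept = list(messages)
    # Drop the oldest messages until the remaining history fits the budget.
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)
    return kept  # with budget == 0, every message gets dropped

With context_length == max_new_tokens == 4096 the budget is zero, so the whole history (and even the instruction template) gets dropped and the model sees an almost empty prompt, which is why it keeps repeating its canned greeting above.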

Offered a solution: https://github.com/oobabooga/text-generation-webui/pull/5382

If you're willing to merge that and build your own install, you can try it; otherwise, just reduce max_new_tokens as you have.

majestichou commented 8 months ago

> Yup, same issue I had here: #5374
>
> The problem is that the webui truncates the prompt to make room for max_new_tokens, so it removes messages until it can't remove any more. I dislike this behaviour and find it extremely counterintuitive, but it is apparently expected/acceptable.
>
> Offered a solution: #5382
>
> If you're willing to merge that and build your own install, you can try it; otherwise, just reduce max_new_tokens as you have.

Thanks a lot! Great job!