Open ziemowit-s opened 5 months ago
@ziemowit-s I'll check this out! Sorry on the issue!
Don't worry, it's a relatively new library so bugs are expected :)
Hey, just want to confirm, I have the exact same issue with my Llama model. Inference on single samples works fine, but batches of multiple samples produce garbage. I'm loading my model in bfloat16 without quantization.
@ziemowit-s @its5Q Apologies on the issues again :( Still debugging stuff so sorry on that!
Actually can confirm - batched inference in fact is breaking - I'm working on a fix asap - sorry for the wait guys!
@ziemowit-s @its5Q Much apologies on the delay - I temporarily fixed it by disabling Unsloth's fast inference paths - it seems like I need to dig deeper on why this is happening :( Using `pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git` will get the temporary fix.
Again sorry for the inconvenience!
@ziemowit-s @its5Q I think I finally fixed it!! On the example @ziemowit-s provided me:
[' The text emphasizes the benefits of humor in the healing process, including reducing stress, improving mood, and boosting the immune system. It suggests strategies such as seeking out humor that resonates, finding humor in everyday situations, sharing a laugh with others, using humor as a coping mechanism, and being gentle with oneself. The text also encourages taking things one step at a time and seeking support when needed.',
' The text discusses various causes for memory and concentration issues beyond anxiety, including nutritional deficiencies, sleep deprivation, chronic stress, medications, medical conditions, substance abuse, brain injuries, and aging. Daniella shares her experiences of anxiety, stress, skipping meals, and lack of sleep. Irvin suggests prioritizing self-care, relaxation techniques, and speaking with a therapist to manage stress and memory issues. Daniella expresses concerns about the cost and time commitment of adding new treatments to her therapy sessions. Irvin emphasizes the importance of investing in mental health and encourages Daniella to consider speaking with her therapist about her concerns.',
' Dissociative disorders involve alterations in consciousness, memory, identity, or perception, and can include feelings of worthlessness and isolation due to detachment from self and others. These symptoms should be discussed with a therapist for proper diagnosis and treatment. While feelings of worthlessness and isolation are common, they may indicate an underlying mental health condition. Reach out for help and support if these feelings persist and interfere with daily life.',
' Board games can help manage anxiety by providing distraction, social interaction, problem-solving, relaxation, and fun. The benefits may vary for individuals, so experimenting with different types of games is recommended.']
Single inference again is faster - batched similar speed for now.
Use `pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git` to update on local machines (no need on Colab / Kaggle).
Awesome, I'll test it as soon as I get to it
Tried it myself and I'm getting the same weird output as before. One thing I've noticed is that the weird output only comes from the samples that are padded, and the longest prompt in the batch produces normal output. If all the samples in the batch are the same length in tokens, so no padding is required, the model output for all samples is as expected. Using unsloth from commit d3a33a0dc3cabd3b3c0dba0255fb4919db44e3b5
@its5Q That's very weird :( For me it seems to work perfectly. I have an example if you can run this:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
inputs = [
    "Create a Python program using Pytorch to create a simple neural network for image classification.\n"\
    "You need to do the data preparation step, the training step, and the inference step as well.",
    "Create a Python program to compute all the primes.",
    "Write a long essay about happiness, and how to attain it. Provide clear markdown sections.",
    "20*20=?",
]
tokenizer.pad_token = tokenizer.unk_token
tokenizer.padding_side = "left"
inputs = tokenizer(inputs, return_tensors = "pt", padding = True).to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 512, do_sample = False, use_cache = True)
decoded = tokenizer.batch_decode(outputs)
for text in decoded:
    print(text.replace(tokenizer.pad_token, ""))
    print("_" * 70)
You will get:
<s> Create a Python program using Pytorch to create a simple neural network for image classification.
You need to do the data preparation step, the training step, and the inference step as well.
Here's a simple example of a neural network for image classification using PyTorch. This example uses the MNIST dataset, which consists of 60,000 28x28 grayscale images of digits 0-9.
First, let's install the required packages:
```bash
pip install torch torchvision
Now, let's write the code:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
# Load and normalize the MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=100, shuffle=True)
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False)
# Define the neural network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(),
______________________________________________________________________
<s> Create a Python program to compute all the primes.
Here's a simple Python program to find all prime numbers up to a given limit:
```python
def is_prime(n):
    """
    Check if a number is prime.
    """
    if n <= 1:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True
def find_primes(limit):
    """
    Find all prime numbers up to a given limit.
    """
    primes = []
    for n in range(2, limit + 1):
        if is_prime(n):
            primes.append(n)
    return primes
if __name__ == "__main__":
    limit = int(input("Enter the limit: "))
    primes = find_primes(limit)
    print(f"Prime numbers up to {limit}:")
    print(primes)
This program uses two functions: `is_prime()` to check if a number is prime, and `find_primes()` to find all prime numbers up to a given limit. The main part of the code is in the `if __name__ == "__main__":` block, where it takes user input for the limit and then prints out the prime numbers found.
Write a long essay about happiness, and how to attain it. Provide clear markdown sections.
Happiness is a concept that has puzzled philosophers, theologians, and ordinary people for centuries. It is a state of well-being and contentment, a feeling of joy and satisfaction with life. Yet, despite its importance, happiness remains an elusive and subjective experience. In this essay, we will explore the nature of happiness, its sources, and the ways to attain it.
Happiness is a complex and multifaceted experience. It is not a static state, but rather a dynamic process that ebbs and flows throughout our lives. Happiness is not the absence of suffering or hardship, but rather the ability to find meaning and joy in the midst of challenges. It is a state of mind that is shaped by our thoughts, emotions, and actions.
Our thoughts play a significant role in shaping our experience of happiness. The way we think about ourselves, our circumstances, and the world around us can either enhance or diminish our sense of well-being. For example, focusing on the negative aspects of a situation can lead to feelings of sadness and frustration, while focusing on the positive can lead to feelings of gratitude and joy.
Emotions are another important factor in our experience of happiness. Positive emotions such as joy, love, and gratitude can enhance our sense of well-being, while negative emotions such as anger, sadness, and fear can detract from it. However, it is important to note that emotions are not static states, but rather transient experiences that come and go.
Our actions also play a role in our experience of happiness. Engaging in activities that bring us joy and fulfillment, such as pursuing a hobby or spending time with loved ones, can enhance our sense of well-being. Conversely, engaging in activities that are harmful or detrimental to our health and happiness, such as substance abuse or excessive work, can detract from it.
Despite the complexity of happiness, there are certain sources that have been identified as contributing to our sense of well-being.
Relationships with others are a fundamental source of happiness. Human
20*20=?
The answer to this question is 400. The multiplication of 20 by itself results in 400. The number 20 is multiplied by itself 20 times, resulting in a total of 40,000. However, since the question asks for the result of 20 multiplied by itself 20 times, we need to find the result of multiplying 20 by itself 20 times and then take the square root of that number to get the final answer of 400.
Here's the step-by-step calculation:
If you do them individually, I get:
<s> Create a Python program using Pytorch to create a simple neural network for image classification.
You need to do the data preparation step, the training step, and the inference step as well.
Here's a simple example of a neural network for image classification using PyTorch. This example uses the MNIST dataset, which consists of 60,000 28x28 grayscale images of digits 0-9.
First, let's install the required packages:
```bash
pip install torch torchvision
Now, let's write the code:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
# Load the MNIST dataset
transform = transforms.ToTensor()
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=100, shuffle=True)
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False)
# Define the neural network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# Training the neural network
for epoch in range
______________________________________________________________________
Some differences via https://www.diffchecker.com/text-compare/:
The 2nd one:
<s> Create a Python program to compute all the primes.
Here's a simple Python program to find all prime numbers up to a given limit:
```python
def is_prime(n):
    """
    Check if a number is prime.
    """
    if n <= 1:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True
def find_primes(limit):
    """
    Find all prime numbers up to a given limit.
    """
    primes = []
    for n in range(2, limit + 1):
        if is_prime(n):
            primes.append(n)
    return primes
if __name__ == "__main__":
    limit = int(input("Enter the limit: "))
    primes = find_primes(limit)
    print(f"Prime numbers up to {limit}:")
    print(primes)
This program uses two functions: `is_prime()` to check if a number is prime, and `find_primes()` to find all prime numbers up to a given limit. The main part of the code is in the `if __name__ == "__main__":` block, where it takes user input for the limit and then prints out the prime numbers found.
No difference on 2nd.
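For anyone who wants to repeat this comparison locally instead of pasting into a web diff tool, here is a minimal sketch using Python's standard `difflib`. The helper names `first_divergence` and `line_diff` are made up for illustration, and the two sample strings are just the comment lines that differ in the first output above.

```python
import difflib


def first_divergence(batched, single):
    """Return the index of the first differing character, or None if identical."""
    for i, (a, b) in enumerate(zip(batched, single)):
        if a != b:
            return i
    if len(batched) != len(single):
        # One string is a strict prefix of the other.
        return min(len(batched), len(single))
    return None


def line_diff(batched, single):
    """Return a unified diff (as a list of lines) between two decoded outputs."""
    return list(
        difflib.unified_diff(
            batched.splitlines(),
            single.splitlines(),
            fromfile="batched",
            tofile="single",
            lineterm="",
        )
    )


batched_line = "# Load and normalize the MNIST dataset"
single_line = "# Load the MNIST dataset"
divergence = first_divergence(batched_line, single_line)
diff_lines = line_diff(batched_line, single_line)
identical_diff = line_diff(single_line, single_line)
```

An empty list from `line_diff` corresponds to the "no difference" case; anything else shows where the batched and single outputs part ways.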
On 3rd:
Write a long essay about happiness, and how to attain it. Provide clear markdown sections.
Happiness is a state of well-being and contentment. It is the feeling of joy, satisfaction, and fulfillment. Happiness is the ultimate goal of every human being. It is what we all strive for, whether consciously or unconsciously.
Happiness is important for several reasons. First and foremost, it makes life worth living. Without happiness, life can seem meaningless and unfulfilling. Happiness gives us a sense of purpose and direction. It helps us to appreciate the good things in life and to cope with the challenges and hardships that come our way.
Second, happiness is good for our health. Research has shown that happy people are healthier and more resilient than unhappy people. They have stronger immune systems, they recover from illness faster, and they live longer.
Third, happiness is good for our relationships. Happy people are more likely to have strong, healthy relationships with others. They are better able to communicate effectively, to forgive and to be forgiven, and to show love and compassion.
Fourth, happiness is good for our productivity and creativity. Happy people are more productive and creative than unhappy people. They are more focused, more motivated, and more innovative.
Despite the many benefits of happiness, it can be elusive. Many people spend their entire lives searching for happiness, only to find that it always seems just out of reach. So how can we attain happiness?
One of the most effective ways to cultivate happiness is to cultivate a positive attitude. This means focusing on the good things in life, rather than the bad. It means looking for the silver lining in every situation, and finding ways to turn negatives into positives.
Another effective way to cultivate happiness is to practice gratitude. This means being thankful for what we have, rather than focusing on what we don't have. It means appreciating the small things in life, and being grateful for the people and things that make our lives richer and more meaningful.
Strong relationships with others are essential for happiness. This
Very different to single decoding, but both are still coherent:
![image](https://github.com/unslothai/unsloth/assets/23090290/8c0c32ac-7e3b-4893-89e5-5fbfc00f567b)
This is because I use `torch.nn.functional.softmax` for single decoding and `torch.nn.functional.scaled_dot_product_attention` for multi decoding
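To illustrate how two attention code paths can be mathematically equivalent yet differ in the last few floating-point bits, here is a pure-Python sketch. It is not Unsloth's or PyTorch's actual code, and both function names are hypothetical. The first version computes an explicit softmax in two passes; the second uses a single streaming pass with a running maximum, a technique that fused attention kernels often use. With greedy decoding, a tiny numerical difference on a near-tied logit can flip one token, after which the two continuations drift apart while both stay coherent.

```python
import math


def attention_two_pass(q, keys, values):
    """Explicit softmax attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e * v[j] for e, v in zip(exps, values)) / total for j in range(dim)]


def attention_streaming(q, keys, values):
    """Same math in one pass, keeping a running max and normaliser."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        s = scale * sum(a * b for a, b in zip(q, k))
        new_max = max(running_max, s)
        # Rescale what was accumulated so far (exp(-inf) is 0.0 on the first step).
        correction = math.exp(running_max - new_max)
        weight = math.exp(s - new_max)
        denom = denom * correction + weight
        acc = [a * correction + weight * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


q = [0.3, -1.2, 0.7, 0.05]
keys = [[0.1, 0.2, -0.3, 0.4], [1.5, -0.7, 0.2, 0.0], [-0.9, 0.4, 0.8, -0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out_two_pass = attention_two_pass(q, keys, values)
out_streaming = attention_streaming(q, keys, values)
```

The two outputs agree to many decimal places but are not guaranteed to be bit-identical, because the additions and multiplications happen in a different order.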
And finally:
20*20=?
The answer to this question is 400. The multiplication of 20 by itself results in 400. The number 20 is multiplied by itself 20 times, resulting in a total of 40,000. However, since the question asks for the result of 20 multiplied by itself 20 times, we need to find the result of multiplying 20 by itself 20 times and then take the square root of that number to get the final answer of 400.
Here's the step-by-step calculation:
0 differences as well - the reasoning though is dumb lol
Also @its5Q you need to use padding_side = "left" or else the results will be wrong
> Also @its5Q you need to use padding_side = "left" or else the results will be wrong
Oh yeah, that was the problem, thanks. Now batched inference works as expected for me.
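A small sketch of why the padding side matters for generation, assuming a decoder-only model that appends each new token after the last column of the batch. The helpers `pad_batch` and `rows_ending_in_padding` are hypothetical and only mimic what a tokenizer does with `padding = True`; the token ids are made up.

```python
def pad_batch(sequences, pad_id, side):
    """Pad token-id lists to a common width and build the matching attention mask."""
    width = max(len(seq) for seq in sequences)
    rows, mask = [], []
    for seq in sequences:
        gap = width - len(seq)
        if side == "left":
            rows.append([pad_id] * gap + list(seq))
            mask.append([0] * gap + [1] * len(seq))
        else:
            rows.append(list(seq) + [pad_id] * gap)
            mask.append([1] * len(seq) + [0] * gap)
    return rows, mask


def rows_ending_in_padding(mask):
    """Indices of rows whose last position is padding, i.e. where generation would start after pads."""
    return [i for i, row in enumerate(mask) if row[-1] == 0]


batch = [[11, 12, 13, 14], [21, 22]]  # made-up token ids: one long prompt, one short
right_rows, right_mask = pad_batch(batch, pad_id=0, side="right")
left_rows, left_mask = pad_batch(batch, pad_id=0, side="left")
```

With right padding, the short prompt ends in pad tokens, so its continuation is conditioned on padding; the longest prompt has no padding and is unaffected. That matches the observation that only the padded samples produced garbage. With left padding, every row ends on a real token.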
@its5Q im thinking if somehow I can default it to left, since people have said this was an ongoing issue!
> 0 differences as well - the reasoning though is dumb lol
wouldn't the difference be due to calculating a random seed each generation? Therefore generations would be different even when comparing non-batched with non-batched
> @its5Q im thinking if somehow I can default it to left, since people have said this was an ongoing issue!
I'm not an expert in the transformers/unsloth code, but couldn't you just add a line of code before `return model, tokenizer` with `tokenizer.padding_side = "left"`?
@JIBSIL Oh if you select `do_sample = False` there is no randomness involved. On the `left` issue - the issue is for training, this makes training more complex, and Unsloth was primarily a training library, hence the reason why the padding is `right`.
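As a rough illustration of the `do_sample = False` point: greedy decoding always takes the highest-scoring token, so no seed is involved, while sampling draws from the softmax distribution and depends on the random state. The sketch below is plain Python with made-up logits and hypothetical helper names; it is not how `generate` is implemented.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities (max-subtracted for stability)."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding step: index of the largest logit, fully deterministic."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, rng):
    """Sampling step: draw an index according to the softmax probabilities."""
    return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]


logits = [1.0, 3.5, 0.2, 3.4]  # made-up scores for a 4-token vocabulary
greedy_runs = {greedy_pick(logits) for _ in range(100)}
sampled_runs = [sample_pick(logits, random.Random(seed)) for seed in range(100)]
```

`greedy_runs` collapses to a single token id no matter how often it is repeated, whereas `sampled_runs` can vary from seed to seed. So with greedy decoding, any difference between batched and single outputs has to come from the computation itself rather than from a seed.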
Ah, thanks for the clarification. However, in the newest release, I am encountering a different error:
File /opt/conda/lib/python3.10/site-packages/unsloth/models/gemma.py:148, in GemmaModel_fast_forward_inference(self, input_ids, past_key_values, position_ids, attention_mask)
146 seq_len = past_key_values[0][0].shape[-2]
147 if bsz != 1:
--> 148 attention_mask = _prepare_4d_causal_attention_mask(attention_mask, (bsz, q_len), hidden_states, seq_len,)
149 pass
151 next_decoder_cache = []
NameError: name '_prepare_4d_causal_attention_mask' is not defined
Specifically using Gemma-7b. But as usual, mistral works fine 🤣
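For context on what the failing line is trying to build: when `bsz != 1`, the batched path needs a mask that combines causality with the padding mask. Below is a rough pure-Python sketch of that idea. `causal_padding_mask` is a hypothetical helper, not the transformers function named in the traceback, and it returns booleans, whereas real implementations typically produce an additive float mask.

```python
def causal_padding_mask(attention_mask, q_len):
    """Build a [bsz][1][q_len][kv_len] grid where True means "may attend".

    attention_mask: one row of 0/1 flags per sample, covering every key position
    (cached tokens plus the new ones). q_len: number of new query tokens.
    """
    masks = []
    for row in attention_mask:
        kv_len = len(row)
        past_len = kv_len - q_len  # keys already sitting in the cache
        grid = []
        for q in range(q_len):
            last_visible = past_len + q  # causal limit for this query position
            grid.append([k <= last_visible and row[k] == 1 for k in range(kv_len)])
        masks.append([grid])  # the extra axis stands in for the head dimension
    return masks


# Decode step: one new token per sample, first sample is left-padded by one position.
decode_mask = causal_padding_mask([[0, 1, 1], [1, 1, 1]], q_len=1)
# Prefill: three new tokens, no cache, no padding.
prefill_mask = causal_padding_mask([[1, 1, 1]], q_len=3)
```

In the decode step, the padded sample's single query may look at every real token but not at the pad slot. In prefill, the grid is the familiar lower triangle.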
@its5Q Whoops you're correct! I decided to just run the notebook - I 100% finally fixed it now oh lord so sorry!!! The issue of multiple model supports :(
Hi there,
after loading the model with:
I performed a batch inference:
The received output is mostly nonsensical: the batch consists of 3 elements, and only the second, which is the longest, gets a correct answer; the other two are nonsense. When I shorten all the texts (to a maximum of 3000 characters), all the answers return to normal. It also works well when I infer each one in turn.
texts.txt nonsense_texts.txt
The texts to generate the summary are attached as texts.txt, and the nonsense answers are in the file nonsense_texts.txt (3 entries are separated by the `<END>` tag) so the issue can be reproduced. Below is an example of a nonsense answer: