I had the same issue running neox on the M1: https://github.com/zphang/minimal-gpt-neox-20b/issues/5
With mps, I got "...developed by EleutherAI. in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in" and with cpu on the same machine, I got "... developed by EleutherAI. It is a state-of-the-art language model...".
That implementation doesn't use transformers at all, it's just plain pytorch. I tested on 1.13.0.dev20220803.
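To rule transformers out entirely, here is a minimal sketch (my own, not from the linked issue) that compares plain-PyTorch outputs on cpu and mps for a small toy model; a large max-abs difference would point at the backend rather than any modeling code:

import torch

torch.manual_seed(0)
# Small toy network; any module works, the point is comparing the two backends.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 64),
).eval()
x = torch.randn(8, 64)

with torch.no_grad():
    out_cpu = model(x)
    if torch.backends.mps.is_available():
        out_mps = model.to("mps")(x.to("mps")).cpu()
        print("max abs diff cpu vs mps:", (out_cpu - out_mps).abs().max().item())
    else:
        print("MPS backend not available on this machine")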
I have the same issue with galactica-6.7b used with huggingface's transformers.
This is a minimal example to reproduce:
from transformers import AutoTokenizer, OPTForCausalLM
tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b").to("mps")
input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("mps")
outputs = model.generate(input_ids, max_new_tokens = 20)
print(tokenizer.decode(outputs[0]))
output with mps:
The Transformer architecture [START_REF] results results results results results results results results results results results results results results results
________________________________________________________
Executed in 40.03 secs fish external
usr time 25.76 secs 0.17 millis 25.76 secs
sys time 24.76 secs 2.54 millis 24.76 secs
output when removing to("mps") (running in cpu mode):
The Transformer architecture [START_REF] Attention is All you Need, Vaswani[END_REF] is a sequence-to-sequence model that uses self
________________________________________________________
Executed in 56.61 secs fish external
usr time 42.90 secs 0.17 millis 42.90 secs
sys time 27.70 secs 2.38 millis 27.70 secs
torch version: 1.14.0.dev20221117
I'm using an Apple M1 Max MacBook with macOS 13.0.
Setting use_cache to False fixed it for me, e.g.:
outputs = model.generate(input_ids=input_ids, do_sample=False, use_cache=False, max_new_tokens=max_length)
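For completeness, a sketch that just combines the earlier galactica repro with this workaround (untested here, assuming the same model and prompt as above):

import torch
from transformers import AutoTokenizer, OPTForCausalLM

device = "mps" if torch.backends.mps.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b").to(device)
input_ids = tokenizer("The Transformer architecture [START_REF]", return_tensors="pt").input_ids.to(device)
# use_cache=False recomputes attention over the full sequence at every step
# instead of using the KV cache, which the workaround above suggests is where the bug lives.
outputs = model.generate(input_ids, do_sample=False, use_cache=False, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))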
What can be the cause of this? I have the same problem in a different package, but my model does not have a generate function with use_cache.
Is this still happening with the latest nightly?
Hey, yes I just tested it. It is still happening with pytorch-2.1.0.dev202 (installed today). The quality of the output is way worse when using "mps" compared to "cpu" on Mac.
Thanks @SvenStahlmann and @Willian-Zhang, we will investigate the issue.
@Willian-Zhang thanks for filing this issue. Could you please try the latest nightly? This should be fixed there: pip3 install --pre --force-reinstall torch --index-url https://download.pytorch.org/whl/nightly/cpu
This is a conversation between A and B.
A: Your should say something meaningful.
B: I don't know what you mean. But I think you should say something meaningful.
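A quick sanity check (a minimal sketch, not from the thread) to confirm the freshly installed nightly is the build actually being imported and that the MPS backend is usable:

import torch

print(torch.__version__)                  # should report the nightly just installed
print(torch.backends.mps.is_built())      # True if this build was compiled with MPS support
print(torch.backends.mps.is_available())  # True if the MPS device can actually be used on this machine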
I can confirm the problem is gone with torch-2.1.0.dev20230804 on macOS 13.5 (22G74).
🐛 Describe the bug
This should give something like:
The same behavior applies not only to bigscience/bloom-560m; all CausalLM models seem to result in similar behavior.
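A rough comparison sketch (mine, not the reporter's exact script) using bigscience/bloom-560m, which is small enough to run on both devices quickly; the prompt is only guessed from the expected output quoted above:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
prompt = "This is a conversation between A and B."  # guessed prompt, not necessarily the reporter's
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Run greedy generation on cpu, and on mps when it is available, and compare the decoded text.
for device in ["cpu"] + (["mps"] if torch.backends.mps.is_available() else []):
    model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m").to(device)
    out = model.generate(input_ids.to(device), do_sample=False, max_new_tokens=30)
    print(device, "->", tokenizer.decode(out[0]))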
Just FYI: to get rid of the warning and rule out weird behavior from the MPS fallback, I added some code to the transformers source at site-packages/transformers/models/bloom/modeling_bloom.py. This does not affect the reported bug behavior.

Versions
cc @ezyang @gchanan @zou3519 @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev