tloen / llama-int8

Quantized inference code for LLaMA models
GNU General Public License v3.0

Tracking issue for Mac support #4

Open · pannous opened this issue 1 year ago

pannous commented 1 year ago

M1 / M2 with 32GB … 128GB of RAM: any hope of running this?

remixer-dec commented 1 year ago

No luck with this repo: the "bitsandbytes" dependency relies heavily on CUDA. There is, however, a repo for CPU inference; just change prompts to prompts[0] so it doesn't crash with max_batch_size=1.
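For reference, the prompts[0] change might look roughly like the sketch below, assuming the CPU-inference fork keeps the usual LLaMA example-script interface (generator, load, and the prompt strings here are illustrative, not the fork's exact code):

```python
# Illustrative sketch only. Assumes the example script already built a
# generator, e.g. something like:
#   generator = load(ckpt_dir, tokenizer_path, max_seq_len=512, max_batch_size=1)

prompts = [
    "The capital of France is",
    "Simply put, the theory of relativity states that",
]

# With max_batch_size=1 the full prompt list overflows the batch, so pass
# only the first prompt as a one-element batch (the prompts[0] change above).
results = generator.generate(
    [prompts[0]],
    max_gen_len=20,
    temperature=0.8,
    top_p=0.95,
)
print(results[0])
```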
It takes more than 10 minutes to produce output with max_gen_len=20; even GPT-J 6B only took me around a minute on CPU. I also tried to make an MPS port with GPU acceleration. It runs faster, but the output is not good enough in my opinion, and I'm not sure whether the CPU output is always this good or I just got lucky on my first generation. UPDATE: the model gives good outputs with Python 3.10 + pytorch-nightly.
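For context, the device selection in an MPS port typically follows the standard PyTorch pattern sketched below. This is not the actual patch, just an illustration; torch.backends.mps.is_available() requires Apple Silicon, macOS 12.3+, and a recent PyTorch build such as the nightly mentioned above:

```python
import torch

# Pick the fastest available backend: Apple MPS, then CUDA, then CPU.
# Illustrative only; not the actual port.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# Model weights and every input tensor must live on the same device
# before generation; a toy tensor stands in for them here.
x = torch.randn(4, 4, device=device)
print(device, x.sum().item())
```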

pannous commented 1 year ago

thanks!

remixer-dec commented 1 year ago

Actually, I was wrong. After trying my port with a newer Python + PyTorch, the outputs were as good as the CPU ones. I'm happy it worked after all!