elcolie opened this issue 1 year ago
It may not work correctly.
The code seems to support MPS (please look here: https://github.com/tloen/alpaca-lora/blob/8bb8579e403dc78e37fe81ffbb253c413007323f/generate.py#L53), but PyTorch's MPS backend does not seem to support 16-bit operations: https://github.com/pytorch/pytorch/issues/78168
I ran the same code on a CUDA machine and it produces a result; on an M2 it also runs (and is quite fast), but the results are different.
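The divergence is consistent with half-precision rounding, not necessarily an MPS bug alone. As a minimal illustration (plain PyTorch on CPU, not code from this repo), summing the same values after a round-trip through float16 already gives a visibly different total than float32:

```python
import torch

# Sum 10,000 copies of 0.1 in float32, and after rounding to float16.
# 0.1 is not exactly representable in either format, and float16's
# 10-bit mantissa rounds it much more coarsely, so the totals diverge.
x = torch.full((10_000,), 0.1, dtype=torch.float32)
sum_fp32 = x.sum()
sum_fp16 = x.to(torch.float16).float().sum()  # round-trip via fp16

print(float(sum_fp32), float(sum_fp16))
```

The same effect, compounded over millions of weights and matmuls, is enough to change generated tokens between a float16 and a float32 run.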
Anyway, replacing the loading procedure with

model = LlamaForCausalLM.from_pretrained(
    BASE_MODEL,
)
model = PeftModel.from_pretrained(
    model,
    ADAPTERS_MODEL,
)

makes the results numerically correct, but it is quite slow.
Does anyone else have the same problem?
@crux82 any progress on this?
No, I am still trying to figure out the problem here: https://github.com/pytorch/pytorch/issues/96610#issuecomment-1564265703
@crux82 thanks
It actually runs with the latest update of PyTorch, even though the results are not consistent with those obtained on the GPU :-(
I updated to the latest nightly build of PyTorch:
pip3 install --upgrade --no-deps --force-reinstall --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
The software also runs using FP16.
model = LlamaForCausalLM.from_pretrained(
    BASE_MODEL,
    device_map={"": device},
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(
    model,
    ADAPTERS_MODEL,
    device_map={"": device},
    torch_dtype=torch.float16,
)
I have a model trained on a GPU that I would like to apply to new data.
Unfortunately, model.generate() produces different outputs than it does in the GPU environment.
Moreover, when I remove torch_dtype=torch.float16 from the code above, the code hangs during generation.
Has anyone been able to use this model?
The code base relies on NVIDIA/CUDA. How can I use Apple MPS?
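For reference, a common pattern for preferring MPS when it is available (a sketch, not code from this repo; it only uses the standard torch availability checks):

```python
import torch

def pick_device() -> str:
    """Prefer Apple MPS, then CUDA, falling back to CPU."""
    if torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

device = pick_device()
print(f"Using device: {device}")
```

The resulting string can be passed as device_map={"": device} in the snippets above, keeping in mind the float16 caveats discussed in this thread when the device is "mps".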