replit / ReplitLM

Inference code and configs for the ReplitLM model family
https://huggingface.co/replit
Apache License 2.0
925 stars 80 forks

Finetuning fails with `RuntimeError: Please install flash-attn==1.0.3.post0 and triton==2.0.0.dev20221202` #23

Open matthiasgeihs opened 1 year ago

matthiasgeihs commented 1 year ago

Following the instructions in the README and running inside the Docker container `mosaicml/llm-foundry:1.13.1_cu117-latest`, finetuning fails with `RuntimeError: Please install flash-attn==1.0.3.post0 and triton==2.0.0.dev20221202`.

`pip install triton==2.0.0.dev20221202` fixes the problem. Ideally, this hint would be part of the README. Furthermore, there are a few more hurdles to clear when going through Docker: which Docker image should be used, and with which parameters should it be run?
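Since version pins like these are easy to get wrong, a quick sanity check before launching the finetuning run can save a failed job. A minimal sketch — the package names and versions are taken from the error message above, but the `check_pins` helper itself is hypothetical and not part of this repo:

```python
# Sketch: verify pinned dependency versions before launching finetuning.
# The pins below come from the RuntimeError reported in this issue.
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "flash-attn": "1.0.3.post0",
    "triton": "2.0.0.dev20221202",
}

def check_pins(pins):
    """Return a list of human-readable mismatch messages (empty if all pins are satisfied)."""
    problems = []
    for name, wanted in pins.items():
        try:
            got = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} not installed (want {wanted})")
            continue
        if got != wanted:
            problems.append(f"{name}=={got} installed (want {wanted})")
    return problems

if __name__ == "__main__":
    for problem in check_pins(PINNED):
        print("MISSING/MISMATCH:", problem)
```

Running this inside the container before kicking off training surfaces the missing `triton` build immediately, instead of partway into the finetuning script.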

This command currently works for me (followed by the llm-foundry setup instructions and installing the expected triton version as outlined above):

```
docker run --rm -it --gpus all --shm-size=512m -v .:/root/replit mosaicml/llm-foundry:1.13.1_cu117-latest
```

Maybe you want to update the README accordingly? I imagine a few other people will get stuck along the way. (Ideally, specify a pinned Docker image tag rather than `latest`.)

matthiasgeihs commented 1 year ago

Any chance to update the model such that it runs with PyTorch 2?

NL2Code commented 1 year ago

Same problem