Closed: jeromeku closed this issue 4 months ago.
Hi @jeromeku, thanks for creating the issue. We don't yet support QLoRA on multiple devices, but this is a high priority and we're hoping to have it working very soon! There is active work going on to enable this; see PR #909.
Closing this since #909 was merged. You will need PyTorch nightlies to run it. Config here: https://github.com/pytorch/torchtune/blob/6f37d15b2c99d49ca926173455569aa6f8e24d9d/recipes/configs/llama3/70B_full.yaml#L9
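As a quick sanity check before launching the recipe, you can confirm that you are on a nightly build and that your install has distributed support (the `dev` suffix check is just a heuristic, not an official API):

```python
import torch
import torch.distributed as dist

# Nightly builds carry a ".devYYYYMMDD" suffix (e.g. "2.4.0.dev20240501+cu121");
# stable releases do not. This is a heuristic check, not an official API.
print(torch.__version__)
assert "dev" in torch.__version__, "distributed QLoRA currently needs a PyTorch nightly"
assert dist.is_available(), "this PyTorch build was compiled without distributed support"
```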
Please feel free to reopen the issue if you still have questions. Thanks! :)
Is it possible to run `QLoRA` finetuning on more than a single device? I don't see any configs for `QLoRA` other than for `single_device`. If not, what are the gating issues? More generally, what methods need to be implemented for custom tensor types (e.g., `NF4`) in order to compose with `FSDP` or other distributed training methods?
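To make the second question concrete, here is a minimal sketch of the kind of wrapper tensor subclass I have in mind. It is not the actual `NF4Tensor` from torchao, and the `QuantizedTensor` name and the blanket unwrap/re-wrap in `__torch_dispatch__` are only illustrative; my rough understanding is that composing with `FSDP` requires handling the ops it issues against parameters (detach, split/chunk, copy_, view, and so on).

```python
import torch
import torch.utils._pytree as pytree


class QuantizedTensor(torch.Tensor):
    """Toy wrapper subclass standing in for a quantized (e.g. NF4) tensor."""

    @staticmethod
    def __new__(cls, data: torch.Tensor):
        # Create a "wrapper" tensor whose metadata mirrors the inner storage.
        return torch.Tensor._make_wrapper_subclass(
            cls, data.shape, dtype=data.dtype, device=data.device, requires_grad=False
        )

    def __init__(self, data: torch.Tensor):
        # A real NF4 tensor would hold packed 4-bit data plus scales here.
        self._data = data

    def __repr__(self):
        return f"QuantizedTensor(shape={tuple(self.shape)})"

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}

        def unwrap(x):
            return x._data if isinstance(x, QuantizedTensor) else x

        def wrap(x):
            return QuantizedTensor(x) if isinstance(x, torch.Tensor) else x

        # Run the op on the plain inner tensors, then re-wrap any tensor outputs.
        out = func(*pytree.tree_map(unwrap, args), **pytree.tree_map(unwrap, kwargs))
        return pytree.tree_map(wrap, out)


if __name__ == "__main__":
    t = QuantizedTensor(torch.randn(4, 4))
    # Ops like these are among the ones FSDP exercises when sharding parameters.
    print(t.detach(), torch.chunk(t, 2, dim=0))
```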