pytorch / torchtune

A Native-PyTorch Library for LLM Fine-tuning
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

Multi-GPU QLoRA? #844

Closed cuichenx closed 2 months ago

cuichenx commented 5 months ago

Hi, first of all, thanks for the great tutorials on LoRA and QLoRA! I was able to follow them very easily. I was wondering whether multi-GPU QLoRA is supported. I couldn't find a config file in the repo, and when I tried the multi-GPU LoRA recipe with model.quantize_base=True, I got this error:

ValueError: The module has CPU parameters or buffers when `sync_module_states=True`, which requires them to be on GPU. Please specify the `device_id` argument or move the module to GPU before passing it to FSDP.
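For reference, my rough understanding of the precondition that fails here (a simplified illustration only, not FSDP's or torchtune's actual code; `has_cpu_params` is a name I made up):

```python
import torch.nn as nn

# Simplified illustration (an assumption, not FSDP internals):
# with sync_module_states=True, FSDP broadcasts rank 0's parameters
# and buffers to the other ranks, so they must already be on GPU.
# A module built on CPU trips this check unless you pass device_id
# (or move the module to GPU) before wrapping it.
def has_cpu_params(module: nn.Module) -> bool:
    return any(p.device.type == "cpu" for p in module.parameters())

model = nn.Linear(4, 4)  # freshly constructed modules live on CPU
if has_cpu_params(model):
    # This is the condition behind the ValueError above.
    print("would raise: module has CPU parameters")
```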

Is multi-GPU QLoRA currently supported, or is it on the roadmap? Thanks a lot!

joecummings commented 5 months ago

Hey @cuichenx - glad you found the tutorials useful!

Currently, multi-GPU FSDP + QLoRA is not supported in torchtune, but this is something we are actively working on. Turns out it's a non-trivial combination. See this blog post from the folks over at answer.ai for some more information.

cc: @rohan-varma

cuichenx commented 5 months ago

Thanks for the fast response! Looking forward to it :)

kartikayk commented 5 months ago

@cuichenx I'd be curious to learn more about your use case. Are you looking at QLoRA instead of LoRA because of memory constraints, or something else? My impression has been that LoRA gives a higher-quality model, though at slightly higher memory usage. Have you tried LoRA, and did it not work on your setup? Thanks for taking a look at torchtune! :)

cuichenx commented 5 months ago

Hi @kartikayk, I'm currently doing some exploratory studies comparing QLoRA and LoRA, so I was looking for an apples-to-apples comparison, and LoRA on a larger model like 34B or 70B would need multiple GPUs. For now I can run my studies on the smaller models. Thanks for making this awesome framework!

kartikayk commented 5 months ago

@cuichenx sounds awesome! We'll make sure to comment on here as soon as we have this up and running!

rohan-varma commented 5 months ago

Thanks for trying out QLoRA @cuichenx and glad to hear that the tutorial is helpful!

Re: LoRA vs. QLoRA, as shown in the tutorial and the enablement PR (https://github.com/pytorch/torchtune/pull/478), in my experience QLoRA actually converges quite well and matches LoRA on some eval tasks, with about 50% memory savings. As mentioned, though, we don't yet have multi-GPU support and are working on it.
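For intuition, the ~50% figure is roughly consistent with back-of-envelope arithmetic on the frozen base weights alone (illustrative numbers only; real footprints also include activations, gradients, optimizer state, and quantization metadata such as block-wise scales):

```python
# Back-of-envelope memory for the base weights of a 7B-parameter model.
# bf16 stores 2 bytes/param; NF4 stores 4 bits (~0.5 bytes) per param.
params = 7e9
bf16_gb = params * 2.0 / 1e9  # 14.0 GB in bf16
nf4_gb = params * 0.5 / 1e9   # 3.5 GB in NF4
print(f"bf16: {bf16_gb:.1f} GB, NF4: {nf4_gb:.1f} GB")
```

Since the end-to-end savings depend on what fraction of total memory the base weights occupy, roughly 50% overall is plausible even though the weights themselves shrink ~4x.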

RdoubleA commented 2 months ago

This was recently added in #909 and is currently available as an experimental feature in our latest stable version. Closing as completed for now, please reopen if you run into any issues using it.