Closed — YooSungHyun closed this issue 6 months ago
hi @YooSungHyun
Thanks for the issue. Yes, indeed: when computing the device map, make sure to include inv_freq as well, since it is a non-persistent buffer and you are not assigning it in your device_map attribute.
@younesbelkada Yes, I understand, but this is still difficult for me. I found this guide: https://huggingface.co/docs/accelerate/concept_guides/big_model_inference#the-devicemap
When I assign the encoder layers to gpu02 and gpu03 respectively, I get a "not on the same device" error. I think this is because encoder layer 20 runs on gpu02 while layer 21's weights live on gpu03, so the output from gpu02 and the weights on gpu03 end up on different devices. In the end I used max_memory with "auto" and let the problem resolve itself. Is there any way or trick to divide the device_map smartly for each layer?
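One way to split layers by hand is to build the device_map dict yourself, assigning whole submodules so that every parameter and buffer of a layer stays with that layer. The sketch below is a minimal, hypothetical example: the module names ("model.embed_tokens", "model.layers.{i}", "model.norm", "lm_head") follow the LLaMA layout in transformers, and the 32-layer count and GPU indices 2/3 are assumptions matching this issue's setup, not a general recipe.

```python
# Minimal sketch: build an explicit device_map for a (assumed) 32-layer
# LLaMA model, placing the first half of the layers on GPU 2 and the
# second half (plus final norm and lm_head) on GPU 3. Assigning whole
# submodules keeps each layer's weights and buffers on one device.
def build_device_map(num_layers: int = 32, first_gpu: int = 2, second_gpu: int = 3) -> dict:
    device_map = {
        "model.embed_tokens": first_gpu,  # input embeddings with the first half
        "model.norm": second_gpu,         # final norm next to the output head
        "lm_head": second_gpu,
    }
    half = num_layers // 2
    for i in range(num_layers):
        device_map[f"model.layers.{i}"] = first_gpu if i < half else second_gpu
    return device_map

dm = build_device_map()
# dm["model.layers.0"] -> 2, dm["model.layers.31"] -> 3
```

The resulting dict can then be passed as `device_map=dm` to `from_pretrained`; accelerate inserts hooks that move activations between GPUs at the layer-20/21 boundary.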
Thanks @YooSungHyun for getting back! Yes, I think the proposed solution sounds good. Depending on your use case, you might want to use for instance balanced_low_0
to make sure the first GPU is kept relatively free.
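To illustrate the idea behind `device_map="balanced_low_0"` (not accelerate's actual algorithm, just a hedged sketch of the allocation strategy): spread the layers roughly evenly across GPUs, but deliberately under-fill GPU 0 so it has headroom for things like `generate()`'s output tensors. The `gpu0_share` parameter here is an invented knob for illustration.

```python
# Sketch of a "balanced but light on GPU 0" layer split. Returns, for
# each layer index, the GPU it is assigned to. GPU 0 gets only a
# fraction of an even share; the remainder is divided evenly among the
# other GPUs, with leftovers going to the last GPU.
def balanced_low_0_split(num_layers: int, num_gpus: int, gpu0_share: float = 0.5) -> list:
    gpu0_layers = int((num_layers / num_gpus) * gpu0_share)
    remaining = num_layers - gpu0_layers
    per_gpu = remaining // (num_gpus - 1)
    counts = [gpu0_layers] + [per_gpu] * (num_gpus - 1)
    counts[-1] += remaining - per_gpu * (num_gpus - 1)  # hand leftovers to the last GPU
    assignment = []
    for gpu, n in enumerate(counts):
        assignment += [gpu] * n
    return assignment

split = balanced_low_0_split(num_layers=32, num_gpus=4)
# GPU 0 carries fewer layers than the others
```

In practice you would not write this yourself: passing `device_map="balanced_low_0"` to `from_pretrained` lets accelerate compute the split from actual module sizes and available memory.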
I don't think there's a better way to do it for now. Okay, thanks.
System Info
transformers==4.39.3
torch==2.2.2
CUDA: 12.1 (RTX 3090 × 4)
python 3.10
Who can help?
@ArthurZucker @younesbelkada @gante
Information

Tasks

examples folder (such as GLUE/SQuAD, ...)

Reproduction
Use this code with any LLaMA-2 model.
I want to load a model with some of its parameters on GPU 2 and GPU 3, so I load the parameter index and assign half of the layers to gpu02 and the other half to gpu03. If I then call model.generate, an error is raised. Looking into why, I traced it to
LlamaRotaryEmbedding
: its device argument is None, so inv_freq is assigned to the CPU. Am I using device_map incorrectly?
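The root cause is how PyTorch treats non-persistent buffers: they are excluded from the module's state_dict, so a device_map computed from the checkpoint's weights never sees inv_freq, and it stays wherever the module was constructed (the CPU by default). The class below is a simplified stand-in mimicking LlamaRotaryEmbedding's buffer registration, not the transformers implementation.

```python
import torch
import torch.nn as nn

class RotaryEmbeddingSketch(nn.Module):
    """Mimics how LlamaRotaryEmbedding registers inv_freq (simplified)."""

    def __init__(self, dim: int = 8):
        super().__init__()
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False -> excluded from state_dict(), so tools that
        # plan device placement from the checkpoint never see this tensor.
        self.register_buffer("inv_freq", inv_freq, persistent=False)

mod = RotaryEmbeddingSketch()
print("inv_freq" in mod.state_dict())   # non-persistent buffers are absent
print(mod.inv_freq.device.type)         # stays on the construction device
```

This is why the maintainer's advice above is to cover inv_freq in the device_map (or assign whole layer submodules), so the buffer is moved along with its parent module.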
Expected behavior
model.generate works correctly.