davidenitti opened this issue 1 year ago
I was able to fix this on one PC by upgrading transformers and peft from git, but on another server I couldn't fix it even after upgrading the same packages. I think you also need to clean the cached weights and the cache dir used for offload_folder, but I still haven't managed to fix it on that server.
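For what it's worth, a minimal sketch of what clearing the offload folder could look like (the path is a placeholder for whatever you pass as offload_folder; the downloaded weights live separately in the Hugging Face cache, by default under ~/.cache/huggingface):

```python
import os
import shutil

offload_dir = "./offload"            # placeholder for the offload_folder path in use
if os.path.isdir(offload_dir):
    shutil.rmtree(offload_dir)       # drop stale *.dat shards and their index
os.makedirs(offload_dir, exist_ok=True)
```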
@davidenitti I tried the approach you mentioned, installing from git, and I'm still stuck on this error. Do you have any way around it?
not yet
I got a similar error. I think it stems from the loading code getting confused when there are already files in the offload_folder. Try creating a new, unique offload folder that is empty and see if that helps.
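For example, something along these lines (a sketch; the model id is a placeholder, and device_map="auto" assumes accelerate is installed):

```python
import tempfile
from transformers import AutoModelForCausalLM

# create a fresh, guaranteed-empty folder for the offloaded weights
offload_dir = tempfile.mkdtemp(prefix="offload_")

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-base-model",      # placeholder model id
    device_map="auto",
    offload_folder=offload_dir,
)
```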
I guess the code is specifically checking for keys like "base_model.model.model.layers.*", while the offloaded files on disk are named "model.layers.*.dat".
Somewhere, the key prefix ("base_model.model.") is not handled correctly in the offload case. I tried adjusting the code in accelerate/utils/offload.py to strip the prefix; execution gets a little further past my initial point of error, but then stops with another error:
KeyError: 'model.layers.16.self_attn.q_proj.lora_A.weight'.
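To illustrate the mismatch described above, here is a rough sketch (not the actual accelerate code) that compares a PEFT-style key against the index accelerate writes next to the offloaded .dat files, assuming the index file is index.json in the offload folder:

```python
import json
import os

offload_dir = "./offload"                 # placeholder for the offload_folder in use
with open(os.path.join(offload_dir, "index.json")) as f:
    offload_index = json.load(f)          # maps bare keys like "model.layers.16..." to .dat files

peft_key = "base_model.model.model.layers.16.self_attn.q_proj.weight"
prefix = "base_model.model."
bare_key = peft_key[len(prefix):] if peft_key.startswith(prefix) else peft_key

print(peft_key in offload_index)          # False: the PEFT-prefixed key is never found
print(bare_key in offload_index)          # True: the weights were indexed under the bare key
```

Presumably the lora_A / lora_B keys are not in the on-disk index at all, which would explain the second KeyError even after stripping the prefix.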
I hope there is a fix for this soon, so that we can run LoRA models on smaller GPUs. Or please let us know if there is any user-level configuration that avoids this error.
Thank you
Same problem when using a different package. This is so annoying.
I have this error:
I'm using slightly modified code, just to offload to disk and limit GPU memory, but the changes shouldn't be the source of the problem:
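For context, a rough sketch of that kind of setup (this is not the actual modified code; the model/adapter ids and memory limits are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "your-org/your-base-model",               # placeholder base model id
    device_map="auto",
    max_memory={0: "6GiB", "cpu": "30GiB"},   # cap GPU usage, spill the rest to CPU
    offload_folder="./offload",               # anything that still doesn't fit goes to disk
)
model = PeftModel.from_pretrained(base, "your-org/your-lora-adapter")  # placeholder adapter id
```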