Closed: wanghao14 closed this issue 6 months ago.
Hmm, that's unfortunate. If it's possible memory-wise, could you please test if the same happens without FSDP? Maybe you have a smaller model available that you could test it with.
As a potential workaround, could you please check whether this works:
from peft.tuners.tuners_utils import BaseTunerLayer
from peft.utils import ModulesToSaveWrapper

# to disable adapters
for module in model.modules():
    if isinstance(module, (BaseTunerLayer, ModulesToSaveWrapper)):
        module._disable_adapters = True

output = model(...)

# to re-enable them, assuming there is only one adapter
for module in model.modules():
    if isinstance(module, (BaseTunerLayer, ModulesToSaveWrapper)):
        module._disable_adapters = False
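In case it helps, the same workaround can be wrapped up as a context manager so it can be dropped in where you currently use my_model.disable_adapter(). This is only a sketch: the helper name adapters_disabled is mine, not part of the PEFT API, and it makes the same single-adapter assumption as the loop above:

from contextlib import contextmanager

from peft.tuners.tuners_utils import BaseTunerLayer
from peft.utils import ModulesToSaveWrapper


@contextmanager
def adapters_disabled(model):
    # collect every adapter layer once, flip the private flag off for the
    # duration of the block, then restore it (assumes a single adapter)
    adapter_layers = [
        m for m in model.modules()
        if isinstance(m, (BaseTunerLayer, ModulesToSaveWrapper))
    ]
    for layer in adapter_layers:
        layer._disable_adapters = True
    try:
        yield
    finally:
        for layer in adapter_layers:
            layer._disable_adapters = False


# usage: forward pass through the frozen base model, no LoRA applied
# with adapters_disabled(model):
#     output = model(...)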
Apologies for the delayed response.
With DDP, the performance appears to be significantly lower at the beginning of training, and GPU occupancy is high.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.
System Info
peft: 0.7.1; torch: 2.3.0.dev20240128+cu121; accelerate: 0.26.1; transformers: 4.37.2; Python: 3.10.12. Using the PyTorch container 23.12 provided by NVIDIA.
The hardware environment consists of four A100-40G GPUs.
Who can help?
@pacman100 @younesbelkada
Information
Tasks
Reproduction
Hi, I want to use both FSDP and PEFT in my project. I insert LoRA into the pretrained LLM with peft.get_peft_model and then wrap the whole model with torch.distributed.fsdp.FullyShardedDataParallel. The only trainable part of the model is the LoRA adapter. Additionally, I need to call the original model via with my_model.disable_adapter(): (a minimal sketch of the setup follows).
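For reference, here is a minimal sketch of the setup; the model name, LoRA hyperparameters, and FSDP arguments are placeholders rather than the exact values from my project, and torch.distributed is assumed to already be initialized (e.g. via torchrun):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# load the pretrained LLM and insert LoRA adapters; only they stay trainable
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    task_type="CAUSAL_LM", r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]
)
my_model = get_peft_model(base_model, lora_config)

# shard the whole PeftModel with FSDP
fsdp_model = FSDP(my_model, device_id=torch.cuda.current_device())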
When running the whole code, I encounter the following error (relevant parts excerpted):

Expected behavior
Using with my_model.disable_adapter(): to call the original model should work even though it is wrapped by FSDP.
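Continuing the sketch above (input_ids is just a dummy batch for illustration), this is the call pattern I expect to work:

# dummy batch just for illustration
input_ids = torch.randint(0, 32000, (1, 16), device=torch.cuda.current_device())

# forward pass through the original (adapter-free) model, despite the FSDP wrapping
with my_model.disable_adapter():
    with torch.no_grad():
        ref_logits = fsdp_model(input_ids=input_ids).logits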