Open agokrani opened 9 months ago
I have encountered the same problem; however, it seems more like a problem with FSDP and a PEFT-wrapped model. run_fsdp works fine for me, but when I add a LoRA config, it leads to the same error. @pacman100 could you please look into this?
I found out this is caused by the module_wrap_policy function in FSDP(trainer.model). The PEFT-wrapped model passes None as the module class variable.
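A minimal sketch of the failure and of the manual filter, in plain Python without torch. `DecoderLayer` and `keep_real_classes` are hypothetical names used only for illustration. It assumes the wrap policy ends up calling isinstance() against the collection of module classes, which is what the TypeError reported in this thread suggests.

```python
class DecoderLayer:
    """Hypothetical stand-in for a transformer block class."""


def keep_real_classes(candidates):
    """Return only the entries that are actual classes, dropping None and similar."""
    return tuple(c for c in candidates if isinstance(c, type))


# Assumption: the PEFT-wrapped model contributes a None entry.
module_classes = (None, DecoderLayer)
layer = DecoderLayer()

try:
    isinstance(layer, module_classes)
    raised_type_error = False
except TypeError:
    # isinstance() rejects a tuple that contains a non-type such as None.
    raised_type_error = True

# After filtering, the same check works on the remaining classes.
filtered = keep_real_classes(module_classes)
matches_after_filter = isinstance(layer, filtered)
```

Filtering only removes the symptom. Why None ends up in the set of module classes for a PEFT-wrapped model is still the open question.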
After I manually filter out the None module_class, another error occurs:
  File "train.py", line 197, in main
    trainer.model = trainer.model_wrapped = FSDP(trainer.model, **kwargs)
  File "/home/lzw/miniconda3/envs/Bert/lib/python3.8/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 487, in __init__
    _init_param_handle_from_module(
  File "/home/lzw/miniconda3/envs/Bert/lib/python3.8/site-packages/torch/distributed/fsdp/_init_utils.py", line 519, in _init_param_handle_from_module
    _init_param_handle_from_params(state, managed_params, fully_sharded_module)
  File "/home/lzw/miniconda3/envs/Bert/lib/python3.8/site-packages/torch/distributed/fsdp/_init_utils.py", line 531, in _init_param_handle_from_params
    handle = FlatParamHandle(
  File "/home/lzw/miniconda3/envs/Bert/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 537, in __init__
    self._init_flat_param_and_metadata(
  File "/home/lzw/miniconda3/envs/Bert/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 585, in _init_flat_param_and_metadata
    ) = self._validate_tensors_to_flatten(params)
  File "/home/lzw/miniconda3/envs/Bert/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 731, in _validate_tensors_to_flatten
    raise ValueError(
ValueError: Must flatten tensors with uniform `requires_grad` when `use_orig_params=False`
I'm not familiar with FSDP, so I still don't know how to fix it.
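For anyone hitting the same ValueError: the message says that all tensors flattened together must share one `requires_grad` value when `use_orig_params=False`. With LoRA the base weights are typically frozen and only the adapter weights are trainable, so a wrapped unit that contains both is mixed. The sketch below is plain Python, not the torch implementation. The parameter names and the helper `has_uniform_requires_grad` are made up for illustration.

```python
def has_uniform_requires_grad(params):
    """params: iterable of (name, requires_grad) pairs for one wrapped unit."""
    flags = {requires_grad for _, requires_grad in params}
    return len(flags) <= 1


# Hypothetical unit: one frozen base weight plus two trainable LoRA weights.
mixed_unit = [
    ("q_proj.weight", False),
    ("q_proj.lora_A.weight", True),
    ("q_proj.lora_B.weight", True),
]

# The same parameters regrouped so that each unit carries a single flag.
frozen_unit = [p for p in mixed_unit if not p[1]]
trainable_unit = [p for p in mixed_unit if p[1]]

mixed_ok = has_uniform_requires_grad(mixed_unit)
split_ok = (
    has_uniform_requires_grad(frozen_unit)
    and has_uniform_requires_grad(trainable_unit)
)
```

The wording of the error ties the constraint to `use_orig_params=False`, so passing `use_orig_params=True` to FSDP, or wrapping so that frozen and trainable parameters land in separate units, looks like the direction to try. Neither has been verified against this setup.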
When I try to train a model with FSDP, I get the following error:
*** TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union
It happens on this specific line: trainer.model = trainer.model_wrapped = FSDP(trainer.model, **kwargs)
After a bit of debugging, it seems to have something to do with auto_wrap_policy. I am not sure how to solve this. Do you have any suggestions? It was working fine until a few days ago.