Open YooSungHyun opened 8 months ago
I also have this problem with sharded save and load: https://github.com/pytorch/pytorch/issues/103627. How can I solve it?
I am also facing the same issue. Is there any solution to this?
Facing the same issue.
Facing the same issue.
Facing the same issue. Is there any solution to this? Thanks.
I fixed this and merged the change on main by disabling activation checkpointing: https://github.com/pytorch/examples/pull/1273
The change is to the line below in distributed/FSDP/configs/fsdp.py:
- fsdp_activation_checkpointing: bool=True
+ fsdp_activation_checkpointing: bool=False
This is a workaround; I will look for a proper fix next.
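For anyone applying the workaround by hand, here is a minimal sketch of how a boolean flag like this typically gates the wrapping step. Only the field name fsdp_activation_checkpointing comes from the diff above; FSDPConfigSketch, maybe_apply_activation_checkpointing, and apply_fn are hypothetical names for illustration, not the actual code in the examples repo.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FSDPConfigSketch:
    # Hypothetical stand-in for the config class in
    # distributed/FSDP/configs/fsdp.py, reduced to the one field
    # touched by the workaround.
    fsdp_activation_checkpointing: bool = False  # was True before the fix


def maybe_apply_activation_checkpointing(
    model: Any,
    cfg: FSDPConfigSketch,
    apply_fn: Callable[[Any], None],
) -> bool:
    """Call apply_fn(model) only when the flag is enabled.

    apply_fn is an assumed placeholder for whatever function wraps
    the transformer blocks with activation checkpointing.
    Returns True if wrapping was applied, False if it was skipped.
    """
    if cfg.fsdp_activation_checkpointing:
        apply_fn(model)
        return True
    return False
```

With the flag set to False the wrapping step is skipped entirely, so the failing code path is never reached. The trade-off is higher activation memory during training, since nothing is recomputed in the backward pass.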
Context
Your Environment
Expected Behavior
Training runs without errors.
Current Behavior
An error is raised and training stops.
Possible Solution
Steps to Reproduce
Failure Logs [if any]
TypeError: T5Block.forward() got an unexpected keyword argument 'offload_to_cpu'
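One plausible reading of the TypeError above, not confirmed anywhere in this thread, is that a wrapper around the block passes a keyword argument through to forward() that forward() does not accept. The sketch below reproduces that general mechanism in plain Python. BlockSketch, ForwardingWrapperSketch, and try_forward are hypothetical stand-ins, not PyTorch or transformers APIs.

```python
class BlockSketch:
    """Hypothetical stand-in for a transformer block such as T5Block."""

    def forward(self, hidden_states):
        return hidden_states


class ForwardingWrapperSketch:
    """Hypothetical wrapper that forwards unknown kwargs to forward().

    Assumption: the real failure follows a similar pattern, where an
    option meant for the wrapper ends up being passed to the module.
    """

    def __init__(self, module, **extra_kwargs):
        self.module = module
        self.extra_kwargs = extra_kwargs

    def __call__(self, *args):
        # Every stored kwarg is handed to forward(), accepted or not.
        return self.module.forward(*args, **self.extra_kwargs)


def try_forward(wrapper, x):
    """Return (output, None) on success or (None, error message)."""
    try:
        return wrapper(x), None
    except TypeError as exc:
        return None, str(exc)
```

Calling try_forward(ForwardingWrapperSketch(BlockSketch(), offload_to_cpu=True), 3) returns an error message about the unexpected keyword argument, while the same call without offload_to_cpu succeeds. If this is indeed the cause, it would explain why skipping the wrapping step avoids the crash.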