I'm trying to set up activation checkpointing for training a larger model. I've adjusted the code to:
```python
# tinyllama.py
strategy = FSDPStrategy(
    auto_wrap_policy={Block},
    activation_checkpointing_policy={Block},
    state_dict_type="full",
    limit_all_gathers=True,
    cpu_offload=False,
)
```
However, I keep getting the following error:
```
torch.utils.checkpoint.CheckpointError: torch.utils.checkpoint: A different number of tensors was saved during the original forward and recomputation.
Number of tensors saved during forward: 27
Number of tensors saved during recomputation: 8
```
I'm not sure what might be causing this in the GPT code.
The fused xFormers SwiGLU is not compatible with activation checkpointing. Consider disabling it and replacing it with plain torch SwiGLU layers.
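For reference, here's a minimal sketch of what a plain-PyTorch SwiGLU MLP could look like as a stand-in for the fused xFormers op. The class and argument names (`TorchSwiGLU`, `n_embd`, `intermediate_size`) are placeholders for illustration, not the repo's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TorchSwiGLU(nn.Module):
    """Plain-PyTorch SwiGLU MLP (no fused xFormers op).

    Names (n_embd, intermediate_size) are illustrative assumptions,
    not the actual identifiers used in this repo.
    """

    def __init__(self, n_embd: int, intermediate_size: int) -> None:
        super().__init__()
        self.w1 = nn.Linear(n_embd, intermediate_size, bias=False)  # gate projection
        self.w2 = nn.Linear(n_embd, intermediate_size, bias=False)  # up projection
        self.w3 = nn.Linear(intermediate_size, n_embd, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(x @ W1) * (x @ W2), then project back to n_embd
        return self.w3(F.silu(self.w1(x)) * self.w2(x))
```

Because every op in this forward is an ordinary `nn.Linear` / `F.silu` call, `torch.utils.checkpoint` can recompute the block deterministically during the backward pass, which should avoid the saved-tensor count mismatch in the error above.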
Thanks! I totally forgot about that.