A subtle set of fixes to enable FSDP one-shot. The fixes are mostly focused on correctly undoing the naming changes enforced by the wrapped FSDP module.
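As a minimal sketch of what undoing the FSDP naming change can look like, the snippet below strips wrapper segments from dotted parameter names. It assumes the wrapper inserts a `_fsdp_wrapped_module` segment into each name; the helper names `strip_fsdp_segments` and `unwrap_state_dict_keys` are hypothetical and are not taken from this PR.

```python
# Assumption: FSDP exposes the wrapped module under the attribute
# "_fsdp_wrapped_module", so dotted names gain that extra segment.
FSDP_WRAPPER_SEGMENT = "_fsdp_wrapped_module"


def strip_fsdp_segments(name: str) -> str:
    """Drop every FSDP wrapper segment from a dotted module or parameter name."""
    return ".".join(
        part for part in name.split(".") if part != FSDP_WRAPPER_SEGMENT
    )


def unwrap_state_dict_keys(state_dict: dict) -> dict:
    """Return a copy of state_dict whose keys use the unwrapped names."""
    return {strip_fsdp_segments(key): value for key, value in state_dict.items()}


# Hypothetical wrapped name, with the wrapper applied at two nesting levels.
wrapped = (
    "_fsdp_wrapped_module.model.layers.0."
    "_fsdp_wrapped_module.self_attn.q_proj.weight"
)
print(strip_fsdp_segments(wrapped))  # model.layers.0.self_attn.q_proj.weight
```

Filtering on whole segments rather than doing a substring replace keeps names that merely contain the wrapper string as part of a longer identifier untouched.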
Testing
Note: The FSDP process was run with num_processes: 1, as well as num_processes: 2. Both runs yielded similar perplexities.
Note: This PR should be landed in unison with: https://github.com/neuralmagic/compressed-tensors/pull/58
Model generation script
To run FSDP training:
Model testing script
Result
The model produced by FSDP one-shot has the same perplexity and weight sparsity as its counterpart.
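As a rough sketch of how the two metrics in this comparison could be computed, the snippet below assumes perplexity is the exponential of the mean per-token negative log-likelihood and sparsity is the fraction of exactly-zero weights. The function names and the 1% relative tolerance are hypothetical choices for illustration and do not come from the testing script.

```python
import math
from typing import Iterable, Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity as exp of the mean per-token negative log-likelihood (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def sparsity(weights: Iterable[float]) -> float:
    """Fraction of weights that are exactly zero."""
    values = list(weights)
    if not values:
        raise ValueError("need at least one weight")
    return sum(1 for w in values if w == 0.0) / len(values)


def metrics_match(a: float, b: float, rel_tol: float = 1e-2) -> bool:
    """True when two metric values agree within a relative tolerance.

    The default tolerance is an illustrative assumption.
    """
    return math.isclose(a, b, rel_tol=rel_tol)
```

With these helpers, `sparsity([0.0, 0.5, 0.0, -1.2])` evaluates to `0.5`, and two runs can be compared with `metrics_match(ppl_one_process, ppl_two_processes)`, where both arguments are perplexities measured by the caller.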