Closed vkuzo closed 1 month ago
@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request has been merged in pytorch-labs/float8_experimental@323fb489304bcbf5c2521af6bf948ecc35d84bc5.
Stack from ghstack (oldest at bottom):
- #278
- #277
Summary:
The mixin was originally used to share code between the Float8 versions of RowParallelLinear and ColParallelLinear. Since we moved those to DTensor, the mixin is no longer needed. Removing it simplifies the code in preparation for upcoming delayed scaling improvements.
In addition, the from_float conversion now uses the meta device to speed it up.
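The meta-device trick mentioned above can be sketched roughly as follows. This is a hedged illustration, not the actual float8_experimental implementation: the `from_float` helper and its body here are assumptions, showing only the general pattern of constructing the new module on the `meta` device (so no real weight storage is allocated or initialized) and then swapping in the source module's parameters.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a from_float-style conversion using the meta
# device. Creating the replacement module under torch.device("meta")
# allocates no storage and runs no weight init, so construction is fast
# regardless of layer size; the real parameters are attached afterwards.
def from_float_sketch(mod: nn.Linear) -> nn.Linear:
    with torch.device("meta"):
        # Tensors created in this context have shape/dtype metadata only.
        new_mod = nn.Linear(
            mod.in_features, mod.out_features, bias=mod.bias is not None
        )
    # Swap in the real parameters from the source module; after this,
    # new_mod holds materialized tensors again.
    new_mod.weight = mod.weight
    new_mod.bias = mod.bias
    return new_mod

lin = nn.Linear(16, 32)
converted = from_float_sketch(lin)
```

The point of the sketch is that the expensive part of module construction (allocating and randomly initializing weights that will be immediately overwritten) is skipped entirely on the meta device.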
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D58396926