vkuzo closed this pull request 3 months ago.
@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Should we consider using (or adding an option to use) do_bench_using_profiling for benchmarking? It counts only the GPU kernel time.
I'm open to it, maybe in a separate PR?
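For reference, a minimal sketch of what such an opt-in could look like. Everything here is illustrative: `benchmark_ms`, `wall_clock_ms`, and the `kernel_timer` parameter are hypothetical names, not code from this PR. The idea is that the benchmark defaults to wall-clock timing and switches to a kernel-only timer only when one is passed in.

```python
import time
from typing import Callable, Optional


def wall_clock_ms(fn: Callable[[], object], warmup: int = 3, rep: int = 10) -> float:
    """Average wall-clock time per call in milliseconds.

    This includes host-side overhead (Python, launch latency). For GPU work
    the caller would also need to synchronize the device inside fn, otherwise
    only the launch cost is measured.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(rep):
        fn()
    return (time.perf_counter() - start) * 1e3 / rep


def benchmark_ms(
    fn: Callable[[], object],
    kernel_timer: Optional[Callable[[Callable[[], object]], float]] = None,
) -> float:
    """Hypothetical helper: use kernel_timer when given, else wall clock.

    kernel_timer is assumed to take the callable and return a time in
    milliseconds that covers GPU kernels only.
    """
    if kernel_timer is not None:
        return kernel_timer(fn)
    return wall_clock_ms(fn)
```

On a CUDA machine the profiling-based function mentioned above could be passed as `kernel_timer`, assuming it accepts a zero-argument callable and returns a per-call time; that signature is an assumption and should be checked against the installed version. Keeping the timer injectable means the default behavior of existing benchmarks stays unchanged.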
This pull request has been merged in pytorch-labs/float8_experimental@1e9add319830be21520333c146232f9c0670b16c.
Stack from ghstack (oldest at bottom):
277
276
Differential Revision: D58396927