Closed kaixih closed 1 year ago
@zhangqiaorjc I've noticed that this pull request has been in a "pull ready" status for a couple of days. Is there a specific action needed, like clicking a button to merge the PR, or will the process be automated by a robot?
This PR introduces a new performance-related config option, `USE_FP8`, which calls the provided function `tr_set_fp8_quantization` in praxis to set the recommended layers inside the transformer to use the FP8 GEMM.

There are four related PRs, which should be reviewed in this order:

1. https://github.com/google/praxis/pull/29
2. https://github.com/google/paxml/pull/48
3. https://github.com/google/praxis/pull/28
4. https://github.com/google/paxml/pull/49 (current)
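
For reviewers who want a quick picture of the intended behavior, here is a minimal, self-contained sketch of how a class-level `USE_FP8` flag might gate the quantization call. It does not use the real paxml or praxis APIs: `Layer`, `Transformer`, `ExperimentConfig`, `set_fp8_stub`, and the layer names are all hypothetical stand-ins, and the actual signature of `tr_set_fp8_quantization` may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    """Hypothetical layer record; not a praxis class."""
    name: str
    use_fp8: bool = False


@dataclass
class Transformer:
    """Hypothetical transformer holding a few named layers."""
    layers: List[Layer] = field(
        default_factory=lambda: [
            Layer("qkv_proj"),
            Layer("ffn1"),
            Layer("layer_norm"),
        ]
    )


# Assumption: only GEMM-heavy layers are switched to FP8.
# These names are illustrative, not the set chosen by praxis.
FP8_LAYER_NAMES = {"qkv_proj", "ffn1"}


def set_fp8_stub(model: Transformer) -> None:
    """Stand-in for praxis's tr_set_fp8_quantization (signature assumed)."""
    for layer in model.layers:
        if layer.name in FP8_LAYER_NAMES:
            layer.use_fp8 = True


class ExperimentConfig:
    """Hypothetical experiment config with the new flag off by default."""
    USE_FP8 = False

    def build(self) -> Transformer:
        model = Transformer()
        # The flag only decides whether the quantization helper is called.
        if self.USE_FP8:
            set_fp8_stub(model)
        return model


class Fp8Experiment(ExperimentConfig):
    """Opting in is a one-line override in the subclass."""
    USE_FP8 = True
```

In this sketch the default experiment is unaffected, and an experiment opts in by overriding a single attribute; whether the real default is off is an assumption here.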
cc. @pjannaty @reedwm @nluehr @lukaszlew