Closed: joytianya closed this issue 12 months ago
Unfortunately I don't have plans to support that right now. Using fp8 won't save much memory, since the model weights are still kept in fp32 during training. Also, since I don't have access to any H100s, I won't be able to test any fp8 implementations.
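To make that concrete, here is a rough back-of-the-envelope sketch (assuming fp32 master weights, fp32 gradients, and fp32 Adam moments; the exact breakdown depends on the optimizer and implementation) of the per-parameter state that fp8 matmuls don't shrink:

```python
def training_state_bytes(num_params: int) -> int:
    """Approximate bytes of persistent training state per model, ignoring activations."""
    fp32_weights = 4 * num_params   # master weights kept in fp32
    fp32_grads = 4 * num_params     # gradients
    adam_moments = 8 * num_params   # Adam m and v, fp32 each
    return fp32_weights + fp32_grads + adam_moments

# e.g. a 1B-parameter model: ~16 GB of weight/optimizer state before activations,
# and running the matmuls in fp8 does not reduce this part.
print(training_state_bytes(1_000_000_000) / 1e9, "GB")
```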
Training with fp8 would still save some memory though, right?
Models keep getting larger, and we'd like to train even bigger ones. Could fp8 training be supported? https://github.com/NVIDIA/TransformerEngine
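For reference, a minimal sketch of what fp8 training with TransformerEngine roughly looks like, adapted from its quickstart (assumes the `transformer_engine` package is installed and an fp8-capable GPU such as an H100 is available):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# fp8 recipe: delayed scaling with the E4M3 format (arguments are optional).
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

# A TransformerEngine layer; its parameters are still stored in higher precision.
model = te.Linear(768, 3072, bias=True).cuda()
inp = torch.randn(2048, 768, device="cuda")

# The forward pass runs its matmuls in fp8 inside the autocast context.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()
```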