Closed · j316chuck closed 2 months ago
Description

Add fp32 to the set of valid inputs for the attention layer.
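To illustrate the kind of change described, here is a minimal sketch of allowing fp32 in an attention-input dtype check. The set and helper names below are hypothetical and are not the repository's actual identifiers.

```python
import torch

# Hypothetical names for illustration only; the PR's actual set/helper may differ.
VALID_ATTN_INPUT_DTYPES = {torch.float16, torch.bfloat16, torch.float32}  # fp32 newly allowed

def check_valid_attn_input_dtype(query: torch.Tensor) -> None:
    """Raise if the attention layer receives an unsupported input dtype."""
    if query.dtype not in VALID_ATTN_INPUT_DTYPES:
        raise TypeError(
            f'Attention inputs must be one of {VALID_ATTN_INPUT_DTYPES}, '
            f'got {query.dtype}.'
        )
```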
Note: tested manually; there is no unit test for this.

Tests:
- full-eval-fp32-train-fp8-llama3-8b-metamath-4ep-ObqlFj (mixed_precision: full) 🔴
- torch-attn-full-eval-fp32-train-fp8-metamath-4ep-pmGJKN (mixed_precision: full) ✅
Review comment: @j316chuck I don't think this is correct. Flash attention does not support fp32 (unless that changed recently?)
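For context, FlashAttention's kernels accept only half-precision (fp16/bf16) inputs, so one plausible, purely illustrative way to reconcile the reviewer's point with the change is to gate the allowed dtypes on the attention implementation. The constant and function names below are assumptions, not the repository's actual code.

```python
import torch

# Illustrative gating only; names are hypothetical.
FLASH_ATTN_DTYPES = {torch.float16, torch.bfloat16}       # flash kernels are half-precision only
TORCH_ATTN_DTYPES = FLASH_ATTN_DTYPES | {torch.float32}   # the torch/math path can also take fp32

def validate_attn_dtype(attn_impl: str, dtype: torch.dtype) -> None:
    """Reject dtypes that the chosen attention implementation cannot handle."""
    allowed = FLASH_ATTN_DTYPES if attn_impl == 'flash' else TORCH_ATTN_DTYPES
    if dtype not in allowed:
        raise TypeError(f'{attn_impl} attention does not support {dtype}; allowed: {allowed}.')
```

Under that reading, the test results above would be consistent: the torch-attn run passes with fp32 eval, while the default run fails.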