Would you be open to contributions that improve support for BFloat16?
Examples:
- If `bfloat: 16` and the device supports `torch.bfloat16`, cast instead of emulating.
- Allow the custom CUDA kernels to work with `torch.bfloat16` tensors:
  - At first, by casting them to `float`, performing the operation, then casting them back to `bfloat16` (see the sketch after this list).
  - Then, where applicable, by adding native BFloat16 operations to speed up emulation. (I believe this should be possible for MX types with `scale_bits <= 8` and element formats that use <= 8 bits.)
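
To make the examples above concrete, here is a minimal Python sketch of what I have in mind, assuming the custom CUDA kernels currently accept only `float32` inputs; the names `bf16_compat` and `maybe_cast_bf16` are hypothetical, not existing library APIs:

```python
import torch


def bf16_compat(op):
    """Wrap a float32-only custom CUDA op so it also accepts
    torch.bfloat16 tensors: upcast to float32, run the op, and cast
    the result back to bfloat16."""
    def wrapped(x: torch.Tensor, *args, **kwargs):
        if x.dtype == torch.bfloat16:
            return op(x.float(), *args, **kwargs).to(torch.bfloat16)
        return op(x, *args, **kwargs)
    return wrapped


def maybe_cast_bf16(x: torch.Tensor) -> torch.Tensor:
    """If the device handles bfloat16 natively, a plain cast round-trip
    already reproduces the precision loss, so emulation can be skipped."""
    native = x.device.type == "cpu" or (
        x.device.type == "cuda" and torch.cuda.is_bf16_supported()
    )
    if native:
        return x.to(torch.bfloat16).to(x.dtype)
    # Otherwise fall back to the existing emulation path.
    raise NotImplementedError
```

The wrapper costs two extra dtype conversions per call, so I see it only as a stop-gap until native BFloat16 kernels are added where they pay off.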