Dispatch bf16 at run time when using the JIT

This dispatches bf16 support based on the Metal version at run time when using the JIT.

The idea is to simplify distributing a single binary that supports older OS versions while still getting newer features when they are available.

In general, I think the following guidelines give the most flexibility for building applications with MLX:

- As much as possible, minimize the places where we conditionally compile based on the Metal version.
- When using the JIT, we can easily choose the right branch at run time with negligible cost. As much as possible, we should put things that dispatch on the Metal version in the JIT (when MLX_METAL_JIT=True) so that we at least have this option.
- When not using the JIT, you have two options: either deploy a single binary for the minimal target without using newer features, or ship multiple binaries (like we do on PyPI).
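
As a rough illustration of the second guideline, here is a minimal Python sketch of picking a kernel preamble at run time instead of at compile time. It is not the MLX implementation: `bf16_preamble`, the two source strings, and the `(3, 1)` threshold are hypothetical names and values chosen for illustration, and the Metal version is passed in rather than queried from a device.

```python
from functools import lru_cache
from typing import Tuple

# Assumption for illustration: the first Metal version treated as having
# native bfloat support. Check the Metal feature tables for the real value.
BF16_MIN_METAL_VERSION: Tuple[int, int] = (3, 1)

# Hypothetical source snippets that a JIT would prepend to a kernel.
NATIVE_BF16_SOURCE = "typedef bfloat bfloat16_t;"
EMULATED_BF16_SOURCE = "struct bfloat16_t { uint16_t bits; };  // software fallback"


@lru_cache(maxsize=None)
def bf16_preamble(metal_version: Tuple[int, int]) -> str:
    """Pick the bf16 preamble for the Metal version found at run time.

    The result is cached per version, so the branch is taken once per
    version rather than once per kernel launch.
    """
    if metal_version >= BF16_MIN_METAL_VERSION:
        return NATIVE_BF16_SOURCE
    return EMULATED_BF16_SOURCE


if __name__ == "__main__":
    # A single binary can serve both an older and a newer OS.
    for version in [(3, 0), (3, 1)]:
        print(version, "->", bf16_preamble(version))
```

The point of the sketch is that the version check lives in ordinary host code, so one build covers both cases. With ahead-of-time compilation the same choice would have to be made with a preprocessor check, which fixes it for the whole binary.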

CC @davidkoski
