Remove u32-f32-cvt kernels
We don't properly support uint32 datatypes. The u32-f32-cvt kernels are only used to convert reduction outputs back to qs8, but nothing else (e.g. GEMMs) goes through uint32, and I don't think we can support it properly.
Using s32 instead means the accumulators are effectively 31 bits rather than 32, which seems insignificant.
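To put a rough number on that lost bit, here's a quick sketch. It assumes the reduction sums non-negative 8-bit values (0..255) into the accumulator, and it counts how many terms fit before the sum could overflow. `max_terms` is a hypothetical helper for illustration only, not an existing kernel or API.

```python
def max_terms(acc_bits: int, signed: bool, elem_max: int) -> int:
    """Largest n such that n * elem_max still fits in the accumulator.

    Hypothetical helper: assumes every summed element lies in [0, elem_max].
    """
    # A signed accumulator spends one bit on the sign, so only
    # 2**(acc_bits - 1) - 1 is available for a non-negative sum.
    acc_max = 2 ** (acc_bits - 1) - 1 if signed else 2 ** acc_bits - 1
    return acc_max // elem_max


print(max_terms(32, signed=False, elem_max=255))  # u32 accumulator: 16843009
print(max_terms(32, signed=True, elem_max=255))   # s32 accumulator: 8421504
```

Under this assumption, switching to s32 halves the worst-case reduction length. Both limits are still in the millions of elements.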