Test fp16 without 16-bit storage

It would be nice to be able to look at the performance of fp16 ALU ops even when 16-bit storage isn't available. This is the case for Qualcomm A618, for example: we can't load/store 16-bit values, but we can do fp16 math.
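
A rough sketch of what such a test shader might look like, assuming the GL_EXT_shader_explicit_arithmetic_types_float16 extension is used for the fp16 type. The buffer layout, workgroup size, and loop count are made up for illustration. The buffer holds only 32-bit floats, so no 16-bit storage access should be needed; the values are converted to fp16 after the load and back to fp32 before the store.

```glsl
#version 450
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require

layout(local_size_x = 64) in;

// Storage stays 32-bit: no 16-bit load/store is involved.
layout(std430, binding = 0) buffer Data {
    float values[];
};

void main() {
    uint i = gl_GlobalInvocationID.x;

    // Convert to fp16 in registers after the 32-bit load.
    float16_t x = float16_t(values[i]);
    float16_t acc = x;

    // Hypothetical ALU workload: a chain of fp16 multiply-adds.
    for (int k = 0; k < 64; ++k) {
        acc = acc * x + float16_t(0.5);
    }

    // Convert back to fp32 for the 32-bit store.
    values[i] = float(acc);
}
```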

Similarly, it would be nice to test RelaxedPrecision ALU ops on 32-bit values, which seems to be a common case for glslang-translated GLSL.
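
For the RelaxedPrecision case, a minimal sketch could use `mediump` on otherwise 32-bit floats. This assumes glslang is targeting Vulkan, where it is expected to emit RelaxedPrecision decorations for `mediump` values; the buffer layout and loop count are again only illustrative.

```glsl
#version 450

layout(local_size_x = 64) in;

// Plain 32-bit storage.
layout(std430, binding = 0) buffer Data {
    float values[];
};

void main() {
    uint i = gl_GlobalInvocationID.x;

    // mediump is a hint: the driver may run these ops at reduced precision.
    mediump float x = values[i];
    mediump float acc = x;

    // Hypothetical ALU workload, same shape as an fp16 test would use.
    for (int k = 0; k < 64; ++k) {
        acc = acc * x + 0.5;
    }

    values[i] = acc;
}
```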