Open bashimao opened 9 years ago
So you are imagining interpreting a `Short` (or `Char`) value as a 16-bit floating point number?
It would probably be relatively straightforward to support encoding/decoding `Float` or `Double` values as 16-bit floating point. Emulating the arithmetic operations would be painfully slow -- it might be faster to just support decoding/encoding.
You could have a value class that wraps a short and supports implicit conversions to/from float/double. That would be relatively convenient to use. But the problem with that is that arrays of value classes box their elements. And if you really care that much about a compact in-memory representation, you will want to use arrays. So I think the best thing we could do would be just to have conversion methods from/to double that handle all the bit-twiddling.
We could wrap an `Array[Short]` in a value class `ShortArray[T](val data: Array[Short]) extends AnyVal`, and then have the `apply` method on `ShortArray` take a type class `CodeShort[T]` to encode/decode shorts.
Haven't tried it, however!
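A minimal sketch of that idea, assuming the `ShortArray` / `CodeShort` names from the comment above; the method names and the companion helper are hypothetical, not an existing spire API:

```scala
// Type class describing how to pack a T into 16 bits and back.
trait CodeShort[T] {
  def encode(t: T): Short
  def decode(s: Short): T
}

// A value class wrapping Array[Short]: no per-element boxing, and the
// encoding/decoding only happens at the access boundary.
class ShortArray[T](val data: Array[Short]) extends AnyVal {
  def apply(i: Int)(implicit c: CodeShort[T]): T = c.decode(data(i))
  def update(i: Int, t: T)(implicit c: CodeShort[T]): Unit = data(i) = c.encode(t)
  def length: Int = data.length
}

object ShortArray {
  // Hypothetical convenience constructor.
  def ofDim[T](n: Int): ShortArray[T] = new ShortArray[T](new Array[Short](n))
}
```

With a `CodeShort[Float]` instance doing the binary16 bit-twiddling, `xs(i)` would transparently decode a half-precision value while the backing store stays a flat `Array[Short]`.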
@rklaehn: I kind of agree. Otherwise you would probably have to mimic floating-point exceptions. By just encoding/decoding on the fly, you avoid most of that hassle. We just have to make sure that we map NaN values properly to their 16-bit representations. If my understanding is right, IEEE binary16 does distinguish between INF and NaN (via the mantissa bits under the all-ones exponent). However, the Wikipedia article also describes the ARM alternative encoding, which gives up INF and NaN entirely in exchange for a larger value range. For the sake of compatibility it would probably be advisable to go with the standard IEEE encoding. https://en.wikipedia.org/wiki/Half-precision_floating-point_format#ARM_alternative_half-precision
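To make the difference between the two encodings concrete, here is a small sketch (`HalfLayout` is a hypothetical name; the constants follow directly from the bit layouts described on the Wikipedia page):

```scala
// Both IEEE binary16 and the ARM alternative format use the same layout:
// 1 sign bit | 5 exponent bits | 10 mantissa bits.
object HalfLayout {
  // IEEE binary16 reserves exponent == 31 for INF (mantissa == 0) and
  // NaN (mantissa != 0), so the largest finite value uses exponent 30:
  val ieeeMax: Double = (2.0 - math.pow(2, -10)) * math.pow(2, 15) // 65504.0

  // The ARM alternative format has no INF/NaN; exponent == 31 encodes
  // ordinary normal values, doubling the representable range:
  val armMax: Double = (2.0 - math.pow(2, -10)) * math.pow(2, 16) // 131008.0
}
```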
For the encoding/decoding part, I found a C# implementation that almost manages to do the wrapping properly: http://codereview.stackexchange.com/questions/45007/half-precision-reader-writer-for-c Same for Java: http://stackoverflow.com/questions/6162651/half-precision-floating-point-in-java/6162687#6162687 Some more info about the conversion process: ftp://ftp.fox-toolkit.org/pub/fasthalffloatconversion.pdf
This SO answer seems to contain all we need. The question is whether we can use it. It says that it is public domain, but I asked the author on SO anyway.
http://stackoverflow.com/questions/6162651/half-precision-floating-point-in-java/6162687#6162687
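This is not the SO author's code (the licensing question is still open above), just an independent sketch of the bit-twiddling for the standard IEEE binary16 layout, with round-to-nearest-even; the `Half` object name is hypothetical and the subnormal handling is only lightly tested:

```scala
object Half {
  /** IEEE binary32 -> binary16 bits, round-to-nearest-even. */
  def floatToHalf(f: Float): Short = {
    val bits = java.lang.Float.floatToIntBits(f)
    val sign = (bits >>> 16) & 0x8000
    val exp  = (bits >>> 23) & 0xff
    val mant = bits & 0x7fffff
    if (exp == 0xff) {                       // Inf or NaN
      val nan = if (mant != 0) 0x200 else 0  // keep NaN distinguishable from Inf
      (sign | 0x7c00 | nan).toShort
    } else {
      val e = exp - 127 + 15                 // re-bias exponent (127 -> 15)
      if (e >= 0x1f) (sign | 0x7c00).toShort // overflow -> Inf
      else if (e <= 0) {                     // half subnormal or zero
        if (e < -10) sign.toShort            // too small: flush to signed zero
        else {
          val m     = mant | 0x800000        // restore the implicit leading 1
          val shift = 14 - e
          val half  = m >>> shift
          val round = 1 << (shift - 1)
          // round to nearest, ties to even
          val up = (m & round) != 0 && ((m & (round - 1)) != 0 || (half & 1) != 0)
          (sign | (half + (if (up) 1 else 0))).toShort
        }
      } else {                               // normal case
        val half = sign | (e << 10) | (mant >>> 13)
        val rest = mant & 0x1fff             // the 13 dropped mantissa bits
        val up   = rest > 0x1000 || (rest == 0x1000 && (half & 1) != 0)
        (half + (if (up) 1 else 0)).toShort  // a carry may bump the exponent; that is correct
      }
    }
  }

  /** binary16 bits -> IEEE binary32 (this direction is always exact). */
  def halfToFloat(h: Short): Float = {
    val hi    = h & 0xffff
    val exp   = (hi >>> 10) & 0x1f
    val mant  = hi & 0x3ff
    val signF = if ((hi & 0x8000) != 0) -1f else 1f
    if (exp == 0x1f) {
      if (mant != 0) Float.NaN else signF * Float.PositiveInfinity
    } else if (exp == 0) {
      signF * mant * java.lang.Float.intBitsToFloat(0x33800000) // mant * 2^-24
    } else {
      val bits = ((hi & 0x8000) << 16) | ((exp - 15 + 127) << 23) | (mant << 13)
      java.lang.Float.intBitsToFloat(bits)
    }
  }
}
```

Every binary16 value is exactly representable in binary32, which is why `halfToFloat` needs no rounding; all the care goes into `floatToHalf`.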
@denisrosset that would probably work. However, I am not sure if stuff like this would be in scope for spire. The conversion function (bit twiddling) definitely belongs in spire IMHO.
@rklaehn agree with the scope.
However, it would be good to formalize the encoding/decoding process using a specialized typeclass; `FastComplex` also falls into this category.
I will see after my PhD defense (today!) what I can propose.
@denisrosset Good Luck!
Agree about a typeclass for compact encoding. I do this all the time (see e.g. IntervalTrie.Element in https://github.com/rklaehn/intervalset/blob/master/src/main/scala/com/rklaehn/interval/IntervalTrie.scala ) and would love to have a more generic infrastructure.
I should have commented here -- I wrote an implementation on the train last night that can convert between a `Short` (using the Wikipedia 16-bit encoding) and `Double` "correctly" (it seems correct, but I need to do some more research about how the rounding is supposed to work and write more tests).
@denisrosset Also, good luck on your defense!
@non I got permission from SO user x4u to use his code, in case we still need it.
Thanks, you guys are great!
Here's the code I wrote: https://gist.github.com/non/29f8d66036afca402f96
I haven't read the StackOverflow code yet -- I'll check it out now. Maybe the author has a better strategy.
EDIT: The author's code seems better because it doesn't use `Float`/`Double` operations but just reads/creates the bits itself. I need to read more about the author's extensions though.
@non: Thanks! As one might expect, I cannot wait to use this. I just pasted your code into a Scala worksheet to play around with it. But unfortunately it fails to compile the Float16 class. Reason:
Error:(..., ...) value class may not be a member of another class
class Float16(val raw: Short) extends AnyVal { lhs =>
@bashimao this is probably an artefact of how worksheets are compiled. I guess they are compiled as a class, and inner value classes are not allowed. Just remove the `extends AnyVal` while playing around with it. That will make it slower, but it's still good enough for experimentation.
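Concretely, the workaround might look like this; the `Float16` name and `raw` field match the error message above, while the helper method is just for illustration:

```scala
object Worksheet {
  // Without `extends AnyVal`, Float16 may be nested anywhere, including
  // inside the class a worksheet compiles to. Each instance boxes, which
  // is the slowdown mentioned above, but the semantics are identical.
  class Float16(val raw: Short) {
    // Assumes the IEEE binary16 layout: all-ones exponent + non-zero mantissa.
    def isNaN: Boolean = (raw & 0x7c00) == 0x7c00 && (raw & 0x3ff) != 0
  }
}
```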
1/32 of the possible value space is lost to NaNs. Why are you insisting on using the IEEE standard? It is not like you are ever going to binary-cast it; this is not C. With those bits, you could boost the already poor precision.
I tend to disagree. From a greedy perspective it surely looks like a waste. But in practice, encoding +INF, -INF and NaN separately is quite useful. Otherwise it is almost impossible to diagnose issues in long computations. Getting +INF/-INF gives you a hint that you have hit the boundaries of the data type. Without that, you would have silent errors that make the numeric outcome of your computation unreliable. Expanding the possible value range is always nice, but not at the expense of losing information about erroneous states. Just think about the following case:
val a: Float = +INF
val b: Half = a.toHalf
Now b, instead of being +INF is set to Half.MaxValue.
val c: Float = b.toFloat
Should c now be Float.PositiveInfinity, or whatever value Half.MaxValue corresponds to in Float? Which is the natural choice? What would you expect? Consider the following:
val d: Float = c / 2.0f
val e: Half = d.toHalf
What should e be? I would expect INF/2 = INF, since the true magnitude is impossible to determine. However, without the ability to encode INF, you will have a silent error that becomes more problematic with each computation step.
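Since `Half` does not exist yet, the same effect can be demonstrated with `Float`, contrasting genuine overflow-to-INF with a hypothetical "no INF" convention that saturates to the maximum finite value:

```scala
object InfDemo {
  // With INF: overflow stays visibly infinite through further arithmetic.
  val overflowed: Float = Float.MaxValue * 2f // overflows to +INF
  val stillInf: Float   = overflowed / 2f     // INF / 2 == INF: error stays visible

  // Without INF: saturate to MaxValue instead, and the error goes silent.
  val saturated: Float = Float.MaxValue       // stand-in for the "no INF" convention
  val silent: Float    = saturated / 2f       // just a large, wrong, finite number
}
```

The point of the argument above is exactly this contrast: `stillInf` screams that something overflowed, while `silent` looks like a perfectly ordinary result.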
@strelec If you just want to get maximum precision out of a 16-bit value without binary compatibility with the IEEE standard, maybe something like spire.math.FixedPoint for 16 bits would be a good approach.
@bashimao: I was not talking about INF and -INF. Those would stay. Likewise, NaN would stay, but only one encoding of it.
I know this is not an easy task, but probably a rewarding one for the many applications that require high memory bandwidth. I would really like to have half-precision float values in spire.
Description of IEEE 754 binary16: https://en.wikipedia.org/wiki/Half-precision_floating-point_format http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4610935