Open bashimao opened 9 years ago
So you are imagining interpreting a `Short` (or `Char`) value as a 16-bit floating point number?
It would probably be relatively straightforward to support encoding/decoding `Float` or `Double` values as 16-bit floating point. Emulating the arithmetic operations would be painfully slow -- it might be faster to just support decoding/encoding.
You could have a value class that wraps a short and supports implicit conversions to/from float/double. That would be relatively convenient to use. But the problem with that is that arrays of value classes box their elements. And if you really care that much about a compact in-memory representation, you will want to use arrays. So I think the best thing we could do would be just to have conversion methods from/to double that handle all the bit-twiddling.
We could wrap an `Array[Short]` in a value class `ShortArray[T](val data: Array[Short]) extends AnyVal`, and then have the `apply` method on `ShortArray` take a type class `CodeShort[T]` to encode/decode shorts.
Haven't tried it, however!
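A minimal sketch of that idea, assuming the `ShortArray` / `CodeShort` names from the comment above; the method names and the companion helper are hypothetical, not an existing spire API:

```scala
// Type class describing how to pack a T into 16 bits and back.
trait CodeShort[T] {
  def encode(t: T): Short
  def decode(s: Short): T
}

// A value class wrapping Array[Short]: no per-element boxing, and the
// encoding/decoding only happens at the access boundary.
class ShortArray[T](val data: Array[Short]) extends AnyVal {
  def apply(i: Int)(implicit c: CodeShort[T]): T = c.decode(data(i))
  def update(i: Int, t: T)(implicit c: CodeShort[T]): Unit = data(i) = c.encode(t)
  def length: Int = data.length
}

object ShortArray {
  // Hypothetical convenience constructor.
  def ofDim[T](n: Int): ShortArray[T] = new ShortArray[T](new Array[Short](n))
}
```

With a `CodeShort[Float]` instance doing the binary16 bit-twiddling, `xs(i)` would transparently decode a half-precision value while the backing store stays a flat `Array[Short]`.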
@rklaehn: I kind of agree. Otherwise you would probably have to mimic floating-point exceptions. By just encoding/decoding on the fly, you avoid most of that hassle. We just have to make sure that we map NaN values properly to their 16-bit representations. If my understanding is right, IEEE binary16 does distinguish between INF and NaN (via the mantissa bits under the all-ones exponent). However, the Wikipedia article also describes the ARM alternative encoding, which gives up INF and NaN entirely in exchange for a larger value range. For the sake of compatibility it would probably be advisable to go with the standard IEEE encoding. https://en.wikipedia.org/wiki/Half-precision_floating-point_format#ARM_alternative_half-precision
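To make the difference between the two encodings concrete, here is a small sketch (`HalfLayout` is a hypothetical name; the constants follow directly from the bit layouts described on the Wikipedia page):

```scala
// Both IEEE binary16 and the ARM alternative format use the same layout:
// 1 sign bit | 5 exponent bits | 10 mantissa bits.
object HalfLayout {
  // IEEE binary16 reserves exponent == 31 for INF (mantissa == 0) and
  // NaN (mantissa != 0), so the largest finite value uses exponent 30:
  val ieeeMax: Double = (2.0 - math.pow(2, -10)) * math.pow(2, 15) // 65504.0

  // The ARM alternative format has no INF/NaN; exponent == 31 encodes
  // ordinary normal values, doubling the representable range:
  val armMax: Double = (2.0 - math.pow(2, -10)) * math.pow(2, 16) // 131008.0
}
```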
For the encoding/decoding part, I found a C# implementation that almost manages to do the wrapping properly: http://codereview.stackexchange.com/questions/45007/half-precision-reader-writer-for-c Same for Java: http://stackoverflow.com/questions/6162651/half-precision-floating-point-in-java/6162687#6162687 Some more info about the conversion process: ftp://ftp.fox-toolkit.org/pub/fasthalffloatconversion.pdf
This SO answer seems to contain all we need. The question is whether we can use it. It says that it is public domain, but I asked the author on SO anyway.
http://stackoverflow.com/questions/6162651/half-precision-floating-point-in-java/6162687#6162687
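This is not the SO author's code (the licensing question is still open above), just an independent sketch of the bit-twiddling for the standard IEEE binary16 layout, with round-to-nearest-even; the `Half` object name is hypothetical and the subnormal handling is only lightly tested:

```scala
object Half {
  /** IEEE binary32 -> binary16 bits, round-to-nearest-even. */
  def floatToHalf(f: Float): Short = {
    val bits = java.lang.Float.floatToIntBits(f)
    val sign = (bits >>> 16) & 0x8000
    val exp  = (bits >>> 23) & 0xff
    val mant = bits & 0x7fffff
    if (exp == 0xff) {                       // Inf or NaN
      val nan = if (mant != 0) 0x200 else 0  // keep NaN distinguishable from Inf
      (sign | 0x7c00 | nan).toShort
    } else {
      val e = exp - 127 + 15                 // re-bias exponent (127 -> 15)
      if (e >= 0x1f) (sign | 0x7c00).toShort // overflow -> Inf
      else if (e <= 0) {                     // half subnormal or zero
        if (e < -10) sign.toShort            // too small: flush to signed zero
        else {
          val m     = mant | 0x800000        // restore the implicit leading 1
          val shift = 14 - e
          val half  = m >>> shift
          val round = 1 << (shift - 1)
          // round to nearest, ties to even
          val up = (m & round) != 0 && ((m & (round - 1)) != 0 || (half & 1) != 0)
          (sign | (half + (if (up) 1 else 0))).toShort
        }
      } else {                               // normal case
        val half = sign | (e << 10) | (mant >>> 13)
        val rest = mant & 0x1fff             // the 13 dropped mantissa bits
        val up   = rest > 0x1000 || (rest == 0x1000 && (half & 1) != 0)
        (half + (if (up) 1 else 0)).toShort  // a carry may bump the exponent; that is correct
      }
    }
  }

  /** binary16 bits -> IEEE binary32 (this direction is always exact). */
  def halfToFloat(h: Short): Float = {
    val hi    = h & 0xffff
    val exp   = (hi >>> 10) & 0x1f
    val mant  = hi & 0x3ff
    val signF = if ((hi & 0x8000) != 0) -1f else 1f
    if (exp == 0x1f) {
      if (mant != 0) Float.NaN else signF * Float.PositiveInfinity
    } else if (exp == 0) {
      signF * mant * java.lang.Float.intBitsToFloat(0x33800000) // mant * 2^-24
    } else {
      val bits = ((hi & 0x8000) << 16) | ((exp - 15 + 127) << 23) | (mant << 13)
      java.lang.Float.intBitsToFloat(bits)
    }
  }
}
```

Every binary16 value is exactly representable in binary32, which is why `halfToFloat` needs no rounding; all the care goes into `floatToHalf`.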
@denisrosset that would probably work. However, I am not sure if stuff like this would be in scope for spire. The conversion function (bit twiddling) definitely belongs in spire IMHO.
@rklaehn agree with the scope.
However, it would be good to formalize the encoding/decoding process using a specialized typeclass; `FastComplex` also falls into this category.
I will see after my PhD defense (today!) what I can propose.
@denisrosset Good Luck!
Agree about a typeclass for compact encoding. I do this all the time (see e.g. IntervalTrie.Element in https://github.com/rklaehn/intervalset/blob/master/src/main/scala/com/rklaehn/interval/IntervalTrie.scala ) and would love to have a more generic infrastructure.
I should have commented here -- I wrote an implementation on the train last night that can convert between a `Short` (using the Wikipedia 16-bit encoding) and `Double` "correctly" (it seems correct, but I need to do some more research about how the rounding is supposed to work and write more tests).
@denisrosset Also, good luck on your defense!
@non I got permission from SO user x4u to use his code, in case we still need it.
Thanks, you guys are great!
Here's the code I wrote: https://gist.github.com/non/29f8d66036afca402f96
I haven't read the StackOverflow code yet -- I'll check it out now. Maybe the author has a better strategy.
EDIT: The author's code seems better because it doesn't use `Float`/`Double` operations but just reads/creates the bits itself. I need to read more about the author's extensions though.
@non: Thanks! As one might expect, I cannot wait to use this. I just pasted your code into a Scala worksheet to play around with it. But unfortunately it fails to compile the Float16 class. Reason:
Error:(..., ...) value class may not be a member of another class
class Float16(val raw: Short) extends AnyVal { lhs =>
@bashimao this is probably an artefact of how worksheets are compiled. I guess they are compiled as a class, and inner value classes are not allowed. Just remove the `extends AnyVal` while playing around with it. That will make it slower, but it's still good enough for experimentation.
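Concretely, the workaround might look like this; the `Float16` name and `raw` field match the error message above, while the helper method is just for illustration:

```scala
object Worksheet {
  // Without `extends AnyVal`, Float16 may be nested anywhere, including
  // inside the class a worksheet compiles to. Each instance boxes, which
  // is the slowdown mentioned above, but the semantics are identical.
  class Float16(val raw: Short) {
    // Assumes the IEEE binary16 layout: all-ones exponent + non-zero mantissa.
    def isNaN: Boolean = (raw & 0x7c00) == 0x7c00 && (raw & 0x3ff) != 0
  }
}
```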
1/32 of the possible value space is lost to NaNs. Why are you insisting on using the IEEE standard? It is not like you are ever going to binary-cast it; this is not C. With those bits, you could boost the already poor precision.
I tend to disagree. From a greedy perspective it surely looks like a waste. But in practice, encoding +INF, -INF and NaN separately is quite useful. Otherwise it is almost impossible to diagnose issues in long computations. Getting +INF/-INF gives you a hint that you have hit the boundaries of the data type. Without that, you would have silent errors that make the numeric outcome of your computation unreliable. Expanding the possible value range is always nice, but not at the expense of losing information about erroneous states. Just think about the following case:
val a: Float = +INF
val b: Half = a.toHalf
Now b, instead of being +INF is set to Half.MaxValue.
val c: Float = b.toFloat
Should c now be Float.PositiveInfinity, or whatever value Half.MaxValue corresponds to in Float? Which is the natural choice? What would you expect? Consider the following:
val d: Float = c / 2.0f
val e: Half = d.toHalf
What should e be? I would expect INF/2 = INF, since the true magnitude is impossible to determine. However, without the ability to encode INF, you will have a silent error that becomes more problematic with each computation step.
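Since `Half` does not exist yet, the same effect can be demonstrated with `Float`, contrasting genuine overflow-to-INF with a hypothetical "no INF" convention that saturates to the maximum finite value:

```scala
object InfDemo {
  // With INF: overflow stays visibly infinite through further arithmetic.
  val overflowed: Float = Float.MaxValue * 2f // overflows to +INF
  val stillInf: Float   = overflowed / 2f     // INF / 2 == INF: error stays visible

  // Without INF: saturate to MaxValue instead, and the error goes silent.
  val saturated: Float = Float.MaxValue       // stand-in for the "no INF" convention
  val silent: Float    = saturated / 2f       // just a large, wrong, finite number
}
```

The point of the argument above is exactly this contrast: `stillInf` screams that something overflowed, while `silent` looks like a perfectly ordinary result.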
@strelec If you just want to get maximum precision out of a 16-bit value without binary compatibility with the IEEE standard, maybe something like spire.math.FixedPoint for 16 bits would be a good approach.
@bashimao: I was not talking about INF and -INF. Those would stay. Likewise, NaN would stay, but only one encoding of it.
I know this is not an easy task, but probably a rewarding one for the many applications that require high memory bandwidth. I would really like to have half-precision float values in spire.
Description of IEEE 754 binary16: https://en.wikipedia.org/wiki/Half-precision_floating-point_format http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4610935