ezyang opened 3 years ago
They have an updated PLDI'21 paper https://arxiv.org/pdf/2104.04043.pdf which extends their technique to work on float32. Anywhere in PyTorch where we are using libm for floating-point operations (I'm actually not sure exactly what we are using), we should use rlibm-32 instead. The paper also gives us a path to emulating posits on CPU (@wickedfoo you might like that!)
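For concreteness, here's a minimal sketch of what such a swap could look like at one call site. Note that `rlibm32_logf` and the `USE_RLIBM32` flag are placeholder names I made up for illustration, not symbols taken from either repo; the real exported prototypes would need to be checked against rlibm-32's headers.

```cpp
#include <cmath>

// Hypothetical rlibm-32 prototype (assumption, not the library's real API):
// a correctly rounded float32 natural log.
extern "C" float rlibm32_logf(float x);

float pytorch_log(float x) {
#ifdef USE_RLIBM32
  return rlibm32_logf(x);  // correctly rounded for all float32 inputs
#else
  return std::log(x);      // libm: typically close to 1 ulp, but not guaranteed
#endif
}
```

In practice the swap would presumably happen once, in whatever central math header the kernels dispatch through, rather than at each call site.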
There seem to be two libraries: https://github.com/rutgers-apl/rlibm and https://github.com/rutgers-apl/rlibm-32. The first one has some float16- and bfloat16-optimized functions (including sqrt), while the second iteration focuses on float32 and posits.
Also, although the implementations compare well to other precise implementations (like libm), they still do "slowish" operations internally (including conversion to double precision), so they may not compare as well to libraries that are OK with more ulps of precision loss.
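To illustrate where that cost comes from, here's a simplified sketch of the general pattern: evaluate a polynomial approximation in double precision, then round to float32 exactly once at the end. The coefficients below are just the Taylor expansion of 2^x (not rlibm's actual generated polynomial), and range reduction and special-case handling are omitted, so this is illustrative only and is not correctly rounded.

```cpp
float approx_exp2_f32(float x) {
  // Widen to double so intermediate rounding cannot disturb the single
  // float32 rounding decision at the end. This widening/narrowing is part
  // of the per-call cost being discussed above.
  double xd = static_cast<double>(x);

  // Degree-5 Taylor polynomial for 2^x, valid only on a reduced range
  // like [0, 1); placeholder coefficients, evaluated in Horner form.
  double r = 0.0013333558146428443;   // (ln 2)^5 / 120
  r = r * xd + 0.009618129107628477;  // (ln 2)^4 / 24
  r = r * xd + 0.05550410866482158;   // (ln 2)^3 / 6
  r = r * xd + 0.24022650695910072;   // (ln 2)^2 / 2
  r = r * xd + 0.6931471805599453;    // ln 2
  r = r * xd + 1.0;

  // Single narrowing round: double -> float.
  return static_cast<float>(r);
}
```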
It looks like they are faster and more accurate. The paper describing the approach is at https://arxiv.org/pdf/2007.05344.pdf