rzikm / NetQD

.NET implementation of the double-double and quad-double technique for achieving almost 128-bit and 256-bit floating point precision types.
MIT License

Difficulty understanding the results #1

Open ebfortin opened 4 years ago

ebfortin commented 4 years ago

I've been trying to implement the David H. Bailey paper on double-double arithmetic and came upon this library, which does just that. I stopped my implementation because I thought my results were wrong, but then I discovered that my implementation and yours give the same results. So we must both have implemented double-double correctly, as documented in the paper.

However, I fail to understand what I should expect. Roughly speaking I should get about 31 digits (base 10) of precision. So when I do something like this:

3.0 + 0.000000000000000000567

I expect a result more or less

3.000000000000000000567

which is 22 significant digits. However, with your library and my unfinished implementation, I get 3.0. It almost looks as if the library is just truncating at 16 digits of precision, which is basically what a double gives you. Since that doesn't make any sense, there must be something I don't understand.

Have you come across something similar? Do you understand how the maths work and why we get such a bizarre result? Is it me that is just plain stupid?

rzikm commented 4 years ago

Hi, the truncation to "just double" precision is caused by the fact that the ToString method on DdReal only takes the high-order double (x0) into account and ignores the low-order one (x1):


        public string ToString(string format, IFormatProvider formatProvider)
        {
            if (formatProvider == null)
            {
                formatProvider = CultureInfo.CurrentCulture;
            }

            // TODO: use both x0 and x1 for formatting
            return x0.ToString(format, formatProvider);
        }

You should be able to demonstrate the added precision by subtracting 3.0 from the result: you should get back something very close to the 0.000000000000000000567 value.
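To make the subtraction check concrete, here is a minimal sketch that works on plain doubles rather than on the library's types. It assumes Knuth's error-free TwoSum as the building block of double-double addition; the class `TwoSumDemo` and its method names are hypothetical and are not the NetQD API.

```csharp
using System;

// Hypothetical illustration only; these names are not part of NetQD.
public static class TwoSumDemo
{
    // Knuth's TwoSum: Hi is the rounded double sum, Lo is the rounding error,
    // so Hi + Lo equals a + b exactly (barring overflow).
    public static (double Hi, double Lo) TwoSum(double a, double b)
    {
        double s = a + b;
        double bb = s - a;
        double err = (a - (s - bb)) + (b - bb);
        return (s, err);
    }

    public static void Main()
    {
        var (hi, lo) = TwoSum(3.0, 5.67e-19);

        // The high component alone is what a plain double (and a ToString
        // that only looks at the high part) can show.
        Console.WriteLine(hi);

        // Subtracting 3.0 cancels the high part and exposes the low component.
        double recovered = (hi - 3.0) + lo;
        Console.WriteLine(recovered);
    }
}
```

Here the high part is exactly 3.0 and the small term survives untouched in the low part, because 5.67e-19 is far below half the spacing between doubles near 3.0 (about 4.4e-16). The information is not lost; it is just not printed.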

The reason why I did not implement the ToString method correctly for either of the two extended-precision types was either laziness or lack of time. The code was originally part of my bachelor's thesis, which used it in circuit simulation. I split it out later when I thought I would continue the simulator in my master's thesis, but I chose a different topic in the end.

I will gladly accept a pull request for the missing functionality, since I do not have time to implement it myself (as I mentioned, I have a master's thesis to do). Since the entire library is a blatant rewrite of the original C++ source code in C#, you can use the original code to implement at least basic formatting (disregarding the IFormatProvider).
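As one possible starting point for such formatting, the sketch below prints the sum of the two components with a fixed number of fractional digits. It is not a port of the original C++ routine: it swaps in exact integer arithmetic via `System.Numerics.BigInteger`, which is simple but slower than digit extraction in double-double arithmetic. The class `DdFormatSketch` and its `Format(hi, lo, digits)` signature are assumptions made for illustration, and NaN, infinity, and IFormatProvider are not handled.

```csharp
using System;
using System.Numerics;

// Hypothetical helper; not part of NetQD and not the C++ QD algorithm.
public static class DdFormatSketch
{
    // Splits a finite double into an exact pair: value == Mantissa * 2^Exponent.
    private static (BigInteger Mantissa, int Exponent) Decompose(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        bool negative = bits < 0;
        int biasedExponent = (int)((bits >> 52) & 0x7FF);
        long fraction = bits & 0xFFFFFFFFFFFFFL;
        if (biasedExponent == 0x7FF)
            throw new ArgumentException("NaN and infinity are not handled.");

        // Subnormals have no implicit leading 1 and use the minimum exponent.
        long mantissa = biasedExponent == 0 ? fraction : fraction | (1L << 52);
        int exponent = (biasedExponent == 0 ? 1 : biasedExponent) - 1075;
        BigInteger signed = negative ? -mantissa : mantissa;
        return (signed, exponent);
    }

    // Formats hi + lo, rounded half away from zero to `digits` fractional digits.
    public static string Format(double hi, double lo, int digits)
    {
        var (m1, e1) = Decompose(hi);
        var (m2, e2) = Decompose(lo);
        int e = Math.Min(e1, e2);

        // Exact sum of both components as an integer multiple of 2^e.
        BigInteger sum = (m1 << (e1 - e)) + (m2 << (e2 - e));
        bool negative = sum.Sign < 0;

        BigInteger numerator = BigInteger.Abs(sum) * BigInteger.Pow(10, digits);
        BigInteger denominator = BigInteger.One;
        if (e >= 0) numerator <<= e; else denominator <<= -e;

        // Round to nearest by adding half the denominator before dividing.
        BigInteger scaled = (2 * numerator + denominator) / (2 * denominator);

        string text = scaled.ToString().PadLeft(digits + 1, '0');
        string result = digits == 0 ? text : text.Insert(text.Length - digits, ".");
        return negative && !scaled.IsZero ? "-" + result : result;
    }

    public static void Main()
    {
        // prints 3.000000000000000000567
        Console.WriteLine(Format(3.0, 5.67e-19, 21));
    }
}
```

Because every finite double is an integer times a power of two, the sum of the two components can be computed exactly before any decimal conversion, so the only rounding happens at the requested digit.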

ebfortin commented 4 years ago

Thanks for the explanation. It makes sense, although it can't explain why my own implementation was not working. It was just a coincidence that it gave the same result.

I ended up porting an existing library written in C. It comes from a replacement for the libm library, which contains code for double-double math that is used to validate the rounding of doubles, among other things. There is also triple-double math. The code seemed a lot more detailed, especially in terms of guaranteeing valid results and rounding. I will probably create a repo on GitHub for it.