Future of the doubledouble library

Hi Juraj, I came across this library a couple of days ago while looking around the internet for Python double-double arithmetic libraries, and I am a fan of it! Nice and simple, does the job. I've been looking into this specifically because clifford, one of the libraries I maintain, might start using some double-double arithmetic under the hood for some of the more numerically unstable algorithms it uses in high-dimensional algebras.

I've been playing around with the library in my fork, https://github.com/hugohadfield/doubledouble, specifically looking at integration with the numba JIT compiler and at custom numpy doubledouble dtypes. I'd like to clean these experiments up a bit and submit them as PRs, but before I do I thought I would ask what your plan is for this code. Are you keen to keep developing it further, or have you moved on to other projects now?

From my perspective, I'm mostly interested in fleshing out numba JIT support, adding unit tests and setting up some kind of CI system (GitHub Actions?). This would let me add the library as an optional dependency of clifford and would help iron out any bugs that might exist in the implementation.

Let me know what you think!

Hello Hugo,

thank you for the kind words!

The short answer is yes, I've moved on to other projects now, and yes, I'm still willing to help with your effort.

To expand on it a bit:

One of the main issues is the lack of FMA. As practically every non-trivial routine uses _two_product, I think this is where the largest improvement would come from, and for me personally it is the main enabler for further improvements. Also, there are more precise algorithms for some of the operations, at the expense of performance. Again, if there was FMA, I would accept trading a bit of speed for one or two bits of better precision. Without FMA, probably not so much.
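To illustrate why FMA matters here, below is a minimal sketch of an error-free product in two variants. It is not the library's actual _two_product; the names two_product_split and two_product_fma are hypothetical. The first variant uses the classic Dekker/Veltkamp splitting, the second uses math.fma, which as far as I know is only available from Python 3.13, so the sketch guards for it.

```python
import math

# math.fma is assumed to exist only on Python >= 3.13.
HAVE_FMA = hasattr(math, "fma")

_SPLITTER = 134217729.0  # 2**27 + 1, splits a 53-bit significand in two halves


def _split(a):
    """Veltkamp split: a == hi + lo, each half fits in about 26 bits."""
    t = _SPLITTER * a
    hi = t - (t - a)
    return hi, a - hi


def two_product_split(a, b):
    """Return (p, e) with a*b == p + e exactly, without FMA.

    Needs two splits and several extra multiplications and additions.
    Assumes no overflow or underflow occurs.
    """
    p = a * b
    ah, al = _split(a)
    bh, bl = _split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def two_product_fma(a, b):
    """Return (p, e) with a*b == p + e exactly, using one fused multiply-add."""
    p = a * b
    return p, math.fma(a, b, -p)


if __name__ == "__main__":
    print(two_product_split(0.1, 0.3))
    if HAVE_FMA:
        print(two_product_fma(0.1, 0.3))
```

The FMA variant replaces the whole splitting sequence with a single operation, which is why a fast FMA would change the speed/precision trade-off discussed above.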

For me, one area that is currently missing from the library is root finding, which means polynomial evaluation and complex numbers. If I ever get back to the library, I will perhaps start there. On the other hand, I'm not sure how much sense it makes to include such functionality in a generic library like this one.

Then there are CI and tests. I understand people want those, but in my eyes they are a solution to a problem I don't have. The project has yet to receive a single commit, and in general I find it much more useful to get a simple isolated test case pointing to an error in the code than to have a test which might catch some bug but also might not. Another question is, what would such tests test? There are no rigorous proofs in the library, so I'm not sure to what precision one would want to go, and if someone wanted to improve a routine, then in addition to writing the routine they would also need to write/fix the test for it.

Numba/NumPy. In general, I would prefer to keep the code as plain as possible for pure Python (i.e. CPython/PyPy). A good example is your __init__ method: even though I would prefer to avoid using comments for type definitions, what I find a bit problematic is the removal of the float function (now x could be an int, if one is not careful enough), and it just makes no sense for pure Python to write 0.0*x. Please don't take it as an insult; I'm sure it can be written differently, but what I am trying to say is that I will always consider pure Python needs/conventions first. One solution is to have separate files for pure Python and Numba/NumPy. Would it work for you to basically copy-paste the whole "doubledouble.py" into, say, "doubledouble_numpy.py" (as you have now), go wild there, so to say, and keep the features in sync across the files, so that users would import the version they need?

Right now I'm working on a different project, so I'm not sure if I can start working on this right away, but I will follow the issue here, and if you prefer you can contact me via email.

Hello to both of you. When saying "pure Python", which versions are you aiming to support? For example, both your implementations feature boilerplate in the class definition which the dataclasses module, introduced in version 3.7, aims to remove. However, dataclasses didn't gain __slots__ functionality until 3.10, which may be a dealbreaker on that front. To quantify the savings, I cloned this repo and modified doubledouble.py, shortening it by about 40 lines without compromising any functionality for Python >= 3.10 (as far as I can tell).

Hello Sven! I don't mind dropping Python 2, if that answers your question. I can even imagine requiring 3.10, given the pace at which the library is updated. However, please, I have to ask: what does dataclass do that the current code does not? I mean, the 40 lines are really simple and I'm not sure if their removal outweighs the requirement for a newer Python. Besides the boilerplate, is it faster? Does it use less memory? Does it help to resolve the types better? And DoubleDouble is immutable, so wouldn't typing.NamedTuple be an even better fit? PS: What happens if one compares a DoubleDouble built with dataclass to a float? Currently there is a special path for that situation.

I'll attempt to answer your questions in order:

The dataclass code is shorter and therefore potentially more maintainable than the "boilerplate" version. Admittedly, maintainability is a lesser concern for this library since it's already written and manually tested, but a possible future QuadrupleDouble library (if someone writes it) could benefit.
I haven't tested the speed or the memory footprint of either variant, but a dataclass shouldn't add any further overhead after the class has been created. Concerning memory footprint and speed, the dataclass can be slotted just like the original class starting from Python 3.10, as I mentioned. I even expect a minor speed-up in the __str__ and __hex__ methods due to them now using f-strings rather than old-style string formatting ;)
Type resolution/conversion is a use case which dataclass doesn't cover, sadly. Many methods within your version of DoubleDouble treat doubles like DoubleDoubles with y=0.0, i.e. ones instantiated from just one double. This affects arithmetic (to a lesser extent, since those operations are explicitly implemented anyway) and comparisons (where the generated methods don't accept floats). One can implement the latter functions (or a subset, with functools.total_ordering) manually, which partially defeats the purpose. I may be missing an edge case, but could type resolution be implemented as either:
- if isinstance(other, float) or isinstance(other, int): other = DoubleDouble(other) or
- if isinstance(other, numbers.Real) and not isinstance(other, DoubleDouble): other = DoubleDouble(other)?
My DoubleDouble dataclass specifies frozen=True, which makes its instances immutable. NamedTuple is both slower and more error-prone (e.g. no real distinction between named and regular tuples with the same number of elements).
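To make the points above concrete, here is a minimal sketch of a frozen dataclass with hand-written comparisons that accept plain numbers. It is not the code from either repository; the helper name _coerce and the lexicographic comparison (which assumes normalized values, i.e. y much smaller than x) are my own assumptions for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from functools import total_ordering
import numbers


@total_ordering
@dataclass(frozen=True, eq=False)  # slots=True could be added on Python >= 3.10
class DoubleDouble:
    x: float
    y: float = 0.0

    @staticmethod
    def _coerce(other):
        """Hypothetical helper: turn a real number into a DoubleDouble."""
        if isinstance(other, DoubleDouble):
            return other
        if isinstance(other, numbers.Real):
            # float() may round large ints; a real implementation might care.
            return DoubleDouble(float(other))
        return None

    def __eq__(self, other):
        other = self._coerce(other)
        if other is None:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __lt__(self, other):
        other = self._coerce(other)
        if other is None:
            return NotImplemented
        # Lexicographic order is only valid for normalized (x, y) pairs.
        return (self.x, self.y) < (other.x, other.y)

    def __hash__(self):
        # Keep hash consistent with equality against plain floats.
        return hash(self.x) if self.y == 0.0 else hash((self.x, self.y))


if __name__ == "__main__":
    a = DoubleDouble(1.0, 1e-20)
    print(a > 1, DoubleDouble(1.0) == 1.0)
    try:
        a.x = 2.0
    except FrozenInstanceError:
        print("instances are immutable")
```

With eq=False the dataclass still generates __init__ and __repr__, so the saving is real but smaller than with fully generated comparisons, which is the trade-off mentioned above.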

Sorry for the late reply, I have been busy with other stuff while thinking about these questions, reading docs and drafting this text. I may tinker with my version a little more to see if dataclass still provides some benefit or if it isn't worth it.
