saxbophone / arby

Arbitrary precision arithmetic in C++, even at compile-time
https://saxbophone.com/arby/
Mozilla Public License 2.0
8 stars 1 forks source link

Josh/67 cout efficiency #109

Closed saxbophone closed 2 years ago

saxbophone commented 2 years ago

Timing statistics for the $2^{4000}$ test case (currently commented out due to runtime):

With optimisation:

Filters: Using std::ostream << arby::Nat generates decimal string of value
===============================================================================
All tests passed (15 assertions in 1 test case)

real    6m43.758s
user    6m38.712s
sys 0m0.946s

Without optimisation:

Filters: Using std::ostream << arby::Nat generates decimal string of value
===============================================================================
All tests passed (15 assertions in 1 test case)

real    37m19.593s
user    37m7.677s
sys 0m3.238s

This shows that while runtime for converting large numbers to decimal string is still long, this PR makes a significant improvement to it, with the exception of some experiments changing binary recursion for linear divide-and-conquer to the size of uintmax_t.

I believe further increases in efficiency are unlikely to be found in the methods changed in this PR, and might need to rely upon further efficiency improvements to the arithmetic methods. A detailed profile of the code would determine more precisely what functions are taking so much time.

Closes #67

saxbophone commented 2 years ago

After experimenting with a modified approach that divides linearly by the largest value that can be squeezed into uintmax_t, we get the following stats (including the large commented out test case):

real    6m23.361s
user    5m49.330s
sys     0m2.260s

If this version is to be used, it needs to be modified to not pad the leading digit to max_possible digits, but only the minimum number of digits needed for the value (i.e. don't pad).

saxbophone commented 2 years ago

Stats are now (in d717e85):

real    5m57.145s
user    5m43.861s
sys     0m0.861s

This is relatively good!

saxbophone commented 2 years ago

Not only is the version at the tip of this branch faster than the experimental recursive one, running it with code optimisation is even faster than the code-optimised recursive one. We'll definitely use the non-recursive one!