stillwater-sc / universal

Large collection of number systems providing custom arithmetic for mixed-precision algorithm development and optimization for AI, Machine Learning, Computer Vision, Signal Processing, CAE, EDA, control, optimization, estimation, and approximation.
MIT License
401 stars 59 forks source link

Posit log & log1p native implementations #244

Open touisteur opened 3 years ago

touisteur commented 3 years ago

Hi,

I was looking for a posit implementation of the log and log1p functions, actually trying to benchmark how a posit32 (2,30?) would fare (precision and perf-wise) on a gpu or cpu (compared to a float binary32) - also trying to use (otherwise unused) int32 resources in a CUDA kernel.

Reading the code on elementary functions, they seem implemented as 'call std::log and convert to posit'? Did I miss anything?

If not present, is there a canonical posit log/log1p implementation I could start from?

Thanks in advance, Lionel

Ravenwater commented 3 years ago

@touisteur indeed, the elementary functions in Universal do not yet use native implementations. However, extensive research has been done on what is required for correctly rounded math libraries. In particular, Jay Lam and Santosh Nagarakatte have worked out the Minefield method that John Gustafson proposed for posits. Here is the research page: https://people.cs.rutgers.edu/~sn349/rlibm/

Their RLibm contains a perfectly rounded elementary function library for posit<32,2>. That would be the best place to start.

If you would be interested in implementing the log() and log1p() methods for posits in Universal that would be most appreciated. as we have been looking for a subject matter expert to get the math library work in Universal started. The task is non-trivial, as the Minefield method is dependent on the specific number system configuration and the sampling of the Reals it represents. Otherwise stated, to provide a correctly rounded math library for a parameterized number system that can represent thousands of number systems is an open research question. But we have PoCs for specific posit configurations, so it is not unreasonable to get started there and work our way into solving the meta problem.

Secondly, Jay's methodology for synthesizing the coefficients for the Minefield polynomial is not feasible for posits with nbits > 32. This creates a problem for creating an Oracle number system and associated math library that could be used as a baseline. So that is an open question as well.

Love to hear your thoughts on the above.

Ravenwater commented 3 years ago

@touisteur just released v3.34 which has a unified mathlib shim among fixpnt, cfloat, and posit.

While adding the fixpnt shim, the pattern of a general math lib shim that uses the compiler's math library and marshalls values through IEEE-754 double is now clear. The nice side-effect of this pattern is that bring-up of a new number system, we only need to implement two attributes: operator=() to map from IEEE-754 to the encoding, and double() to synthesize the IEEE-754 value from an encoding.

Having this general shim layer will also reduce the amount of code we need to manage. So we have a cool architectural task to design the general mathlib shim to offer the fallback functionality for 'small' values that are representable by double-precision IEEE-754, and an overload to a native implementation if the function is available.

Would that be a task that interests you?

touisteur commented 3 years ago

Hi @Ravenwater, very nice of you to answer so quickly and with such details. I'm (sadly) not a subject matter expert and feel I wouldn't do justice to the task. I'll probably revisit the subject in 2022. In the meantime, I'll take a look at rlibm & the published paper, though, for inspiration.

Ravenwater commented 3 years ago

@touisteur thanks for the update. I'll add the mathlib shim on the todo list, and I will use your (this) issue to trigger the first step of implementing a native log() implementation.

Definitely check out Jay's work as it is a very cool and state-of-the-art approach to solve the table makers' dilemma.

http://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm

touisteur commented 3 years ago

Heh I did try my hand with sollya for a similar topic (I want to take into account some information I have about the input of the log or log1p, in fact it's more log(x/y) and x and y are bounded...) but i had to admit defeat. Either I'm not quite competent enough, or these kinds of tools are pretty hard to use. Papers on the subject are fascinating though and I'm quite impressed we're still making so much progress still...

Ravenwater commented 3 years ago

@touisteur I have collected a set of references that we can compare and contrast:

gcc's libm's implementation: https://code.woboq.org/userspace/glibc/sysdeps/ieee754/dbl-64/e_log.c.html musl: https://git.musl-libc.org/cgit/musl/tree/src/math/log.c fdlibm: https://www.netlib.org/fdlibm/e_log.c rlibm: https://github.com/rutgers-apl/rlibm-32/blob/main/source/posit32/log.c

I'll ping Jay and Santosh to see if they have a polynomial fitted for 8-bit and 16-bit posits. I am very interested to explore the differences of the polynomials that correctly round 8, 16, 24, 32, 40, 48 bit posits. I have not yet seen an analysis that studies the structure of those differences, so maybe we can drive that in this experiment.

touisteur commented 3 years ago

Oh thanks! I had no idea rlibm had a posit32 implementation. I need to dive in and check how it fares against float and double logs from libcs...

Edit: seems based on softposit and seems quite 'easy' to use and read. I need to find a keyboard+pc after the weekend and try that. Didn't find a log1p but that might not be that far away. Also need to check whether it's using ints larger than int32 underneath. Thanks!

Ravenwater commented 3 years ago

Well, not quite: it still uses doubles for compute, and maps back to posit on the final conversion.

posit32_t rlibm_log(posit32_t x) {
    if (x.v >= 0x80000000) {
        return castP32(0x80000000);
    } else if (x.v == 0) {
        return castP32(0x80000000);
    } else if (x.v == 0x40000000) {
        return castP32(0x0);
    }

    doubleX fix, fit, dX;
    fix.d = convertP32ToDouble(x);

    int m = fix.x >> 52lu;
    m -= 1023lu;
    fix.x &= 0xFFFFFFFFFFFFFlu;
    fix.x |= 0x3FF0000000000000lu;

    fit.x = fix.x & 0xFE00000000000lu;
    int FIndex = fit.x >> 45lu;
    fit.x |= 0x3FF0000000000000lu;

    dX.d = fix.d - fit.d;
    dX.d *= oneByF[FIndex];

    // Figure out index. 7 bits are the same. 64 - 18 = 46
    unsigned long index = (dX.x & 0x01FFFFFFFFFFFFFFlu) >> 46;
    const double* coeff = logcoeffs[index];

    // Compute polynomial
    double y = coeff[3];
    y *= dX.d;
    y += coeff[2];
    y *= dX.d;
    y += coeff[1];
    y *= dX.d;
    y += coeff[0];
    y *= dX.d;

    // Output compensation
    y += m * LN2LOW;
    y += lnOneDotF[FIndex];
    y += m * LN2HIGH;
    return convertDoubleToP32(y);
}

what is different is the polynomial: just three terms.

Universal has native posit sqrt, and for coarse posit configurations, the approximation algorithms don't work as they assume a lot finer sampling of the continuum. That implies that there is a HUGE design space for polynomials and appropriate sampling resolution.

touisteur commented 3 years ago

Oh yeah I just re-read the code and fell on that double accumulator, errrr.

touisteur commented 3 years ago

Well, not quite: it still uses doubles for compute, and maps back to posit on the final conversion.


what is different is the polynomial: just three terms.

Universal has native posit sqrt, and for coarse posit configurations, the approximation algorithms don't work as they assume a lot finer sampling of the continuum. That implies that there is a HUGE design space for polynomials and appropriate sampling resolution.

So I gather it's done with double because coefficients in posit32 wouldn't 'do the job'. I wonder how far we'd be.

Ravenwater commented 3 years ago

that is the $64000 question. I just pinged the FPbench.org community with that question. There are several research groups (Rutgers, Utah, and UWash) that are working on math function library precision, so I am hoping that they have some inkling of what the issues are.

touisteur commented 3 years ago

I am probably going to end up trying fpm for log/log1p computation. I know it's fixed point, but it allows defining 'where' the 'fixed point' is. So if I know the dynamic of the input (I do) I can probably place my 'point' where it's giving enough accuracy... Will report. Not soon, though.

edit https://mikelankamp.github.io/fpm/

Ravenwater commented 3 years ago

@touisteur took a look at fpm. If I implement the log/exp functions for you, would you consider using Universal instead of fpm?

touisteur commented 3 years ago

Hi,

Yes indeed. I'll have to I think. Fpm would be a stepping stone only. Not even sure it does the job!

Le lun. 6 sept. 2021 à 13:31, Theodore Omtzigt @.***> a écrit :

@touisteur https://github.com/touisteur took a look at fpm. If I implement the log/exp functions for you, would you consider using Universal instead of fpm?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stillwater-sc/universal/issues/244#issuecomment-913575968, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGBRHDV64VZBTREH6ZZL423UASRCVANCNFSM5C7BJWFA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

Ravenwater commented 3 years ago

Cool, I'll put it on my docket for this week. I'll report here on the progress.

touisteur commented 3 years ago

That is very nice of you. Please PLEASE take your time, there's no emergency on my side and it may take me some time afterwards to check the result in my usecase.

I feel log and log1p are such interesting uses for alternative representations, what with the range of their output (the 'exposant' part I mean) being so small.

Le lun. 6 sept. 2021 à 14:18, Theodore Omtzigt @.***> a écrit :

Cool, I'll put it on my docket for this week. I'll report here on the progress.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stillwater-sc/universal/issues/244#issuecomment-913604058, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGBRHDUXY3WTPMZBK7UMJVLUASWRVANCNFSM5C7BJWFA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.