jsoftware / jsource

J engine source mirror
Other
645 stars 91 forks source link

use fma for tolerant comparisons #156

Open moon-chilled opened 1 year ago

moon-chilled commented 1 year ago

The tolerant compare uses a calculation of the form a < b*c. This can be done faster and with more precision by looking at the sign bit of fma(b,c,-a). But it breaks with negative zero.

moon-chilled commented 1 year ago

err, that should be fma(b,-c,a). fma(b,c,-a) is for <=. Because exact 0 or positive underflow will be positive 0; negative 0 can only be from negative underflow. (Assuming no negative 0 inputs.)

HenryHRich commented 1 year ago

I think negative 0 is not an issue since it will occur on the fringe of the tolerant range.

You need to verify that this will work for infinities, which may produce NaN intermediate values.

It is vital that TEQ, which is used for singletons, be updated to track the code used in eqDD.

moon-chilled commented 1 year ago

Tolerant a=b is

a>cct*b = b>cct*a

I want to replace it with:

signbit(cct*b-a) = signbit(cct*a-b)

But suppose a is _0, b is 0. Then we have:

signbit(cct*0-_0) = signbit(cct*_0-0)

signbit(0-_0) = signbit(_0-0)

signbit(0) = signbit(_0)

Which is false, but we want _0=0.

Good point about nan; looks like ieee doesn't specify the sign of a nan result. We want signbit(_+__) to be different from signbit(__+_); if we're lucky, x86 will spec it so the nan sign always comes from the right addend (or left; w/e). Will look later.

Can do something similar for the tolerant zero problem: replace x=y with (x<.y) = (x>.y). On x86, x>.y and x<.y are both y if x=y, so that ensures that, if both are zero, the arguments to the comparison will be the same zero. It's the same number of operations as the current implementation (replaces compares with max/min). (On arm, however, <. and >. consider 0 to be greater than _0, so this won't work there; bully for arm.)

HenryHRich commented 1 year ago

That's fatal to your idea if true. Are you sure it's true? Normally the result of addition is never -0.

hhr

On Mon, Oct 3, 2022, 9:10 PM moon-chilled @.***> wrote:

Tolerant a=b is

a>cctb = b>ccta

I want to replace it with:

signbit(cctb-a) = signbit(ccta-b)

But suppose a is _0, b is 0. Then we have:

signbit(cct0-_0) = signbit(cct_0-0)

signbit(0-_0) = signbit(_0-0)

signbit(0) = signbit(_0)

Which is false, but we want _0=0.

Good point about nan; looks like ieee doesn't specify the sign of a nan result. We want signbit(+) to be different from signbit(+); if we're lucky, x86 will spec it so the nan sign always comes from the right addend (or left; w/e). Will look later.

Can do something similar for the tolerant zero problem: replace x=y with (x<.y) = (x>.y). On x86, x>.y and x<.y are both y if x=y, so that ensures that, if both are zero, the arguments to the comparison will be the same zero. It's the same number of operations as the current implementation (replaces compares with max/min). (On arm, however, <. and >. consider 0 to be greater than _0, so this won't work there; bully for arm.)

— Reply to this email directly, view it on GitHub https://github.com/jsoftware/jsource/issues/156#issuecomment-1266275694, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEKVAJ3VZPQ632QUVDLRHLDWBN7YNANCNFSM6AAAAAAQ3KPDHY . You are receiving this because you commented.Message ID: @.***>

moon-chilled commented 1 year ago

if what's true?

_0 - 0 and _0 + _0 are both _0.

HenryHRich commented 1 year ago

I was mistaken about that.

hhr

On Mon, Oct 3, 2022, 9:24 PM moon-chilled @.***> wrote:

if what's true?

_0 - 0 and _0 + _0 are both _0.

— Reply to this email directly, view it on GitHub https://github.com/jsoftware/jsource/issues/156#issuecomment-1266282800, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEKVAJ344XZSEGY27OEG3ADWBOBOPANCNFSM6AAAAAAQ3KPDHY . You are receiving this because you commented.Message ID: @.***>

moon-chilled commented 1 year ago

Seems x86 gives you -nan for both _+__ and __+_. I guess they want + to be commutative. Annoying.

moon-chilled commented 1 year ago

Actually, it might be to count ulps. I think this is basically what tolerant comparison does, but need to review the definition and think it over.

moon-chilled commented 1 year ago

nevermind, not ulp measure because it uses magnitude, not just exponent

moon-chilled commented 1 year ago

But we can determine an upper and lower ulp bound (tolerant hashing basically derives a slightly coarse upper bound). If most comparands are either below the lower bound (definitely equal) or above the upper bound (definitely unequal), then it may be a win to branch to see if they're in between.

This just works (except for -0?) for comparands of opposite sign, but not for inf. Grumble.