Closed lkirk closed 4 months ago
I talked to @apragsdale and we want to move forward with returning NaNs.
There's one lingering issue in my head: Technically dividing by zero in C (according to the c99 standard) is undefined behavior (see section 6.5.5.5 in this document -- quoted below):
The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined.
We can observe that the behavior is only slightly different in different mainstream compilers, I was testing to see if hardcoded vs dynamic values would be any different... though I wonder if the string conversion is doing something here... anyways, maybe a bit too far in the weeds, I just wanted to point out that it's technically UB.
lkirk@skittles ~/repo/tskit/python/divzero-tests (git)-[ld-matrix-nan-ratio-statistics] % cat main.c
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
printf("stat 1/0: %f\n", 1.0 / 0.0);
printf("stat 0/0: %f\n", 0.0 / 0.0);
if (argc == 2) {
printf("dyn 1/0: %f\n", 1.0 / atof(argv[1]));
printf("dyn 0/0: %f\n", 0.0 / atof(argv[1]));
}
}
lkirk@skittles ~/repo/tskit/python/divzero-tests (git)-[ld-matrix-nan-ratio-statistics] % gcc -Wall -Wextra -Werror -Wpedantic -W -Wmissing-prototypes -Wstrict-prototypes -Wconversion -Wshadow -Wpointer-arith -Wcast-align -Wcast-qual -Wwrite-strings -Wnested-externs -fshort-enums -fno-common main.c -o gcc
lkirk@skittles ~/repo/tskit/python/divzero-tests (git)-[ld-matrix-nan-ratio-statistics] % clang -Wall -Wextra -Werror -Wpedantic -W -Wmissing-prototypes -Wstrict-prototypes -Wconversion -Wshadow -Wpointer-arith -Wcast-align -Wcast-qual -Wwrite-strings -Wnested-externs -fshort-enums -fno-common main.c -o clang
lkirk@skittles ~/repo/tskit/python/divzero-tests (git)-[ld-matrix-nan-ratio-statistics] % ./gcc 0
stat 1/0: inf
stat 0/0: -nan
dyn 1/0: inf
dyn 0/0: -nan
lkirk@skittles ~/repo/tskit/python/divzero-tests (git)-[ld-matrix-nan-ratio-statistics] % ./clang 0
stat 1/0: inf
stat 0/0: nan
dyn 1/0: inf
dyn 0/0: -nan
lkirk@skittles ~/repo/tskit/python/divzero-tests (git)-[ld-matrix-nan-ratio-statistics] % python -c 'import numpy as np;print("1/0", np.float64(1)/np.float64(0));print("0/0", np.float64(0)/np.float64(0))'
<string>:1: RuntimeWarning: divide by zero encountered in scalar divide
1/0 inf
<string>:1: RuntimeWarning: invalid value encountered in scalar divide
0/0 nan
Though, it appears when the -nan
and nan
values get passed into numpy, the sign is stripped off, so these compilers essentially implement the same thing.
I think what I'm suggesting is that the code explicitly return NaN values instead of relying on the compiler.
However, I think that the rest of the stats C code does division without explicitly returning NaN values, so I'm happy to just do that.
That's interesting - I'm pretty sure IEEE 754 defines division by zero behaviour, so I thought it was the hardware that decided this rather than compiler. I do remember having some different results on old 32 bit hardware without a proper FPU.
We fretted quite a bit about this IIRC and did try to avoid division by zeros initially. However, it turns out to be surprisingly tricky to avoid completely and you end up with quite subtle and complicated numerical code around the special cases. So, the cure is worse than the disease here I think - in practise the majority of platforms will return NaN as required, and we have less code to maintain.
When we encounter situations in $r$, $r^2$, and $D^{\prime}$ where we have $\frac{0}{0}$, should we return $0$ or NaN? @apragsdale brings up a good point: What is the limit behavior as we approach $\frac{0}{0}$?
Let's follow up on this to ensure correctness of the API.