Consider adding 8 math functions to std.math for statistics applications

dirkschumacher commented 3 years ago

Evan Miller (@evanmiller) made a great case that adding just a few more math functions to the standard <math.h> C library would enable a huge range of statistical applications. Or to quote him:

Basic statistical analysis requires special mathematical functions which almost no standard math libraries implement. These functions are not particular to statistics, but they form building blocks for a number of important statistical applications. If standard math libraries were to provide just a few more special functions, including incomplete beta, incomplete gamma, inverse normal, and better Bessel functions, developers would be able to integrate more statistical functions into their applications without having to license specialized libraries, or copy-paste code of dubious origin into their projects. (https://www.evanmiller.org/statistical-shortcomings-in-standard-math-libraries.html)

Since zig provides a rather comprehensive standard library (e.g. all the great stuff in std.crypto) it might be a good idea to add those functions to the standard math library, such that anybody can build great statistical/machine learning work in zig without resorting to third party math libraries.

Just an idea. The linked article provides the full rational, but I will list the 8 functions here for completneness:

I realize that implementing all of these specialized numerical functions sounds like a lot of work, but in fact the work is essentially finished: the free Cephes library contains a host of mathematical functions with names and argument orders nearly identical to their counterparts, as well as all of the functions I have mentioned above. So my proposal boils down to “graduating” the following eight double-precision functions, along with their single-precision and extended precision counterparts, from Cephes into the standard C library:

/* Regularized incomplete beta function */
double incbet(double a, double b, double x);

/* Inverse of incomplete beta integral */
double incbi(double a, double b, double y);

/* Regularized incomplete gamma integral */
double igam(double a, double x);

/* Complemented incomplete gamma integral */
double igamc(double a, double x);

/* Inverse of complemented incomplete gamma integral
double igami(double a, double p);

/* Normal distribution function */
double ndtr(double x);

/* Inverse of Normal distribution function */
double ndtri(double y);

/* Bessel function of non-integer order */
double jv(double v, double x);

rohlem commented 3 years ago

In support, though I'd favour more readable names, even if these might be standard in the field, ndtr and jv aren't obvious. Especially since const xyz = makes it trivial for end users to implement their preferred naming scheme.

daurnimator commented 3 years ago

@dirkschumacher is this something you'd implement?

marler8997 commented 3 years ago

In support, though I'd favour more readable names...

You could also have both.

std/math/sciencynames.zig:

pub const incbet = std.math.regularIncompleteBeta;
pub const incbi = std.math.inverseIncompleteBeta;
//...

dirkschumacher commented 3 years ago

I'd favour more readable names

Yes absolutely.

@dirkschumacher is this something you'd implement?

Yeah, but I cannot commit to anything this year. Will try next year. So if anyone wants to go ahead in the meantime, feel free.

tiehuis commented 3 years ago

Done an initial implementation in the above PR. Marked a draft since I want to do some more testing more but it is largely complete.

There are also a few questions regarding exposed names, etc, which I haven't yet changed. I've pushed now just to indicate there is some more work here.

tiehuis commented 3 years ago

The original issue/post specifies the following functions:

/* Regularized incomplete beta function */
double incbet(double a, double b, double x);

/* Regularized incomplete gamma integral */
double igam(double a, double x);

Note these are listed as regularized functions. Cephes doesn't actually implement these as the listed function definitions. These are documented as just being the incomplete integrals.

The regularized functions are fairly straight-forward to derive from these, however. Specifically see the definitions here:

It sounds like the case is made for these regularized functions in addition to the incomplete ones but it is a little vague.

dirkschumacher commented 3 years ago

True, it is a bit vague. In his essay, @evanmiller says:

1-2. The regularized incomplete beta function, Ix(a, b), and its inverse

So I believe $I_x(a,b)$ is what he argues for and that would need to be constructed from the listed functions (the integrals) as you said. So I guess one could also supply the regularized incomplete beta function and its inverse or let the user implement that if needed. Not sure. I think that also depends on the strategy of std.math (only basic building blocks or also high level functions which could however also be a separate package).

E.g.

RegularizedIncompleteBeta(a, b, x) = incbet(a, b, x) / incbet(a, b, 1)
InverseRegularizedIncompleteBeta(a, b, y) = incbi(a, b, y * incbet(a, b, 1))

andrewrk commented 9 months ago

Accepted, along with any related suggestions or modifications by @tiehuis

ziglang / zig

Consider adding 8 math functions to std.math for statistics applications #7212