hoaproject / Math

The Hoa\Math library.
https://hoa-project.net/

Mathematical Functions #35

Open dantleech opened 8 years ago

dantleech commented 8 years ago

Currently this library includes no mathematical functions. For PHPBench I needed access to basic statistical functions (e.g. average, stddev, variance), and later I needed to port a kernel density estimator class. It would be great to move these over to this library.
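A minimal sketch of the basic statistics mentioned above, written as plain PHP functions. The names mean, variance and standardDeviation are hypothetical (not an existing Hoa\Math API), and the $sample switch (divide by n - 1 instead of n) is an assumption modelled on R's var() and sd(), which use the sample definition.

```php
<?php
// Hypothetical helper functions; not part of Hoa\Math.

/** Arithmetic mean of a non-empty list of numbers. */
function mean(array $values): float
{
    if (count($values) === 0) {
        throw new InvalidArgumentException('mean() requires at least one value.');
    }
    return array_sum($values) / count($values);
}

/**
 * Variance. $sample = true divides by n - 1 (sample variance, as in R's var()),
 * $sample = false divides by n (population variance).
 */
function variance(array $values, bool $sample = true): float
{
    $n = count($values);
    if ($n < ($sample ? 2 : 1)) {
        throw new InvalidArgumentException('Not enough values for variance().');
    }
    $mean = mean($values);
    $sumOfSquares = 0.0;
    foreach ($values as $value) {
        $sumOfSquares += ($value - $mean) ** 2;
    }
    return $sumOfSquares / ($sample ? $n - 1 : $n);
}

/** Standard deviation: the square root of the variance. */
function standardDeviation(array $values, bool $sample = true): float
{
    return sqrt(variance($values, $sample));
}
```

For example, standardDeviation([2, 4, 4, 4, 5, 5, 7, 9], false) is 2.0 (population), while the sample version is sqrt(32 / 7).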

Reference libraries / projects:

I would like to start with porting the minimum required for PHPBench:

Functions or static methods?

Should we use functions or static methods? i.e.

$mean = Stats::mean(Core::linspace(1, 10, 10));

or

$mean = Stats\mean(Core\linspace(1, 10, 10));

I tend to prefer the functional approach, but it is not possible to "autoload" functions, so we would have to include the whole library every time, e.g. through Composer's "files" autoloading:

{
    "autoload": {
        "files": [
            "lib/core.php",
            "lib/stats.php",
            "lib/financial.php"
        ]
    }
}

Perhaps the overhead would be insignificant, but maybe it would be better to use static methods anyway, thoughts?
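For comparison, the static-method variant of the example could look like the sketch below. Core and Stats are the illustrative names from the example, not an existing API, and linspace is assumed to behave like numpy.linspace (both endpoints included).

```php
<?php
// Sketch of the static-method style; Core and Stats are illustrative names only.

final class Core
{
    /** $num evenly spaced values from $start to $stop inclusive, like numpy.linspace. */
    public static function linspace(float $start, float $stop, int $num): array
    {
        if ($num < 1) {
            throw new InvalidArgumentException('$num must be at least 1.');
        }
        if ($num === 1) {
            return [$start];
        }
        $step = ($stop - $start) / ($num - 1);
        $values = [];
        for ($i = 0; $i < $num; $i++) {
            $values[] = $start + $i * $step;
        }
        return $values;
    }
}

final class Stats
{
    /** Arithmetic mean; assumes a non-empty array. */
    public static function mean(array $values): float
    {
        return array_sum($values) / count($values);
    }
}

$mean = Stats::mean(Core::linspace(1, 10, 10)); // 5.5
```

One practical difference: classes like these can be loaded lazily on first use (for instance through a PSR-4 mapping), whereas files listed under "files" are loaded on every request.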

Package Organisation

Currently the source code for this package is located in the root directory, along with some "non-code" files. I am not sure this is a great idea: for example, the Bin directory (which, I think, stands for "binary" in this case) also has a mathematical meaning, which could get in our way in the future.

Equally, the Context and Arithmetic.pp files share the same concern as Bin\Calc.php, so I would rather they were all in a sub-namespace:

Parser/ // or whatever this could be named
    Calc.php
    Arithmetic.pp
    Context.php
Combinatorics/
Sampler/
Stats/
    Kde.php
    GaussianBlah.php
    GammaBlah.php
    Stats.php // file containing functions (if that is what we do)
Financial/
LinearAlgebra/
Etc../
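As an illustration of what Stats/Kde.php might hold, here is a minimal Gaussian kernel density estimator. The class name Kde and its evaluate() method are hypothetical, and the bandwidth is supplied by the caller rather than chosen automatically.

```php
<?php
// Hypothetical sketch of a Gaussian kernel density estimator with a fixed,
// caller-supplied bandwidth h.

final class Kde
{
    private $samples;
    private $bandwidth;

    public function __construct(array $samples, float $bandwidth)
    {
        if (count($samples) === 0 || $bandwidth <= 0) {
            throw new InvalidArgumentException('Need samples and a positive bandwidth.');
        }
        $this->samples = array_values($samples);
        $this->bandwidth = $bandwidth;
    }

    /** Estimated density at $x: (1 / (n * h)) * sum over i of K((x - x_i) / h). */
    public function evaluate(float $x): float
    {
        $sum = 0.0;
        foreach ($this->samples as $sample) {
            $u = ($x - $sample) / $this->bandwidth;
            // Standard normal kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi).
            $sum += exp(-0.5 * $u * $u) / sqrt(2 * M_PI);
        }
        return $sum / (count($this->samples) * $this->bandwidth);
    }
}
```

A real port would probably also offer automatic bandwidth selection, for example Silverman's rule of thumb.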

Options or Arguments?

PHP does not currently have named parameters, which really sucks.

Taking an example from R:

matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
       dimnames = NULL)

So we have the choice of either:

function matrix($data = null, $nrow = 1, $ncol = 1, $byrow = false, $dimnames = null)

or

function matrix($data, array $options = array())
{
    $options = array_merge([
        'nrow' => 1,
        'ncol' => 1,
        'byrow' => false,
        'dimnames' => null
    ], $options);

    // ... build and return the matrix from $data and $options ...
}

The first is pretty bad IMO (to set byrow, the caller has to repeat every preceding default), and the second really requires a utility to validate the options. In this case a class would probably be a better solution, but this sort of thing is quite frequent, so it would be good to have a strategy to deal with it.
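The "utility to validate the options" could be as small as the hypothetical resolveOptions() below: it rejects unknown keys (catching typos such as 'nrows') and fills in the defaults. It is only a sketch and does not check value types.

```php
<?php
// Hypothetical helper: merges user-supplied options over defaults and rejects
// keys that have no default (e.g. a typo such as 'nrows').
function resolveOptions(array $defaults, array $options): array
{
    $unknown = array_diff_key($options, $defaults);
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            'Unknown option(s): ' . implode(', ', array_keys($unknown))
        );
    }
    // array_replace keeps the key order of $defaults and overwrites matching keys.
    return array_replace($defaults, $options);
}

$options = resolveOptions(
    ['nrow' => 1, 'ncol' => 1, 'byrow' => false, 'dimnames' => null],
    ['ncol' => 3]
);
// $options is ['nrow' => 1, 'ncol' => 3, 'byrow' => false, 'dimnames' => null]
```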

Vectorized Operations

R and numpy support vectorized operations, e.g.

$x = array(1, 2, 3);
$y = array(2, 3, 4);
$x + $y; // R/numpy semantics: array(3, 5, 7). In PHP, + on arrays is a key union and gives array(1, 2, 3)

I think it would be good to support this:

Vector::add($x, $y); // array(3, 5, 7)
Vector::multiply($x, $y); // array(2, 6, 12)
Vector::div($x, $y);
// etc
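A minimal sketch of the proposed element-wise helpers, assuming both arguments are plain PHP arrays of equal length. Vector and the method names follow the proposal above (with multiply as the multiplication method); this is not an existing Hoa API, and numpy-style broadcasting of scalars is not handled.

```php
<?php
// Hypothetical sketch of element-wise ("vectorized") helpers on plain arrays.

final class Vector
{
    /** Applies $op pairwise to two arrays of equal length. */
    private static function zip(array $x, array $y, callable $op): array
    {
        if (count($x) !== count($y)) {
            throw new InvalidArgumentException('Vectors must have the same length.');
        }
        return array_map($op, array_values($x), array_values($y));
    }

    public static function add(array $x, array $y): array
    {
        return self::zip($x, $y, function ($a, $b) { return $a + $b; });
    }

    public static function multiply(array $x, array $y): array
    {
        return self::zip($x, $y, function ($a, $b) { return $a * $b; });
    }

    // Division by zero is not guarded here (PHP 8 throws DivisionByZeroError).
    public static function div(array $x, array $y): array
    {
        return self::zip($x, $y, function ($a, $b) { return $a / $b; });
    }
}

$sum = Vector::add([1, 2, 3], [2, 3, 4]); // array(3, 5, 7)
```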
Hywan commented 8 years ago

Hello :-),

So first, functions vs. static classes… I would tend to say that static classes are better for the current state of PHP. But that can change in the future, and embracing new features would then introduce a BC break. So I am not sure what to do. I guess we must decide based on testing: is it easier to test functions or classes? Static classes are not easy to test, so functions sound better. But, as you mentioned, autoloading functions is a nightmare. We should not autoload them with Composer; it would load too much (probably unused) code. If we have a Vector, let's assume we create an object and then have something like (new Vector(…))->add(new Vector())… etc. We avoid static classes when multiple methods share the same data. Actually, I am not being very clear: static classes are hard to test when we must mock. If we avoid the need to mock, then it's fine.

About the library structure: in Combinatorics, I have the following (empty, not yet committed) directories: Arrangement, Counting, FiniteSet and Permutation, in addition to Combination. There are different kinds of possible organisation for Mathematics, but I would avoid “Financial” or any business vocabulary like this one. However, I don't know how “Mathematics are organized”. We must do some research here.

About options or arguments? Options. Definitively. I hate options. They are hard to validate, hard for the user to debug, and they add some magic… If we have too many arguments (like byrow in your example), we must enforce some conventions. It's not a big deal for the user, but it eases our work and avoids bugs.

Thoughts?

dantleech commented 8 years ago

Static classes are hard to test when we must mock.

Functions / static methods would be things like stdev($values), variance($values), covar($set1, $set2), etc. so no need to mock there.

For things like distributions we would have classes, I think, and as you say, Vectors (and the like) would be classes to which we can add methods (multiply, add, etc.), which would call the static methods defined elsewhere, passing their data as arguments.
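A sketch of that split, assuming the stateless helper lives in a separate class (ElementWise is a hypothetical name): the Vector object holds the data and delegates to the static function, which can be tested directly without any mocking.

```php
<?php
// Hypothetical sketch: stateless static helper plus a Vector object that delegates to it.

final class ElementWise
{
    /** Element-wise sum of two equally long arrays. */
    public static function add(array $x, array $y): array
    {
        if (count($x) !== count($y)) {
            throw new InvalidArgumentException('Length mismatch.');
        }
        return array_map(function ($a, $b) { return $a + $b; }, $x, $y);
    }
}

final class Vector
{
    private $values;

    public function __construct(array $values)
    {
        $this->values = array_values($values);
    }

    /** Returns a new Vector; the arithmetic itself happens in ElementWise. */
    public function add(Vector $other): Vector
    {
        return new Vector(ElementWise::add($this->values, $other->values));
    }

    public function toArray(): array
    {
        return $this->values;
    }
}

$sum = (new Vector([1, 2, 3]))->add(new Vector([2, 3, 4]));
// $sum->toArray() is array(3, 5, 7)
```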

I would avoid “Financial” or any business vocabulary like this one. However, I don't know how “Mathematics are organized”. We must do some research here.

I just took Financial from the Numpy library I think, so yeah - I think just doing a quick survey of existing libraries would be sufficient to determine our organization.

About options or arguments? Options. Definitively. I hate options.

:) You mean we should use arguments? I tend to agree, my main concern is when you have to skip over several NULL arguments to set an option that you want.

PHP passes an explicit NULL through as NULL and does not use the default, meaning the user has to explicitly set each of the "default" values before getting to the one they need.
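A small demonstration of the NULL problem described above, using a hypothetical matrixShape() function with R-like positional defaults:

```php
<?php
// Illustrative function with R-like positional defaults (hypothetical name).
function matrixShape($data = null, $nrow = 1, $ncol = 1): array
{
    return [$nrow, $ncol];
}

// The caller only wants to set $ncol, but must pass something for $nrow.
$shape = matrixShape([1, 2, 3], null, 3);
// $shape is array(null, 3): the explicit NULL replaced the default of 1.
```

A common workaround is to treat NULL as "use the default" inside the function (e.g. `$nrow = $nrow ?? 1;`), at the cost of making NULL unusable as a real value.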

I think we could possibly avoid this by just minimizing such cases by having different variations of the function where absolutely necessary, or possibly by using bitwise constants. I guess we can play it by ear.
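A minimal sketch of the bitwise-constant idea, assuming R-like semantics (column-by-column fill by default). Matrix, BY_ROW and TRANSPOSE are hypothetical names:

```php
<?php
// Hypothetical sketch of bitwise option flags, modelled on R's matrix().

final class Matrix
{
    const BY_ROW = 1;    // fill row by row (R's default is column by column)
    const TRANSPOSE = 2; // transpose the result before returning it

    public static function create(array $data, int $nrow, int $ncol, int $flags = 0): array
    {
        $data = array_values($data);
        if (count($data) !== $nrow * $ncol) {
            throw new InvalidArgumentException('count($data) must equal $nrow * $ncol.');
        }
        $matrix = [];
        for ($i = 0; $i < $nrow; $i++) {
            for ($j = 0; $j < $ncol; $j++) {
                // Column-major index by default, row-major when BY_ROW is set.
                $index = ($flags & self::BY_ROW) ? $i * $ncol + $j : $j * $nrow + $i;
                $matrix[$i][$j] = $data[$index];
            }
        }
        if ($flags & self::TRANSPOSE) {
            $transposed = [];
            foreach ($matrix as $i => $row) {
                foreach ($row as $j => $value) {
                    $transposed[$j][$i] = $value;
                }
            }
            $matrix = $transposed;
        }
        return $matrix;
    }
}
```

Here `Matrix::create([1, 2, 3, 4, 5, 6], 2, 3, Matrix::BY_ROW)` yields `[[1, 2, 3], [4, 5, 6]]`, and flags combine with `|`, so no positional defaults need to be skipped. The trade-off is that only boolean switches fit this pattern; numeric parameters such as nrow still need arguments.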

Hywan commented 8 years ago

:+1: for bitwise flags and constants. Sounds better.

We can use static classes as a “collection of functions”. It sounds stupid put like this, because that is basically the definition of a class :-D, but it plays the same role as a namespace.

Also, about the naming, Hoa prefers long names. So stdev is wrong, but standardDeviation is good. Yes, it's longer, but it's clearer.
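A sketch of a static class used purely as a collection of functions, with long names. Statistics, covariance and standardDeviation are hypothetical names; the private constructor just prevents instantiation.

```php
<?php
// Hypothetical sketch: a static class used only as a namespace for related
// statistics functions, with long names. Not an existing Hoa API.

final class Statistics
{
    // Private constructor: the class groups functions and is never instantiated.
    private function __construct()
    {
    }

    /** Arithmetic mean; assumes a non-empty array. */
    public static function mean(array $values): float
    {
        return array_sum($values) / count($values);
    }

    /** Sample covariance of two series of equal length (divides by n - 1). */
    public static function covariance(array $x, array $y): float
    {
        $x = array_values($x);
        $y = array_values($y);
        $n = count($x);
        if ($n !== count($y) || $n < 2) {
            throw new InvalidArgumentException('Need two series of equal length, at least 2.');
        }
        $meanX = self::mean($x);
        $meanY = self::mean($y);
        $sum = 0.0;
        for ($i = 0; $i < $n; $i++) {
            $sum += ($x[$i] - $meanX) * ($y[$i] - $meanY);
        }
        return $sum / ($n - 1);
    }

    /** Sample standard deviation: a series' covariance with itself is its variance. */
    public static function standardDeviation(array $values): float
    {
        return sqrt(self::covariance($values, $values));
    }
}
```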

Thoughts?

dantleech commented 8 years ago

Actually another thing just occurred to me - I have been using these functions in Python and R exclusively from the interactive shell, and I have also been using psysh for testing PHP math stuff.

Using these things interactively is an argument both for shorter names and for using "proper" standalone functions (R uses sd for standard deviation and NumPy uses std; interestingly, Apache Commons Math uses a class).

Hmm... and then we also have the PHP stats extension; it could make sense to write a polyfill for it, so that when that extension is enabled it can be used directly.
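One way to use the stats extension when it is available is a thin wrapper that checks function_exists(). The wrapper name is hypothetical, and stats_standard_deviation(array $a, bool $sample = false) is assumed to match the PECL stats signature documented in the PHP manual.

```php
<?php
// Hypothetical wrapper: uses the PECL stats extension when it is loaded and a
// plain-PHP computation otherwise. $sample = false gives the population
// standard deviation (divide by n), true the sample one (divide by n - 1).

function standardDeviation(array $values, bool $sample = false): float
{
    $n = count($values);
    // Validate first, so both code paths reject the same inputs.
    if ($n < ($sample ? 2 : 1)) {
        throw new InvalidArgumentException('Not enough values.');
    }
    if (function_exists('stats_standard_deviation')) {
        // Assumed signature: stats_standard_deviation(array $a, bool $sample = false)
        return stats_standard_deviation($values, $sample);
    }
    $mean = array_sum($values) / $n;
    $sumOfSquares = 0.0;
    foreach ($values as $value) {
        $sumOfSquares += ($value - $mean) ** 2;
    }
    return sqrt($sumOfSquares / ($sample ? $n - 1 : $n));
}
```

A strict polyfill would instead declare stats_standard_deviation() itself inside an `if (!function_exists(...))` guard, so that code written against the extension keeps working without it.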

dantleech commented 8 years ago

btw, PHP stats uses the long name for stdev, stats_standard_deviation.

Jir4 commented 8 years ago

Please label this issue @Hywan :s

Hywan commented 8 years ago

@dantleech Long names are always better: even if they are not easy to write, they remain easy to read and avoid a lot of confusion and name clashes.

Jir4 commented 8 years ago

:+1: for long names.

Hywan commented 8 years ago

ping?