markrogoyski / math-php

Powerful modern math library for PHP: Features descriptive statistics and regressions; Continuous and discrete probability distributions; Linear algebra with matrices and vectors, Numerical analysis; special mathematical functions; Algebra
MIT License

Expanded use of traits? #75

Closed: Beakerboy closed this issue 8 years ago

Beakerboy commented 8 years ago

The following sentence popped into my head this weekend, "a regression is the combination of a model and a method".

I was thinking we could create "Models" as objects. Models are functions, but more. A model would necessarily have an evaluateModel() method.

// y = mx + b
trait LinearModel
{
    // $params = [x, m, b]
    public function evaluateModel(...$params)
    {
        $x = $params[0];
        $m = $params[1];
        $b = $params[2];
        return $m * $x + $b;
    }

    // It can also include other "model specific" functions.
    public function getModelEquation(...$params): string
    {
        $m = $params[1];
        $b = $params[2];
        return sprintf('y = %fx + %f', $m, $b);
    }

    // Include other stuff as desired. Partial derivatives would be handy for Jacobians.
    // $parameter selects the variable: 1 => dy/dx, 2 => dy/dm, 3 => dy/db.
    public function partialDerivatives($x, $m, $b, $parameter)
    {
        switch ($parameter) {
            case 1:
                return $m; // dy/dx
            case 2:
                return $x; // dy/dm
            case 3:
                return 1;  // dy/db
            default:
                throw new \InvalidArgumentException("Unknown parameter $parameter");
        }
    }
}

abstract class Regression
{
    // An array of our regression parameters. Use this instead of $this->m and $this->b.
    protected $params = [];

    // We can then move the evaluate and getEquation code up to the parent.
    // The specific details on how to pass the parameters are a little fuzzy: arrays, argument lists...
    // evaluateModel() and getModelEquation() come from the model trait the child class uses.
    public function evaluate($x)
    {
        // $this->params holds the fitted parameters, if the chosen method produces parameters.
        return $this->evaluateModel($x, ...$this->params);
    }

    public function getEquation(): string
    {
        // Keeps the model's [x, m, b] argument order; x is not needed for the equation.
        return $this->getModelEquation(null, ...$this->params);
    }
}

// The Linear class then combines a Linear Model with a Least Squares method.
// We could alternatively combine a Logarithm model with LeastSquares, or Exponential with Interpolation.
class Linear extends Regression
{
    use LinearModel;
    use LeastSquares;

    public function calculate($ys, $xs)
    {
        // Prepare the data for the chosen method.
    }

    // If we have a non-parametric regression, we would put the evaluate code here.
    // LOESS or interpolation would be two examples.
    public function evaluate($x)
    {
        // Non-parametric evaluation goes here.
    }
}
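A minimal runnable version of this design, assuming plain PHP 7.1+ and no MathPHP classes. LeastSquaresFit and leastSquares() are hypothetical stand-ins for the existing LeastSquares trait, and the parameters are passed as an [m, b] array instead of a variadic list. The parent never uses the model trait itself: it calls $this->evaluateModel(), which the child supplies through LinearModel, and the abstract declarations only make that contract explicit.

```php
<?php
// Sketch: a "model" trait and a "method" trait composed in a child class.
// LinearModel, LeastSquaresFit, Regression and Linear are illustrative names,
// not MathPHP's actual classes.

trait LinearModel
{
    // Evaluate y = m*x + b for $params = [m, b].
    public function evaluateModel(float $x, array $params): float
    {
        [$m, $b] = $params;
        return $m * $x + $b;
    }

    public function getModelEquation(array $params): string
    {
        [$m, $b] = $params;
        return sprintf('y = %fx + %f', $m, $b);
    }
}

trait LeastSquaresFit
{
    // Ordinary least squares for a straight line; returns [m, b].
    public function leastSquares(array $xs, array $ys): array
    {
        $n = count($xs);
        $mean_x = array_sum($xs) / $n;
        $mean_y = array_sum($ys) / $n;
        $sxy = 0.0;
        $sxx = 0.0;
        for ($i = 0; $i < $n; $i++) {
            $sxy += ($xs[$i] - $mean_x) * ($ys[$i] - $mean_y);
            $sxx += ($xs[$i] - $mean_x) ** 2;
        }
        $m = $sxy / $sxx;
        return [$m, $mean_y - $m * $mean_x];
    }
}

abstract class Regression
{
    /** @var float[] fitted parameters, filled in by the child class */
    protected $params = [];

    // Supplied by whichever model trait the child class uses.
    abstract public function evaluateModel(float $x, array $params): float;
    abstract public function getModelEquation(array $params): string;

    public function evaluate(float $x): float
    {
        return $this->evaluateModel($x, $this->params);
    }

    public function getEquation(): string
    {
        return $this->getModelEquation($this->params);
    }
}

// Linear = LinearModel (the model) + LeastSquaresFit (the method).
class Linear extends Regression
{
    use LinearModel;
    use LeastSquaresFit;

    public function __construct(array $points)
    {
        // Prepare the data for the chosen method, then store the fitted parameters.
        $xs = array_column($points, 0);
        $ys = array_column($points, 1);
        $this->params = $this->leastSquares($xs, $ys);
    }
}

$regression = new Linear([[1, 3], [2, 5], [3, 7]]);
echo $regression->getEquation(), PHP_EOL; // y = 2.000000x + 1.000000
echo $regression->evaluate(10), PHP_EOL;  // 21
```

Swapping in a different model would mean another trait with the same evaluateModel()/getModelEquation() pair; the parent class stays unchanged.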

We have the existing LeastSquares trait as a method, but we could also use weighted least squares, non-linear least squares, LOESS, or anything else.
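To illustrate swapping the method while keeping the straight-line model, here is a sketch of a weighted least squares trait using the standard closed-form weighted fit. WeightedLeastSquares, weightedLeastSquares() and WeightedLinear are hypothetical names, not part of MathPHP.

```php
<?php
// Sketch: an alternative "method" trait, weighted least squares for y = m*x + b.
// WeightedLeastSquares / weightedLeastSquares() / WeightedLinear are hypothetical names.

trait WeightedLeastSquares
{
    /**
     * Minimise sum_i w_i * (y_i - m*x_i - b)^2 and return [m, b].
     * Assumes at least two distinct x values with non-zero weight.
     */
    public function weightedLeastSquares(array $xs, array $ys, array $ws): array
    {
        $sw = $swx = $swy = $swxx = $swxy = 0.0;
        foreach ($xs as $i => $x) {
            $w = $ws[$i];
            $y = $ys[$i];
            $sw   += $w;
            $swx  += $w * $x;
            $swy  += $w * $y;
            $swxx += $w * $x * $x;
            $swxy += $w * $x * $y;
        }
        // Closed-form solution of the weighted normal equations.
        $m = ($sw * $swxy - $swx * $swy) / ($sw * $swxx - $swx * $swx);
        $b = ($swy - $m * $swx) / $sw;
        return [$m, $b];
    }
}

// Same straight-line model, different fitting method.
class WeightedLinear
{
    use WeightedLeastSquares;
}

$fit = (new WeightedLinear())->weightedLeastSquares([0, 1, 2], [0, 1, 5], [1, 1, 0]);
echo "m = {$fit[0]}, b = {$fit[1]}", PHP_EOL; // m = 1, b = 0
```

With equal weights this reduces to ordinary least squares; a zero weight drops a point, so the example fits the line through (0, 0) and (1, 1).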

The job of the Regression class is to provide common functions that link a regression method and a model, such as the ability to evaluate the model at arbitrary points. I'm not saying its job is to find any sort of universal parameters, because non-parametric regression has no universal parameters. The classes that extend Regression are where the model and the method are chosen: they prepare the data for the analysis and, in the case of non-parametric regressions, may have to override functions such as evaluate().

I guess Regression could be extended into ParametricRegression and NonParametricRegression...
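A sketch of what that split might look like, assuming plain PHP 7.1+. All class names are hypothetical, and piecewise linear interpolation stands in as the simplest non-parametric case: it has no global parameters, so it overrides evaluate() and works directly from the stored data.

```php
<?php
// Sketch: splitting Regression into parametric and non-parametric branches.
// All class names here are hypothetical.

abstract class Regression
{
    /** @var array sorted [[x, y], ...] data points */
    protected $points;

    public function __construct(array $points)
    {
        usort($points, function ($a, $b) {
            return $a[0] <=> $b[0];
        });
        $this->points = $points;
        $this->calculate(); // each subclass prepares the data for its method
    }

    abstract protected function calculate(): void;

    abstract public function evaluate(float $x): float;
}

abstract class ParametricRegression extends Regression
{
    /** @var float[] fitted parameters */
    protected $params = [];

    // The fitted parameters fully describe the curve, so evaluate() can live here.
    public function evaluate(float $x): float
    {
        return $this->evaluateModel($x, $this->params);
    }

    abstract public function evaluateModel(float $x, array $params): float;
}

abstract class NonParametricRegression extends Regression
{
    // No global parameters: each subclass implements evaluate() from the data itself.
}

// Piecewise linear interpolation as a very simple non-parametric example.
// Needs at least two points; outside the data range it extrapolates the end segment.
class LinearInterpolation extends NonParametricRegression
{
    protected function calculate(): void
    {
        // Nothing to fit: the sorted points are the model.
    }

    public function evaluate(float $x): float
    {
        $p = $this->points;
        $n = count($p);
        $i = 1;
        // Advance to the first segment whose right end is >= $x (or the last segment).
        while ($i < $n - 1 && $x > $p[$i][0]) {
            $i++;
        }
        [$x0, $y0] = $p[$i - 1];
        [$x1, $y1] = $p[$i];
        return $y0 + ($y1 - $y0) * ($x - $x0) / ($x1 - $x0);
    }
}

$interpolation = new LinearInterpolation([[3, 30], [0, 0], [1, 10]]);
echo $interpolation->evaluate(2), PHP_EOL;   // 20
echo $interpolation->evaluate(0.5), PHP_EOL; // 5
```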

markrogoyski commented 8 years ago

Cool. I think I get it.

My one concern is that since it would be using an ordered numerical array, it is easy to mess up parameters and it isn't immediately clear what is going on. So I'd want LinearModel etc. to have parameter constants:

const X = 0; // x parameter index
const M = 1; // m parameter index
const B = 2; // b parameter index

Something like that, so then you can do:

    function evaluateModel(...$params)
    {
        $x = $params[self::X];
        $m = $params[self::M];
        $b = $params[self::B];
        return $m * $x + $b;
    }

Basically, outside of a loop, if you are indexing into a numerical array, you probably want to do it with named meaningful constants to increase code clarity. Thanks.
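A sketch of the named-index idea that runs on PHP 7.x. Traits could not declare constants before PHP 8.2, so this assumes the constants live on a hypothetical LinearModelParameters interface implemented by the class. Inside a trait method, self:: refers to the class using the trait, so self::X still resolves.

```php
<?php
// Sketch: named parameter indices for a linear model.
// LinearModelParameters is a hypothetical interface holding the index constants,
// because traits cannot declare constants before PHP 8.2.

interface LinearModelParameters
{
    const X = 0; // x parameter index
    const M = 1; // m parameter index
    const B = 2; // b parameter index
}

trait LinearModel
{
    public function evaluateModel(...$params)
    {
        // self:: resolves to the class using the trait (Linear below),
        // which inherits the constants from the interface.
        $x = $params[self::X];
        $m = $params[self::M];
        $b = $params[self::B];
        return $m * $x + $b;
    }
}

class Linear implements LinearModelParameters
{
    use LinearModel;
}

$linear = new Linear();
echo $linear->evaluateModel(3, 2, 1), PHP_EOL; // 7
```

On PHP 8.2+ the constants could be declared in the trait itself instead.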

Beakerboy commented 8 years ago

I agree with the naming. However, in the case of linear or polynomial regression, the evaluate function is easily calculated using:

// $order = 1 for linear (y = m*x + b), 2 for quadratic (y = m1*x² + m2*x + b), 3 for cubic, etc.
return Vandermonde($x, $order + 1)->multiply($betas)[0][0];

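The Vandermonde trick works because a single Vandermonde row [1, x, x², ..., x^order] multiplied by the coefficient vector is exactly the polynomial evaluated at x. A plain-PHP sketch, assuming no MathPHP classes: vandermondeRow() and evaluatePolynomial() are hypothetical helpers, and $betas is assumed to be ordered constant term first so that it lines up with the row's increasing powers.

```php
<?php
// Sketch: evaluate a fitted polynomial as one Vandermonde row times the coefficients.
// Assumes $betas = [b, c1, c2, ...] for y = b + c1*x + c2*x^2 + ...

// Returns [x^0, x^1, ..., x^(columns-1)].
function vandermondeRow(float $x, int $columns): array
{
    $row = [];
    $power = 1.0;
    for ($j = 0; $j < $columns; $j++) {
        $row[] = $power; // x^j
        $power *= $x;
    }
    return $row;
}

function evaluatePolynomial(float $x, array $betas): float
{
    $order = count($betas) - 1;            // 1 for linear, 2 for quadratic, ...
    $row = vandermondeRow($x, $order + 1); // [1, x, x^2, ..., x^order]
    $y = 0.0;
    foreach ($row as $j => $xj) {
        $y += $xj * $betas[$j];            // dot product of row and coefficients
    }
    return $y;
}

echo evaluatePolynomial(2, [1, 3]), PHP_EOL;    // linear y = 1 + 3x: 7
echo evaluatePolynomial(2, [1, 0, 2]), PHP_EOL; // quadratic y = 1 + 2x^2: 9
```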
Beakerboy commented 8 years ago

I have something in the works in my repository, but I'm seeing errors like the following:

1) Math\Statistics\Regression\HanesWoolfTest::testGetParameters with data set #0 (array(array(0.037999999999999999, 0.050000000000000003), array(0.19400000000000001, 0.127), array(0.42499999999999999, 0.094), array(0.626, 0.2122), array(1.2529999999999999, 0.27289999999999998), array(2.5, 0.26650000000000001), array(3.7400000000000002, 0.33169999999999999)), 0.36151233700000002, 0.55417895500000003)
Error: Call to undefined method Math\Statistics\Regression\Regression::getModelParameters()

/home/travis/build/Beakerboy/math-php/src/Statistics/Regression/Regression.php:83
/home/travis/build/Beakerboy/math-php/tests/Statistics/Regression/HanesWoolfTest.php:13

Any ideas? Is there "one simple trick" to getting static methods to work in traits?

markrogoyski commented 8 years ago

Static trait methods should work. Since the call is made from the parent Regression class, which doesn't use the trait, that could be why the method is not defined. I think it will work if you add the trait's use statement to the parent Regression class, but that kind of defeats the point of what you are trying to do. The real question seems to be whether a parent method can call a method from a child's trait, which might be no, but I'm not 100% sure.
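One way to test that question: in PHP a parent method can reach a static method supplied only by a child's trait, but only through late static binding (static::). self:: is bound to the class where the code is written, so it fails with an undefined-method Error of the same kind as the test output above. This is a plausible cause, not a diagnosis of the actual code; ExampleModel, Regression and ChildRegression are hypothetical names.

```php
<?php
// Sketch: calling a child's trait static method from a parent-class method.
// ExampleModel, Regression and ChildRegression are hypothetical names.

trait ExampleModel
{
    public static function getModelParameters(array $params): array
    {
        return ['slope' => $params[0], 'intercept' => $params[1]];
    }
}

class Regression
{
    protected $params = [2.0, 1.0];

    public function getParametersViaSelf(): array
    {
        // self:: is bound to Regression, which has no getModelParameters(): throws Error.
        return self::getModelParameters($this->params);
    }

    public function getParametersViaStatic(): array
    {
        // static:: uses late static binding and resolves to the class of $this.
        return static::getModelParameters($this->params);
    }
}

class ChildRegression extends Regression
{
    use ExampleModel;
}

$regression = new ChildRegression();
print_r($regression->getParametersViaStatic());

try {
    $regression->getParametersViaSelf();
} catch (Error $e) {
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}
```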

Beakerboy commented 8 years ago

Maybe using get_called_class() (or whatever that function is called) will work.
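A sketch of the get_called_class() route, assuming plain PHP 7.x. get_called_class() returns the late-static-binding class name (the child, when the method is called on a child instance), so the parent can use it to reach the child's trait method. ExampleModel, Regression, Linear and describe() are hypothetical names.

```php
<?php
// Sketch: using get_called_class() in a parent method to reach a child's trait method.
// All names here are hypothetical.

trait ExampleModel
{
    public static function getModelParameters(): string
    {
        return 'parameters from ' . static::class;
    }
}

class Regression
{
    public function describe(): string
    {
        $class = get_called_class(); // "Linear" when called on a Linear instance
        return $class::getModelParameters();
    }
}

class Linear extends Regression
{
    use ExampleModel;
}

echo (new Linear())->describe(), PHP_EOL; // parameters from Linear
```

static::getModelParameters() performs the same lookup with less ceremony.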

Beakerboy commented 8 years ago

I've submitted a pull request. Lots has changed, but going forward I think this will make things more flexible and easier to extend.