markrogoyski / math-php

Powerful modern math library for PHP: Features descriptive statistics and regressions; Continuous and discrete probability distributions; Linear algebra with matrices and vectors, Numerical analysis; special mathematical functions; Algebra
MIT License
2.32k stars 238 forks source link

Implement Eigenvalues and EigenVectors #196

Open hoytluo opened 7 years ago

hoytluo commented 7 years ago

can you add a function to calculate the eigen vectors and eigen values of a matrix?

markrogoyski commented 7 years ago

Hi, Thank you for your interest in MathPHP!

I plan to add eigenvalues and eigenvectors. I'll update this issue thread when there is progress.

Thanks.

Beakerboy commented 7 years ago

I started looking into some of this some time ago. I have a function that works for a 2 x 2 matrix. One thing to note is how we want to define the eigenvector. Matlab uses a vector with a length of the square root of two. I don't know what other math applications do.

I was going to add the ability to surround a set of data with an ellipse with a given confidence interval. This would require an eigenvector to set the orientation of the ellipse.

markrogoyski commented 7 years ago

If you have something working, even if it is not quite ready, and would like feedback/someone to try it out, don't hesitate to do a pull request to a feature branch, or just post a link to what you are working on.

Beakerboy commented 7 years ago

I'll list the pieces I have for the ellipse, and let me know where you would like them put. I was thinking covarienceMatrix, eigenvalue, and eigenvector would be in MathPHP\LinearAlgebra\Matrix ellipse in MathPHP\Statistics\Correlation unitCircle in MathPHP\Functions\Support

/**
 * Given the data in $X and $Y, create an ellipse
 * surrounding the data at a $sigma confidence interval.
 *
 * The function will return $num_points pairs of X,Y data
 * http://stackoverflow.com/questions/3417028/ellipse-around-the-data-in-matlab
 */
function ellipse(array $X, array $Y, float $sigma, int $num_points = 11): array
{
}

/**
 * Given the data in $X and $Y, calculate the covarience matrix
 *
 * https://en.wikipedia.org/wiki/Covariance_matrix
 */
function CovarienceMatrix(array $X, array $Y): SquareMatrix
{
}

/**
 * Calculate the eigenvalues of a square matrix
 *
 * https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors
 */
function Eigenvalues(SquareMatrix $M): DiagonalMatrix
{
}

/**
 * Calculate the eigenvectors of a square matrix
 * The returned eigenvectors have a magnitude of sqrt(2)
 *
 * https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors
 */
function Eigenvectors(SquareMatrix $M): SquareMatrix
{
}

/**
 * returns $numpoints number of points, equally spaced around a unit circle
 * The point y=0 is returned both as the first an last element to ease plotting.
 * 
 */
function UnitCircle(int $num_points): array
{
}
Beakerboy commented 7 years ago

My work in progress https://github.com/Beakerboy/math-php/blob/Ellipse/src/Statistics/Correlation.php

hoytluo commented 7 years ago

First of all thank you for your reply.

If it's possible that the eigenvectors and eigenvalues are getted by the same function just like the LUDecomposition function.

markrogoyski commented 7 years ago

@Beakerboy Thanks for sharing your work. Good stuff.

What do you think about putting the unit circle in a new namespace like Trigonometry or something?

Beakerboy commented 7 years ago

I just pushed a eigenvalues function. I was looking at the way matlab handles eigenvalues and vectors. Their eig() function can return results in a few different ways.

You can get a column of eigenvalues with:
V = eig(M);   // [1]
                         [2]

A diagonal matrix of values AND a matrix of vectors:
[V,D] = eig(M)
 V = [1 0]
        [0 2]
D = [3 5]
       [-5 3]

The thing is, in order to calculate the eigenvectors, you need to canculate the eigenvalues., so if someone wants both and they call

$foo = $M->eigenvalues();
$bar = $M->eigenvectors();

The eigenvalues function will be called twice. Should I make the eigenvector method return a list?

list($values, $vectors) = $M->eigenvectors(true);

The 'true' would indicate that the user wants both the eigenvalues and the eigenvectors.

markrogoyski commented 7 years ago

My preference would be that eigenvalues and eivenvectors are separate functions that return separate things.

Eigenvalues can lazy load the values. That way, when eigenvectors is called, if eigenvalues has already been called, it will not need to recompute anything. Have the eigenvalues set a class attribute $this->eigenvalues before returning. And at the beginning of the method, return that class attribute if it is set.

$values  = $M->eigenvalues();  // Computes eigenvalues and sets class attribute
$vectors = $m->eigenvectors(); // Calls eigenvalues but eigenvalues simply returns the precomputed values.

You can look at the Matrix det() and rref() functions to see this strategy being used.

Beakerboy commented 7 years ago

@hoytluo I just submitted a pull request for an eigenvalue calculation. This works on 2x2 and 3x3 matricies, and can be extended to 4 pretty easily.

wizard97 commented 7 years ago

Definitely, needs to be a more general numerical algorithm to handle any NxN case. I think the next relevant discussion is what is a good numerical algorithm for approximating EVs for large matrices. As soon as that is implemented, implementing a much needed SVD method would be trivial.

markrogoyski commented 7 years ago

I imagine there will end up being multiple NxN algorithms implemented, perhaps with an appropriate method being chosen automatically based on the characteristics of the matrix if the user does not specify a method.

In choosing which algorithms to implement, is it of any value to implement algorithms such as Power Iteration which can only produce a single eigenvalue? Or perhaps that is a different function, such as eigenpair() which would indicate you are only getting a single eigenvalue and vector.

markrogoyski commented 7 years ago

Any thoughts on what the default return value should be from Matrix->eigenvalues()? Two options spring to mind:

The DiagonalMatrix would still allow access to an array of values by doing $M->getDiagonalElements() That seems more flexible, but it may not be what a user expects?

Thoughts?

wizard97 commented 7 years ago

Certainly cases where people only care about the dominant EV, but I don't think an eigenpair() function would be a priority. Maybe later as an optimization, but I think the priority is implementing an efficient eigenvalue and eigenvector function that can be quickly used in a bunch of missing functions (such as eigenpair() SVD etc..). Presumably Matrix->eigenvalues() would return the values from largest to smallest, so perhaps temporarily eigenpair() can just use eigenvalues() then return the first element. Later someone can work on a more efficient implementation for eignpair() such as the power method.

Any thoughts on what the default return value should be from Matrix->eigenvalues()?

Definitely should be an 1d array or vector, it is unintuitive to expect a function that computes eigenvalues of a matrix to also return a matrix. Furthermore, it is a sparse matrix and a waste of space if you don't need it as a matrix anyways. It would be easy anyways to convert a 1d vector into a matrix by simply multiplying the 1d vector by eye(N). Perhaps there should be a function to do this?

markrogoyski commented 7 years ago

If you send a one-dimensional array of values to MatrixFactory::create you will get a DiagonalMatrix, so that functionality already exists.

Beakerboy commented 7 years ago

From my research, it looks like the QR decomposition of a matrix is a popular means to get eigenvalues of larger matricies. Since A = QR, it seems like it is common to find Q, and then calculate R = QᵀA since QQᵀ = I. Should Should Matrix::qrDecomposition(Matrix $A) return an array ['Q' => Matrix $Q, 'R' => Matrix $R] or just Q, or Qᵀ. Matlab returns just R as the default, or [Q,R].

markrogoyski commented 7 years ago

For LU Decomposition, the return value is array containing [L, U, P, A]. I don't know why I put A in there, but L U P is what I think should be there. So fro QR decomposition, I'd expect [Q, R] in the return value. Are you going to work on QR decomposition? Just let me know so we don't duplicate work as QR decomposition is something in the backlog.

andrewdalpino commented 6 years ago

Hi @markrogoyski @Beakerboy

Are there plans for implementing QR still?

It would be SUPER helpful to me and I'm sure others in the case where I need eigenvalues and eigenvectors to do dimensionality reduction on datasets i.e. PCA and LDA.

I'm no linear algebra whiz, so this is probably something I couldn't handle alone

markrogoyski commented 6 years ago

Hi @andrewdalpino,

Yes, the QR decomposition is on the roadmap. I will prioritize it since multiple people have now requested this feature.

Thanks for your interest in MathPHP and providing your feedback.

Beakerboy commented 5 years ago

I have to say I'm not a fan of having separate eigenvalue and eigenvector classes. I propose the following: Have a singular "Eigen" class. This class houses all the value / vector algorithms.

class Eigen
{
    /**
     * Eigenvalues
     *
     * row array?
     * Should it be sorted?
     *   -Absolute or relative value?
     *   -High to low or low to high?
     */
    protected $EVal;

     /**
     * Eigenvectors
     *
     * sorted the same order as $Eval
     *
     * Column vectors?
     * Unit Vectors?
     * Top element positive?
     */
    protected $EVec;

    public function __construct($Matrix $M, string $method = Eigen::QR_HOUSEHOLDER)
    {
        $this->M = $M;
        $this->qrHouseholder();
    }

    private function qrHouseholder()
    {
        // Stuff
        $this->Eval = $R->multiply($Q)->getDiagonalElements();
    }

    public function getEigenvalues(): array (or matrix?)
    {
        return $this->EVal;
    }
    /**
     * In case a user does not want unit eigenvectors
     */
    public function getEigenvectors($scale = 1): array (or matrix?)
    {
        return $this->EVal->scalarMultiply(sqrt($scale));
    }
}

The documentation for R has the following note in prcomp (The PCA function)

Note The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.

The rotation matrix is a matrix of unit column eigenvectors, so they must not have a sign specification.

The matrix class can store this object if called. If set, it can return the appropriate item without having to recalculate. Since some of the eigenvalue/vector algorithms calculate both, it's a waste to call them twice from two separate classes, or to call an algorithm than calculates both, and then recalculate the vectors from a generic solving technique. This is what the JK algorithm would have to do in the existing class structure.

Is there a reason for having them separate? Do you expect users to call them directly, or will the Matrix class wrap the Eigen function up?