markrogoyski / math-php

Powerful modern math library for PHP: Features descriptive statistics and regressions; Continuous and discrete probability distributions; Linear algebra with matrices and vectors, Numerical analysis; special mathematical functions; Algebra
MIT License
2.34k stars 240 forks source link

Preparing the matrix class for sparse matricies #202

Open Beakerboy opened 7 years ago

Beakerboy commented 7 years ago

I was thinking about the future of the DiagonalMatix and other Sparse Matricies. Would it make sense to change the matrix algebra functions so that, instead of referencing the self::$A array directly, we instead pass all calls through self::get(n,m)? This would allow us to store a diagonalMatrix in memory, just with the diagonal as an array, and the get() function in the diagonal class could be programmed to return 0 if n != m, and $a[n) otherwise. If we implement a generic SparseMatrix class, we will likely need similar tricks since the $A array of arrays will not exist.

markrogoyski commented 7 years ago

That idea makes sense. I haven't really put any thought into memory optimizations.

Before considering or implementing these kinds of architecture optimizations, I think the current focus of the project is:

  1. Correctness
  2. Feature completeness
  3. Clean, well-designed code

Once we have most of the features we want cleanly implemented with a stable public API, and the unit tests (100% code coverage) give us confidence that it is correct, then I think it makes the most sense to spend the effort to optimize. With all the unit tests validating your improvements, you can safely refactor and focus on the optimization, knowing that you have not caused a regression or introduced a bug.

When it comes to optimizations, there are two main kinds to consider:

  1. Algorithm improvements
  2. Code/architecture/design improvements

For (1) algorithm improvements, these are straight forward. For example, if something is currently implemented with a O(n³) algorithm, and there is a O(n log n) algorithm that exists, that doesn't require any discussion or profiling. The performance benefit you get probably outweighs whatever code complexity you introduce to achieve it. (Your optimization to the median function is a good example of this).

For (2) code/architecture/design optimizations, I would want to make sure that any complexity that is introduced to achieve the performance improvement is worth the effort. In software it is generally the case that 80% of the time is spent in 20% of the code (80/20 rule). So you'd want to profile and see what the impact of the change is. Making an optimization to the 20% part that affects 80% of the time is a significant benefit--making an optimization to the other 80% part that is only 20% of the runtime may not realize any noticeable change (known as a micro-optimization). So you'd want to profile and optimize where it makes sense. Optimization usually means more complicated code, so you want to make sure the tradeoff is worth it.

I'm not saying don't think about optimizing (please do!). The Matrix classes are still being developed and built out, so I think it is prudent to discuss what are the features we would want to consider it feature complete for Version 1, and once that is achieved with 100% test coverage, then identify the most meaningful optimizations and come up with a design and implement them. And calling the get() method in place of the $A array may be one of those we consider and implement at that time.

TheFausap commented 7 years ago

An idea about sparse matrices... usual strategy : not storing the zero entries. But if I have a matrix with a lot of '1', for example, instead of zero, we could create a sparse matrix where the unwritten elements should be 1. So maybe create a generic idea of sparse matrix.

I'm writing this because this could be very userful for Quantum Fourier Transform.