dantleech opened this issue 8 years ago
Hello :-),
So first, functions vs. static classes… I would tend to say that static classes are better for the current state of PHP. But that may change in the future, and switching to new features later would introduce a BC break. So I am not sure what to do. I guess we must decide based on testing: is it easier to test functions or classes? Static classes are not easy to test, so functions sound better. But, as you mentioned, autoloading functions is a nightmare. We should not autoload them with Composer; it would require too much (probably useless) code. If we have a Vector, let's assume we create an object and then have something like (new Vector(…))->add(new Vector())…, etc. In other words, we avoid static classes when multiple methods share the same data. Actually, I am not being very clear. Static classes are hard to test when we must mock. If we avoid the need to mock, then it's fine.
About library structure. In Combinatorics, I have the following (empty, not committed) directories: Arrangement, Counting, FiniteSet and Permutation, in addition to Combination. There are different possible ways to organise Mathematics, but I would avoid “Financial” or any business vocabulary like this one. However, I don't know how “Mathematics are organized”. We must do some research here.
About options or arguments? Options. Definitively. I hate options. They are hard to validate, hard for the user to debug, and they add some magic… If we have too many arguments (like byrow in your example), we must enforce some conventions. It's not a big deal for the user, but it eases our work and avoids bugs.
Thoughts?
Static classes are hard to test when we must mock.
Functions / static methods would be things like stdev($values), variance($values), covar($set1, $set2), etc., so there is no need to mock there.
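Because a function like covar($set1, $set2) depends only on its arguments, it can be tested with plain input/output assertions and no mocks. A minimal sketch, assuming the sample covariance (dividing by n - 1) and using covariance as a hypothetical long name:

```php
<?php

/**
 * Sample covariance of two data sets of equal size (divides by n - 1).
 * Hypothetical helper: the name and the n - 1 convention are assumptions.
 */
function covariance(array $set1, array $set2): float
{
    // Reindex so that both sets can be walked with the same integer index.
    $set1 = array_values($set1);
    $set2 = array_values($set2);
    $n = count($set1);

    if ($n !== count($set2) || $n < 2) {
        throw new InvalidArgumentException('Both sets need the same size, with at least 2 values.');
    }

    $mean1 = array_sum($set1) / $n;
    $mean2 = array_sum($set2) / $n;

    // Sum of the products of the deviations from each mean.
    $sum = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $sum += ($set1[$i] - $mean1) * ($set2[$i] - $mean2);
    }

    return $sum / ($n - 1);
}

// Testing needs no mocks: the result depends only on the arguments.
echo covariance([1, 2, 3, 4], [2, 4, 6, 8]), PHP_EOL;
```

The covariance of a set with itself is its sample variance, which makes a convenient extra test case.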
For things like distributions we would have classes, I think, and as you say, Vectors (and the like) would be classes to which we can add methods (multiply, add, etc.), which would call the static methods defined elsewhere, passing their data as arguments.
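The split between stateful objects and stateless static methods could look roughly like this: Vector owns its components, while the arithmetic lives in static methods that receive the data as arguments. A sketch only; VectorMath, toArray() and the method signatures are hypothetical names, not an existing API.

```php
<?php

/**
 * Stateless operations: all data comes in through the arguments,
 * so they can be tested without mocks. Hypothetical class.
 */
final class VectorMath
{
    private function __construct()
    {
    }

    /** Element-wise sum of two arrays of the same size. */
    public static function add(array $a, array $b): array
    {
        if (count($a) !== count($b)) {
            throw new InvalidArgumentException('Vectors must have the same dimension.');
        }

        return array_map(fn ($x, $y) => $x + $y, array_values($a), array_values($b));
    }

    /** Multiplies every component by a scalar. */
    public static function multiply(array $a, float $scalar): array
    {
        return array_map(fn ($x) => $x * $scalar, array_values($a));
    }
}

/** Object that owns its data and delegates the maths to VectorMath. */
final class Vector
{
    private array $components;

    public function __construct(float ...$components)
    {
        $this->components = $components;
    }

    public function add(Vector $other): Vector
    {
        return new Vector(...VectorMath::add($this->components, $other->components));
    }

    public function multiply(float $scalar): Vector
    {
        return new Vector(...VectorMath::multiply($this->components, $scalar));
    }

    public function toArray(): array
    {
        return $this->components;
    }
}

$sum = (new Vector(1, 2, 3))->add(new Vector(4, 5, 6));
print_r($sum->toArray());
```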
I would avoid “Financial” or any business vocabulary like this one. However, I don't know how “Mathematics are organized”. We must do some research here.
I just took Financial from the NumPy library, I think, so yeah - I think just doing a quick survey of existing libraries would be sufficient to determine our organization.
About options or arguments? Options. Definitively. I hate options.
:) You mean we should use arguments? I tend to agree, my main concern is when you have to skip over several NULL arguments to set an option that you want.
PHP will take NULL as NULL and will not substitute the default, meaning the user has to explicitly pass each of the "default" values before getting to the one they need.
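The NULL issue can be reproduced with a hypothetical function loosely modelled on R's matrix() (matrixArguments and its parameters are made up for illustration). Passing NULL to reach a later parameter overrides the earlier defaults instead of falling back to them:

```php
<?php

/**
 * Hypothetical signature with several optional parameters.
 * It just returns what it received, so we can see the effective values.
 */
function matrixArguments(array $data, ?int $rows = 1, ?int $columns = 1, bool $byRow = false): array
{
    return ['rows' => $rows, 'columns' => $columns, 'byRow' => $byRow];
}

// Trying to "skip" $rows and $columns with NULL does not restore their defaults:
// both arrive as NULL, not 1.
var_dump(matrixArguments([1, 2, 3, 4], null, null, true));

// The caller has to repeat the default values explicitly to reach $byRow.
var_dump(matrixArguments([1, 2, 3, 4], 1, 1, true));
```

With positional arguments, the only workaround is for the caller to repeat the defaults, and that copy will not follow if the library's defaults ever change.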
I think we could avoid this by minimizing such cases, either by having different variations of the function where absolutely necessary, or by using bitwise constants. I guess we can play it by ear.
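Bitwise constants let the caller mention only the switches they want, in the same way as PHP's own json_encode($value, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES). A minimal sketch; MatrixFlag and matrixOptions() are hypothetical names:

```php
<?php

/** Hypothetical flags; each one uses a distinct bit. */
final class MatrixFlag
{
    public const BY_ROW = 1;    // 0b001
    public const TRANSPOSE = 2; // 0b010
    public const COPY = 4;      // 0b100
}

/** Reads each flag with a bitwise AND; flags that are not passed stay off. */
function matrixOptions(int $flags = 0): array
{
    return [
        'byRow' => ($flags & MatrixFlag::BY_ROW) !== 0,
        'transpose' => ($flags & MatrixFlag::TRANSPOSE) !== 0,
        'copy' => ($flags & MatrixFlag::COPY) !== 0,
    ];
}

// Only the wanted switches are mentioned; nothing has to be skipped with NULL.
print_r(matrixOptions(MatrixFlag::BY_ROW | MatrixFlag::COPY));
```

Flags only cover boolean switches; options that carry a value (such as a number of rows) still need real arguments.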
:+1: for bitwise flags and constants. Sounds better.
We can use static classes as a “collection of functions”. It sounds silly put like this, because that is basically the definition of a class :-D, but here it plays the same role as a namespace.
Also, about naming: Hoa prefers long names. So stdev is wrong, but standardDeviation is good. Yes, it's longer, but it's clearer.
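A static class used as a "collection of functions", with long method names, might look like this. Statistics and its methods are hypothetical, and standardDeviation() here computes the population standard deviation (dividing by n), which is an assumption.

```php
<?php

/**
 * A static class used only to group related functions, playing the role
 * of a namespace. Hypothetical class and method names.
 */
final class Statistics
{
    // No instances: the class only groups functions.
    private function __construct()
    {
    }

    public static function mean(array $values): float
    {
        if ($values === []) {
            throw new InvalidArgumentException('Cannot compute the mean of an empty set.');
        }

        return array_sum($values) / count($values);
    }

    // Long, explicit name rather than stdev().
    public static function standardDeviation(array $values): float
    {
        $mean = self::mean($values);
        $squaredDeviations = array_map(fn ($x) => ($x - $mean) ** 2, $values);

        return sqrt(array_sum($squaredDeviations) / count($values));
    }
}

echo Statistics::standardDeviation([2, 4, 4, 4, 5, 5, 7, 9]), PHP_EOL;
```

The private constructor prevents instantiation, so the class acts purely as a container for related functions while still being autoloadable.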
Thoughts?
Actually, another thing just occurred to me - I have been using these functions in Python and R exclusively from the interactive shell, and I have also been using psysh for testing PHP math stuff.
Using these things interactively is an argument both for shorter names and for using "proper"/standalone functions (R uses sd for standard deviation, NumPy std). Interestingly, Apache Math uses a class.
Hmm... and then we also have the PHP stats extension. It could make sense to write a polyfill for it, so that when that extension is enabled it can be used directly.
btw, PHP stats uses the long name for stdev: stats_standard_deviation.
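A polyfill would declare the function only when the stats extension has not already done so, so the native version wins whenever it is enabled. A minimal sketch, assuming the PECL stats signature stats_standard_deviation(array $a, bool $sample = false), where false means the population standard deviation; throwing on invalid input is an assumption and may differ from the extension's own behaviour:

```php
<?php

// Define the fallback only if the stats extension is not loaded,
// so that the native implementation is used when available.
if (!function_exists('stats_standard_deviation')) {
    /**
     * Pure-PHP fallback for stats_standard_deviation().
     * $sample = false: population (divide by n); true: sample (divide by n - 1).
     */
    function stats_standard_deviation(array $a, bool $sample = false): float
    {
        $n = count($a);
        // Assumption: invalid input throws here; the extension may behave differently.
        if ($n === 0 || ($sample && $n === 1)) {
            throw new InvalidArgumentException('Not enough values.');
        }

        $mean = array_sum($a) / $n;
        $sumOfSquares = 0.0;
        foreach ($a as $value) {
            $sumOfSquares += ($value - $mean) ** 2;
        }

        return sqrt($sumOfSquares / ($sample ? $n - 1 : $n));
    }
}

echo stats_standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]), PHP_EOL;
```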
Please label this issue @Hywan :s
@dantleech Long names are always better: even if they are not easy to write, they remain easy to read, and they avoid a lot of confusion and name clashes.
:+1: for long names.
ping?
Currently this library includes no mathematical functions. For PHPBench I needed access to basic statistical functions, e.g. average, stddev, variance, etc., and later I needed to port a Kernel Distribution estimator class. It would be great to move these over to this library.
Reference libraries / projects:
I would like to start by porting the minimum required for PHPBench:
Functions or static methods?
Should we use functions or static methods? i.e.
or
I tend to prefer the functional approach, but it is not possible to "autoload" functions, so we would have to include the whole library every time.
Perhaps the overhead would be insignificant, but maybe it would be better to use static methods anyway. Thoughts?
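The autoloading limitation is easy to observe: a callback registered with spl_autoload_register() is invoked for unknown classes, but never for unknown functions. The Example\Math names are hypothetical and deliberately left undefined:

```php
<?php

// Record every name the autoloader is asked to load.
$requested = [];
spl_autoload_register(function (string $name) use (&$requested): void {
    $requested[] = $name;
    // A real autoloader would require the matching file here.
});

// An unknown class (a static method call, or class_exists() with autoload on)
// triggers the autoloader...
class_exists('Example\Math\Statistics');

// ...but an unknown function does not: PHP never autoloads functions,
// so a function library has to be included up front.
function_exists('Example\Math\standardDeviation');

print_r($requested); // only the class name was requested
```

Composer's "files" autoload section works around this by including the listed files eagerly, which amounts to including the whole library every time.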
Package Organisation
Currently the source code for this package is located in the root directory, along with some "non-code" files, which I am not sure is a great idea. For example, the Bin directory (I think) stands for "binary" in this case, but this also has a mathematical significance, which could block us in the future. Equally, the Context and Arithmetic.pp files share the same concern as Bin\Calc.php, so I would rather these were all in a sub-namespace:
Options or Arguments?
PHP does not currently have named parameters, which really sucks.
Taking an example from R:
So we have the choice of either:
or
The first is pretty bad IMO, and the second really requires a utility to validate the options. In this case a class would probably be a better solution, but this sort of thing is quite frequent, so it would be good to have a strategy to deal with it.
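If an options array is used, the validation utility mentioned above could be as small as rejecting unknown keys and merging the rest over the defaults. A minimal sketch; resolveOptions() is hypothetical, and the option names are modelled on R's matrix() arguments:

```php
<?php

/**
 * Merges user options over defaults, rejecting keys that are not recognised.
 * Hypothetical helper for illustration.
 */
function resolveOptions(array $options, array $defaults): array
{
    // Any key that has no default is treated as a mistake.
    $unknown = array_diff_key($options, $defaults);
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            'Unknown option(s): ' . implode(', ', array_keys($unknown))
        );
    }

    // Keys present in $options override the defaults; missing ones keep them.
    return array_replace($defaults, $options);
}

$defaults = ['nrow' => 1, 'ncol' => 1, 'byrow' => false];

print_r(resolveOptions(['byrow' => true], $defaults));

try {
    resolveOptions(['byRow' => true], $defaults); // wrong case in the key
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

This only catches misspelled keys; checking the type of each value would still need more code, which is part of why a dedicated class may be the better fit.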
Vectorized Operations
R and NumPy support vectorized operations, e.g.
I think it would be good to support this:
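In userland PHP, vectorization can only be emulated, but a helper can offer the same calling convention: apply an operation element-wise to arrays, and broadcast a scalar against an array. A minimal sketch with hypothetical names (elementWise, add); unlike NumPy, it only handles one-dimensional arrays and loops in plain PHP, so there is no speed-up:

```php
<?php

/**
 * Applies $operation element-wise. Either operand may be a scalar,
 * which is repeated ("broadcast") to match the other operand's length.
 * Hypothetical helper for illustration.
 */
function elementWise(callable $operation, $left, $right)
{
    // Two scalars: a plain call.
    if (!is_array($left) && !is_array($right)) {
        return $operation($left, $right);
    }

    // Broadcast a scalar operand to the length of the array operand.
    if (!is_array($left)) {
        $left = array_fill(0, count($right), $left);
    }
    if (!is_array($right)) {
        $right = array_fill(0, count($left), $right);
    }

    if (count($left) !== count($right)) {
        throw new InvalidArgumentException('Operands must have the same length.');
    }

    return array_map($operation, array_values($left), array_values($right));
}

function add($left, $right)
{
    return elementWise(fn ($a, $b) => $a + $b, $left, $right);
}

print_r(add([1, 2, 3], 10));        // scalar broadcast
print_r(add([1, 2, 3], [4, 5, 6])); // element-wise
```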