markrogoyski / math-php

Powerful modern math library for PHP: Features descriptive statistics and regressions; Continuous and discrete probability distributions; Linear algebra with matrices and vectors, Numerical analysis; special mathematical functions; Algebra
MIT License

Handling Missing values #416

Closed: saurabhgayali closed this 3 years ago

saurabhgayali commented 3 years ago

It would be nice if the library could handle missing values, or at least offer an option to ignore them or replace them with a number.

markrogoyski commented 3 years ago

Hi,

Thanks for your interest in MathPHP.

Can you be more specific and provide some examples of what you are talking about?

saurabhgayali commented 3 years ago

My data has missing points, so I am getting the following error when calculating quartiles.

Fatal error: Uncaught TypeError: MathPHP\Statistics\Average::kthSmallest(): Return value must be of type float, null returned in \vendor\markrogoyski\math-php\src\Statistics\Average.php:158
Stack trace:
#0 \vendor\markrogoyski\math-php\src\Statistics\Average.php(103): MathPHP\Statistics\Average::kthSmallest(Array, 2)
#1 \vendor\markrogoyski\math-php\src\Statistics\Average.php(165): MathPHP\Statistics\Average::median(Array)
#2 \vendor\markrogoyski\math-php\src\Statistics\Average.php(108): MathPHP\Statistics\Average::kthSmallest(Array, 2384)
#3 \vendor\markrogoyski\math-php\src\Statistics\Descriptive.php(419): MathPHP\Statistics\Average::median(Array)
#4 \vendor\markrogoyski\math-php\src\Statistics\Descriptive.php(355): MathPHP\Statistics\Descriptive::quartilesExclusive(Array)
#5 \tempmaths.php(41): MathPHP\Statistics\Descriptive::quartiles(Array)
#6 {main}
  thrown in \vendor\markrogoyski\math-php\src\Statistics\Average.php on line 158

The function I am calling is Descriptive::quartiles($myarray);

markrogoyski commented 3 years ago

Hi @saurabhgayali,

I'm not sure what you mean by "missing points" since you are not specific, but presuming you mean null values, you can easily filter your data before calling any functions.

For example:

php > $list = [1, 2, 3, null, 5, 6, null, 8, 9, 10];
php > $filteredList = array_filter($list, function (?int $item): bool { return is_int($item); } );
php > print_r($filteredList);
Array
(
    [0] => 1
    [1] => 2
    [2] => 3
    [4] => 5
    [5] => 6
    [7] => 8
    [8] => 9
    [9] => 10
)

Keep in mind that this is a math library, not a data science library. Preparing data for mathematical analysis is part of data science, and that step happens before the data reaches the library.
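The two options asked for in the original request (ignore missing values, or replace them with a number) can both be done in plain PHP before the data is handed to the library. The following is a minimal sketch, assuming PHP 7.4 or later and assuming that "missing" means null. The helpers `dropMissing` and `replaceMissing` are hypothetical names used for illustration and are not part of MathPHP.

```php
<?php
declare(strict_types=1);

/**
 * Drop null entries and reindex the result from 0.
 * Hypothetical helper, not part of MathPHP.
 */
function dropMissing(array $values): array
{
    // array_filter() preserves the original keys, so array_values()
    // is applied afterwards to get a gap-free, zero-based list.
    return array_values(array_filter($values, fn ($v): bool => $v !== null));
}

/**
 * Replace null entries with a fixed number, keeping the list length.
 * Hypothetical helper, not part of MathPHP.
 */
function replaceMissing(array $values, float $replacement): array
{
    return array_map(fn ($v) => $v ?? $replacement, $values);
}

// Sample data with two missing points; unlike the ?int filter above,
// this version also accepts floats.
$list = [1, 2.5, 3, null, 5, 6, null, 8, 9, 10];

$kept   = dropMissing($list);          // [1, 2.5, 3, 5, 6, 8, 9, 10]
$filled = replaceMissing($list, 0.0);  // [1, 2.5, 3, 0.0, 5, 6, 0.0, 8, 9, 10]
```

Either cleaned array can then be passed to `Descriptive::quartiles()` as in the report above. Which option is appropriate depends on the data: dropping values shrinks the sample, while substituting a number shifts the resulting statistics.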