Closed davidjr82 closed 2 years ago
Hi @davidjr82,
Thank you for your interest in MathPHP.
Unfortunately, there is no single standard way to compute quartiles. R, for instance, offers nine variations, Excel has two, and the Wikipedia article shows four. MathPHP's documentation for quartilesInclusive indicates that it uses the "Tukey's hinges" quartile method, which is "method 2" in the Wikipedia article.
Method 2: Use the median to divide the ordered data set into two-halves. If there are an odd number of data points in the original ordered data set, include the median (the central value in the ordered list) in both halves. If there are an even number of data points in the original ordered data set, split this data set exactly in half. The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data. The values found by this method are also known as "Tukey's hinges";[4] see also midhinge.
Here is Wikipedia's method 2 computed by hand on your dataset:
[0,900,1800,2700,3600,4500]
Use the median to divide the ordered data set into two-halves.
There is an even number of data points, so the median is the average of 1800 and 2700, which is 2250.
If there are an even number of data points in the original ordered data set, split this data set exactly in half.
Lower half = [0, 900, 1800]
Upper half = [2700, 3600, 4500]
The lower quartile value is the median of the lower half of the data.
The median of [0, 900, 1800] is 900.
The upper quartile value is the median of the upper half of the data.
The median of [2700, 3600, 4500] is 3600.
This matches the result MathPHP provides.
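The hand computation above can also be sketched in a few lines of code. This is an illustrative Python version of Wikipedia's method 2, not MathPHP's actual implementation; the function name `tukey_hinges` is hypothetical.

```python
from statistics import median


def tukey_hinges(data):
    """Return (Q1, median, Q3) following Wikipedia's "method 2".

    Hypothetical helper for illustration only; not part of MathPHP.
    """
    x = sorted(data)
    n = len(x)
    half = n // 2
    if n % 2 == 0:
        # Even count: split the data set exactly in half.
        lower, upper = x[:half], x[half:]
    else:
        # Odd count: the median is included in both halves.
        lower, upper = x[:half + 1], x[half:]
    return median(lower), median(x), median(upper)


print(tukey_hinges([0, 900, 1800, 2700, 3600, 4500]))  # (900, 2250.0, 3600)
```

For an odd-length input such as [1, 2, 3, 4, 5], the halves are [1, 2, 3] and [3, 4, 5], so the sketch returns 2, 3 and 4.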
Also for reference, there are multiple quartile methods in R which give the same result:
> quantile(c(0, 900, 1800, 2700, 3600, 4500), type=2)
  0%  25%  50%  75% 100%
   0  900 2250 3600 4500
> quantile(c(0, 900, 1800, 2700, 3600, 4500), type=5)
  0%  25%  50%  75% 100%
   0  900 2250 3600 4500
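For readers without R at hand, the two quantile types used above can be approximated as follows. This is a rough Python sketch of the Hyndman and Fan type 2 and type 5 definitions as I understand them; the function name `r_quantile` is hypothetical, and the floating-point fuzz that R applies internally is omitted, so treat it as an illustration rather than a drop-in replacement.

```python
import math


def r_quantile(data, p, qtype):
    """Approximate R's quantile(x, p, type=qtype) for qtype 2 or 5.

    Hypothetical helper; omits the floating-point fuzz R applies.
    """
    x = sorted(data)
    n = len(x)

    def at(k):
        # 1-indexed access, clamped to the valid range [1, n].
        return x[min(max(k, 1), n) - 1]

    if qtype == 2:
        # Inverse empirical CDF, averaging at discontinuities.
        h = n * p
        j = math.floor(h)
        if h == j:
            return (at(j) + at(j + 1)) / 2
        return at(j + 1)
    if qtype == 5:
        # Piecewise linear, with knots at the midpoints of the CDF steps.
        h = n * p + 0.5
        j = math.floor(h)
        g = h - j
        return (1 - g) * at(j) + g * at(j + 1)
    raise ValueError("only types 2 and 5 are sketched here")


data = [0, 900, 1800, 2700, 3600, 4500]
for qtype in (2, 5):
    print(qtype, [r_quantile(data, p, qtype) for p in (0, 0.25, 0.5, 0.75, 1)])
```

Both types should give 0, 900, 2250, 3600 and 4500 for this dataset, in line with the R output above.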
Keep in mind there is also a Descriptive::percentile function you can use, which has a more "standard" definition, if that is what you are looking for.
Descriptive::percentile([0,900,1800,2700,3600,4500], 25) // 1125
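The 1125 figure is what linear interpolation between the closest ranks produces. The Python sketch below assumes that rule (R's type 7); whether Descriptive::percentile uses exactly this rule internally is an assumption based only on the matching result, and the function name `percentile_linear` is hypothetical.

```python
def percentile_linear(data, p):
    """p-th percentile (0..100) by linear interpolation between closest ranks.

    Hypothetical helper; assumed, not confirmed, to mirror Descriptive::percentile.
    """
    x = sorted(data)
    pos = (len(x) - 1) * p / 100   # fractional 0-based position in the sorted data
    lo = int(pos)                  # index of the value at or below that position
    hi = min(lo + 1, len(x) - 1)   # next index, clamped at the last element
    frac = pos - lo
    return x[lo] + frac * (x[hi] - x[lo])


data = [0, 900, 1800, 2700, 3600, 4500]
print(percentile_linear(data, 25))  # 1125.0
print(percentile_linear(data, 20))  # 900.0
```

Under this rule the 20th percentile lands exactly on 900, which is why the Tukey's hinges Q1 happens to coincide with the 20th percentile for this dataset.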
Having this array:
And having, therefore, these percentiles (inclusive, range 0..1):
If I ask for the quartiles, I expect the first quartile (just as an example) to be between 900 and 1800 (0.25 should be 1125), but it is 900.
Q1 should have the same value as the 25th percentile, but it has the value of the 20th percentile (Wikipedia Quartile definition).
Is this a bug, or is there something I am missing?
Thanks!