root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern
Other
2.7k stars 1.28k forks source link

Problems with `TH1::GetQuantiles` #12251

Closed lmoneta closed 5 days ago

lmoneta commented 1 year ago

The function TH1::GetQuantile is not returning in some case the correct result.

There is already this JIRA issue open: https://sft.its.cern.ch/jira/browse/ROOT-8085

and not it is reported that gives wrong values for 0 and 100 percentiles, see:

https://root-forum.cern.ch/t/th1-getquantiles-gives-wrong-value-for-100-percentile/53284/3

See this example from JIRA:

#include "TCanvas.h"
#include <iostream>
#include "TH1D.h"
using std::cout, std::endl;

void quantiles() {
   TCanvas *c1 = new TCanvas();
   c1->Divide(1,2);
   TH1D *h = new TH1D("h","h",10,0,10);
   h->Fill(5);
   c1->cd(1);
   h->Draw();
   Double_t *p = new Double_t[1];
   p[0] = 0.5;
   Double_t *q = new Double_t[1];
   q[0] = 0;
   h->GetQuantiles(1,q,p);

   cout << "Median of h:" << q[0] << std::endl;
   cout << "This is ok!" << endl;

   TH1D *h2 = new TH1D("h2","h2",10,0,10);
   //for (int i = 0; i < 10; ++i) h2->Fill(i); 
   h2->Fill(2);
   h2->Fill(8);
   c1->cd(2);
   h2->Draw();
   Double_t *p2 = new Double_t[1];
   p2[0] = 0.5;
   Double_t *q2 = new Double_t[1];
   q2[0] = 0;
   int nq = h2->GetQuantiles(1,q2,p2);

   cout << "Median of h2 (and is never the same result when you execute the macro several times).: " << q2[0] << "  nq = " << nq << std::endl;
   cout << "NO!!! should be 5.5 as well" << endl;
}

and the one from the forum:

  new TCanvas();
  TH1F *h = new TH1F("hb", "hb", 30, -100., 200.); h->Draw();
  for(int j = 11; j <= 20; j++) h->SetBinContent(j, 1.);
  const int nq = 100;
  double xq[(nq + 1)]; // positions where to compute the quantiles in [0., 1.]
  for(int j = 0; j <= nq; j++) xq[j] = double(j) / double(nq);
  double yq[(nq + 1)]; // array to contain the quantiles
  h->GetQuantiles(nq + 1, yq, xq);
  // note: "0 %" should return 0 and "100 %" should return 100
  for(int j = 0; j <= nq; j++) std::cout << j << " % : " << yq[j] << "\n";

or all-together:

#include "TCanvas.h"
#include <iostream>
#include "TH1D.h"
#include "TH1F.h"
using std::cout, std::endl;

void quantiles() {
   TCanvas *c1 = new TCanvas();
   c1->Divide(1,2);
   TH1D *h = new TH1D("h","h",10,0,10);
   h->Fill(5);
   c1->cd(1);
   h->Draw();
   Double_t *p = new Double_t[3];
   p[0] = 0.;
   p[1] = 0.5;
   p[2] = 1;
   Double_t *q = new Double_t[3];
   h->GetQuantiles(3,q,p);

   cout << "Median of h:" << q[0] << " " << q[1] << " " << q[2] << std::endl;
   cout << "This is ok!" << endl;

   TH1D *h2 = new TH1D("h2","h2",10,0,10);
   //for (int i = 0; i < 10; ++i) h2->Fill(i);
   h2->Fill(2);
   h2->Fill(8);
   c1->cd(2);
   h2->Draw();
   Double_t *q2 = new Double_t[3];
   int nq = h2->GetQuantiles(3,q2,p);

   cout << "Median of h2:" << q2[0] << " " << q2[1] << " " << q2[2] << std::endl;
   cout << "NO!!! should be 5.5 as well" << endl;

  new TCanvas();
  h = new TH1D("hb", "hb", 30, -100., 200.); h->Draw();
  for(int j = 11; j <= 20; j++) h->SetBinContent(j, 1.);
  nq = 100;
  double xq[(nq + 1)]; // positions where to compute the quantiles in [0., 1.]
  for(int j = 0; j <= nq; j++) xq[j] = double(j) / double(nq);
  double yq[(nq + 1)]; // array to contain the quantiles
  h->GetQuantiles(nq + 1, yq, xq);
  // note: "0 %" should return 0 and "100 %" should return 100
  for(int j = 0; j <= nq; j++) std::cout << j << " % : " << yq[j] << "\n";
}
pitkajuh commented 4 months ago

Dear All. I tried to recreate the issue with Python

import numpy as np
import matplotlib.pyplot as plt

bins=np.linspace(0, 10, num=11)
a=np.array([2, 8])
a1=np.percentile(a, 50) #Median
plt.xticks([*range(0, 11, 1)], [*range(0, 11, 1)])
plt.axvline(a1)
plt.tight_layout()
plt.xlim([0, 10])
print(a1)
plt.hist(a, bins=bins)
plt.show()

According to the code above, the median is 5, not 5.5.

Calculating median with Octave/Matlab also gives the same result

median([2 8])

Which one is correct (the example from JIRA or my examples)?