Open stwunsch opened 3 years ago
According to https://en.cppreference.com/w/cpp/numeric/math/nextafter, TH1F stops working well at bin contents around 1e7 (with integer weights). Should we add this as the maximum value for TH1F, as is done for TH1C with 127, for example? With non-integer weights this becomes harder to check, since it depends strongly on the chosen weight. But usually w = 1.
Precision loss demo for float:

```
nextafter(1e+01, INF) gives 10.000001;        Δ = 0.000001
nextafter(1e+02, INF) gives 100.000008;       Δ = 0.000008
nextafter(1e+03, INF) gives 1000.000061;      Δ = 0.000061
nextafter(1e+04, INF) gives 10000.000977;     Δ = 0.000977
nextafter(1e+05, INF) gives 100000.007812;    Δ = 0.007812
nextafter(1e+06, INF) gives 1000000.062500;   Δ = 0.062500
nextafter(1e+07, INF) gives 10000001.000000;  Δ = 1.000000
nextafter(1e+08, INF) gives 100000008.000000; Δ = 8.000000
```
I proposed a pull request.
Yes, but why use TH1F at all? Everybody should always use TH1D, unless there are memory constraints. I have seen problems like this too many times already.
If we want to encourage that change, I think we should start by removing TH1F from all the doxygen examples in ROOT, which I believe is why many people still use TH1F.
If you run a grep, there are almost 2000 results, most of them in the tutorials and test folders, and others in roofit and tmva.
> why using TH1F ? Everybody should always use TH1D

`TTree::Draw`.
A summary of the discussion at the linked PR:

- `TTree::Draw` creates a TH1F by default, rendering its deprecation very complicated
- We cannot implement a precision loss check in TH*F classes as they are implemented currently, as it would effectively be a no-op ...
Why not?
You can always check that (value in the bin after fill) - (value in the bin before fill) is reasonably close to the value that was added, and print a warning message otherwise.
> ... and a waste of CPU cycles

Ah, yes, it would definitely be slower!
> You can always check that (value in the bin after fill) - (value in the bin before fill) is reasonably close to the value that was added, and print a warning message otherwise.
Not really. Your suggestion would work well if you only had `AddBinContentByOne`. But if you have `AddBinContentByWeight`, then what counts as "close" becomes non-trivial: closeness is a function of the weight, so your limit would depend on the weight. There is no way to ensure that a user always calls `AddBinContentByWeight` with the same weight, or that they use the same weight for each bin of the histogram.
This would result in different "overflow limits" for every bin in the histogram. So it's an ill-posed problem.
I attempted to do this with `std::nextafter - current_value`, comparing it against the weight, but as said, this is completely problematic if you have changing weights.
To me, the only solution is using TH1L, where the overflow limit is well defined, and forgetting about floating-point precision.
Sorry, but I strongly disagree.
TH1F implements `Fill(x, w)` via `AddBinContent(bin, w)`:

```cpp
void AddBinContent(Int_t bin, Double_t w) override
{
   fArray[bin] += Float_t(w);
}
```
If one wants to be warned about overflows, it could be changed to:

```cpp
void AddBinContent(Int_t bin, Double_t w) override
{
   float old = fArray[bin];
   fArray[bin] += Float_t(w);
   float inc = fArray[bin] - old;
   if (inc != (float)w) { // could be done with a non-exact comparison with some tolerance
      std::cerr << "Warning: TH1F::Fill(...) failed to increment the bin due to limited floating point precision\n";
   }
}
```
> // could be done with a non-exact comparison with some tolerance
Yeah, that's what I meant. Please define a tolerance that scales across orders of magnitude and weights, and that also takes clamping and overflows into account...
Sorry, I assumed that would be your job?
Not my job, I am a volunteer.
Re-opening the issue following further discussion. The linked PR is still valid as it documents the current state of the implementation, so that doesn't need to be changed. An investigation into finding a tolerance that can account for different (orders of magnitude of) weights is the next step for this issue. Since it was not foreseen in the PoW for 2024, we cannot give an ETA at this moment.
As alternative ideas:
From my point of view, I will just go towards TH1D or TH1L and away from `TTree::Draw`.
Here is an implementation that may be naive, but I would argue it catches the vast majority of the use cases:

```cpp
#include <cmath>
#include <limits>

constexpr bool compare(float expected, float actual) {
   // most simple and most common case
   if (actual == expected)
      return true;
   // comparison with an arbitrarily small tolerance
   constexpr float epsilon = std::numeric_limits<float>::epsilon();
   const float delta = std::fabs(expected) * epsilon;
   if ((actual > expected - delta) and (actual < expected + delta))
      return true;
   return false;
}
```
If any of the arguments (the weight or the actual increment) is NaN or infinite, the function should return false, which kind of makes sense in the above context.
With @lmoneta we were discussing this kind of case in the PR: a histogram with an initial SetBinContent of 1e8, to which you add an event with weight 8.01. This leads to an error of `(1e8f + 8.01f) - 1e8f - 8.01f = -0.01f`, which compared to the bin content of 1e8 is a negligible difference.
But `compare(8.01f, 8.00f)` would report that the increment is not the same.
So we were thinking of somehow defining a relative tolerance. We used `std::nextafterf` and compared the relative distance with respect to the original value, divided by w. But weird things may happen here: you might call Fill with a negative weight, and the result might come close to zero for some bins, so a relative normalization is also ugly. We would need a compromise between an absolute and a relative normalization for the tolerance, or a lot of CPU-wasting checks. Or we just focus on the main cases with positive weights.
Relative with respect to the bin value (before the increment), or with respect to the increment, or with respect to the "correct" bin value after the increment ?
See the comparison in the screenshot below (plots not reproduced here). The upper plot was done with `TTree.Draw`; the lower plot was done with `RDataFrame.Histo1D`. I've used ROOT 6.22/02 and you can download the file here: http://opendata.web.cern.ch/record/12353