Closed jllanfranchi closed 6 years ago
do both CPU and GPU show good consistency with pisa2?
I just checked out the latest cake master on our gpu machine and I get the same agreement with PISA 2 as we've always had. Since Maicon added that bit to the test script I double checked and it is testing against the same PISA 2 data in both the cpu and gpu cases. Will look in to the example configs for why Justin is seeing what he is seeing.
I thought the discrepancy was appearing after the reco stage. I print the first bin of the nue
map after each stage and get this:
cpu
flux
[ 2.09882704e+00 1.71987388e+00 1.40771449e+00 1.14006532e+00
9.17159865e-01 7.35755465e-01 5.88205780e-01 4.70807694e-01
3.72595450e-01 2.92482612e-01 2.30761958e-01 1.79843885e-01
1.39303989e-01 1.08577465e-01 8.40708444e-02 6.45136609e-02
4.94369368e-02 3.77739344e-02 2.90064649e-02 2.21941471e-02
1.69143713e-02 1.28807269e-02 9.73038528e-03 7.38733665e-03
5.62081937e-03 4.29007762e-03 3.23257646e-03 2.44792846e-03
1.87001326e-03 1.39838914e-03 1.08051747e-03 8.05709618e-04
6.17246172e-04 4.68337136e-04 3.55241811e-04 2.68200330e-04
2.05085665e-04 1.58703815e-04 1.22233781e-04 9.29424196e-05]
osc
[ 2.02113078e+00 1.56705903e+00 1.30031219e+00 1.08936699e+00
8.93131838e-01 7.39454509e-01 5.76095836e-01 4.62123955e-01
4.08689100e-01 3.87365486e-01 3.36494310e-01 2.45140215e-01
1.48773685e-01 1.14497916e-01 1.19197842e-01 1.11508916e-01
8.34928797e-02 5.31478251e-02 3.30334483e-02 2.24866282e-02
1.70785533e-02 1.36944005e-02 1.08984500e-02 8.51798061e-03
6.51383697e-03 4.91290075e-03 3.63612175e-03 2.69282871e-03
2.01168264e-03 1.47762004e-03 1.12322621e-03 8.28817490e-04
6.29325601e-04 4.74601672e-04 3.58447406e-04 2.69816405e-04
2.05888872e-04 1.59100227e-04 1.22427774e-04 9.30362713e-05]
aeff
[ 295.07132235 267.54363573 340.96267891 293.25281537 301.02906018
278.70537718 310.63202669 216.55391402 211.46466437 282.80218456
270.23184262 196.72697752 170.16464091 128.55953272 244.62049204
194.40293019 127.46486456 115.52709978 54.94431156 69.9619923
42.23252043 45.53984994 31.10485101 28.1612233 33.78131601
24.13755919 26.22744048 16.21988965 12.36853229 8.22660372
7.70281913 10.7627264 7.05922807 5.08594937 4.52060738
3.92549701 2.71246275 2.97949198 2.33082386 2.28410188]
reco
[ 236.38286786 0. ]
gpu
flux
[ 2.09882704e+00 1.71987388e+00 1.40771449e+00 1.14006532e+00
9.17159865e-01 7.35755465e-01 5.88205780e-01 4.70807694e-01
3.72595450e-01 2.92482612e-01 2.30761958e-01 1.79843885e-01
1.39303989e-01 1.08577465e-01 8.40708444e-02 6.45136609e-02
4.94369368e-02 3.77739344e-02 2.90064649e-02 2.21941471e-02
1.69143713e-02 1.28807269e-02 9.73038528e-03 7.38733665e-03
5.62081937e-03 4.29007762e-03 3.23257646e-03 2.44792846e-03
1.87001326e-03 1.39838914e-03 1.08051747e-03 8.05709618e-04
6.17246172e-04 4.68337136e-04 3.55241811e-04 2.68200330e-04
2.05085665e-04 1.58703815e-04 1.22233781e-04 9.29424196e-05]
osc
[ 2.02113078e+00 1.56705903e+00 1.30031219e+00 1.08936699e+00
8.93131838e-01 7.39454509e-01 5.76095836e-01 4.62123955e-01
4.08689100e-01 3.87365486e-01 3.36494310e-01 2.45140215e-01
1.48773685e-01 1.14497916e-01 1.19197842e-01 1.11508916e-01
8.34928797e-02 5.31478251e-02 3.30334483e-02 2.24866282e-02
1.70785533e-02 1.36944005e-02 1.08984500e-02 8.51798061e-03
6.51383697e-03 4.91290075e-03 3.63612175e-03 2.69282871e-03
2.01168264e-03 1.47762004e-03 1.12322621e-03 8.28817490e-04
6.29325601e-04 4.74601672e-04 3.58447406e-04 2.69816405e-04
2.05888872e-04 1.59100227e-04 1.22427774e-04 9.30362713e-05]
aeff
[ 295.07132235 267.54363573 340.96267891 293.25281537 301.02906018
278.70537718 310.63202669 216.55391402 211.46466437 282.80218456
270.23184262 196.72697752 170.16464091 128.55953272 244.62049204
194.40293019 127.46486456 115.52709978 54.94431156 69.9619923
42.23252043 45.53984994 31.10485101 28.1612233 33.78131601
24.13755919 26.22744048 16.21988965 12.36853229 8.22660372
7.70281913 10.7627264 7.05922807 5.08594937 4.52060738
3.92549701 2.71246275 2.97949198 2.33082386 2.28410188]
reco
[ 238.076431 0. ]
Seems to be the reco, right? No. When I ask it to print 'nuebar' I get this:
cpu
flux
[ 1.63187628e+00 1.33876334e+00 1.08552133e+00 8.78399246e-01
7.11354465e-01 5.68767468e-01 4.52383966e-01 3.58928358e-01
2.83916331e-01 2.23802438e-01 1.75535305e-01 1.37421920e-01
1.06740658e-01 8.19301735e-02 6.34862802e-02 4.94853248e-02
3.76351588e-02 2.88228753e-02 2.24795502e-02 1.70782744e-02
1.28191967e-02 9.80668989e-03 7.44786275e-03 5.64408012e-03
4.32463764e-03 3.26617592e-03 2.47613048e-03 1.91533768e-03
1.43819275e-03 1.10168205e-03 8.31388103e-04 6.39169256e-04
4.84038674e-04 3.70418384e-04 2.85185356e-04 2.18248685e-04
1.67473036e-04 1.29611047e-04 9.87600502e-05 7.52090558e-05]
osc
[ 1.61135279e+00 1.35038135e+00 1.09371927e+00 8.87999631e-01
7.17137928e-01 5.69325203e-01 4.51749536e-01 3.61930180e-01
2.87788499e-01 2.26653165e-01 1.77082537e-01 1.41143281e-01
1.06926587e-01 8.27593712e-02 6.46383271e-02 4.96721325e-02
3.77164589e-02 2.93055787e-02 2.30225143e-02 1.73934981e-02
1.29113379e-02 9.81034646e-03 7.46215141e-03 5.69602292e-03
4.40110265e-03 3.34660892e-03 2.54626231e-03 1.96887638e-03
1.47648101e-03 1.12733296e-03 8.47750840e-04 6.49273027e-04
4.90110978e-04 3.73997388e-04 2.87254073e-04 2.19431643e-04
1.68140707e-04 1.29982948e-04 9.89676238e-05 7.53244496e-05]
aeff
[ 82.8632171 102.7325039 53.1192151 110.53003361 157.29102795
62.88726878 89.71584261 80.48023526 75.06941453 102.81906937
46.0823253 77.18332159 56.33340666 28.95268751 57.67290429
37.36741112 24.8847194 34.25608112 19.7640096 18.0982772
26.93765964 11.30390974 11.45024928 10.38995194 8.20098482
9.07441403 5.25158276 5.10810177 6.05266168 3.12060271
3.52282707 1.80463971 2.65494654 2.10460551 2.13296058
1.32428764 1.31970631 1.72253545 0.80029388 0.71434587]
reco
[ 236.38286786 0. ]
gpu
flux
[ 1.63187628e+00 1.33876334e+00 1.08552133e+00 8.78399246e-01
7.11354465e-01 5.68767468e-01 4.52383966e-01 3.58928358e-01
2.83916331e-01 2.23802438e-01 1.75535305e-01 1.37421920e-01
1.06740658e-01 8.19301735e-02 6.34862802e-02 4.94853248e-02
3.76351588e-02 2.88228753e-02 2.24795502e-02 1.70782744e-02
1.28191967e-02 9.80668989e-03 7.44786275e-03 5.64408012e-03
4.32463764e-03 3.26617592e-03 2.47613048e-03 1.91533768e-03
1.43819275e-03 1.10168205e-03 8.31388103e-04 6.39169256e-04
4.84038674e-04 3.70418384e-04 2.85185356e-04 2.18248685e-04
1.67473036e-04 1.29611047e-04 9.87600502e-05 7.52090558e-05]
osc
[ 1.76337891e+00 1.38033016e+00 1.10280508e+00 9.03736097e-01
7.39865879e-01 5.86366492e-01 4.54028948e-01 3.65650610e-01
2.91794257e-01 2.28163839e-01 1.78094334e-01 1.42538191e-01
1.06950029e-01 8.30789318e-02 6.49605520e-02 4.97210654e-02
3.77256400e-02 2.93741524e-02 2.30980603e-02 1.74361488e-02
1.29245828e-02 9.81131294e-03 7.46204712e-03 5.69817125e-03
4.40494662e-03 3.35079939e-03 2.54990146e-03 1.97162520e-03
1.47838789e-03 1.12857624e-03 8.48519364e-04 6.49734210e-04
4.90379313e-04 3.74150730e-04 2.87340116e-04 2.19479302e-04
1.68166770e-04 1.29997035e-04 9.89752139e-05 7.53285176e-05]
aeff
[ 90.68110418 105.01090931 53.56048997 112.48876432 162.27598645
64.7696378 90.16852576 81.30752489 76.1143136 103.50437214
46.34562595 77.94611943 56.34575673 29.06448316 57.96040628
37.40422242 24.89077695 34.33623879 19.82886312 18.14265611
26.96529315 11.30502336 11.45008927 10.39387065 8.20814765
9.08577661 5.25908839 5.11523337 6.06047872 3.12404428
3.52602067 1.80592155 2.65640012 2.10546842 2.13359948
1.32457527 1.31991088 1.72272212 0.80035526 0.71438445]
reco
[ 238.076431 0. ]
So it's computing the outputs for antineutrinos incorrectly. How it's passing the consistency script I have no idea...
Philipp and I narrowed this down to the oscillation stage, actually. One thing that masks the problem is that if you put all 1's as inputs for each input map to that stage (as the single-stage test does), the problem is completely masked. We manually set 1s to only e.g. nuebar and the problem shows up; same for 1s only for numubar inputs. For either nue or numu inputs, we see no issues.
this is what we get by setting the input numu-bar to ones and all others to zeros and then compare the outputs from prob3cpu (ref) and prob3gpu (test)
I wonder if that is something that was recently introduced (for example by me when i changed the oscillation stuff) or always there and never noticed....
AHA! I checked out an older version before i started touching the oscillation stuff (0a1046129d5eb97aa4c3b9f7aec151db0e96d707) and it's the same.
(Note: this was run with both codes in fp64, the picture from the previous comment was fp64 vs fp32 on GPU, so it's nicely showing up to floating point error agreement in the numu-bar map in both cases, and the same significant disagreement in the nue-bar and nutau-bar maps)
So that behavior is not caused by my changes and must be somewhere deeper, probably in the oscillation calculations themselves...
so the question of the day is: which one is wrong, cpu or gpu, or both? We are taking bets now....
I went back and found what Tim did on this:
https://www.dropbox.com/s/qajfqbh6u4w83n3/Oscillations_Prob3_EarthModel_TCA.pdf?dl=0
Seems he only presented neutrinos and not antineutrinos...
yes, saw that....
I thought it's also peculiar that we see only a difference in the anti neutrino appearance mode. So it now tested with a non-zero CP phase (here 30 deg) and then anti-neutrinos start to disagree across the board.
I think there is something wrong. I also saw that in prob3cpu the Mixing matrix gets reset before the anti-neutrino run with an inverted phase, but not for the GPU code.
must be some sort of bug in the complex conjugate operations....but i haven't found anything yet...
BargerPropagator.cc L.200 delta *= -1.0 ;
is definitely a differece to prob3cuda, not sure though which is correct
Not sure whether this helps in pinning down the origin of these discrepancies, but with deltacp set to zero, no downsampling applied, I don't observe significant differences between the raw cpu and gpu oscillation probabilities in PISA2 for any oscillation channel (absolute agreement better than about 10^(-14) across the board). I can also provide the difference plots if anyone would like to see them.
The plots above show the rate differences if I understand correctly, with uniform unit fluxes as input?
But same problem in the case of a non-zero deltacp (30 deg), for antineutrino e<->mu, e->tau, mu->tau, mu->mu: https://seafile.rlp.net/d/98cf381948/
Injecting -deltacp in cpu mode seems to yield antineutrino probabilities agreeing with deltacp antineutrino probabilities in gpu mode. At http://www.phy.duke.edu/~raw22/public/Prob3++/, the authors say "For v1.00 (ie any file before 2012.11.08) the sign of the CP-violating phase delta must also be changed for antineutrino propagation. This is not necessary in later versions of the software." (Because the code handles it automatically as in our case, I presume) ... Let's me think the problem is with the gpu implementation.
@philippeller The "lines" of vanishing difference you and I see also look suspiciously similar to the "magic" lines where the dependence of the oscillation probabilities on deltacp disappears (see e.g. https://arxiv.org/abs/0804.1466).
If I'm not mistaken, the imaginary part of each PMNS matrix element, which is either proportional to sin(deltacp) or zero, needs to switch sign (deltacp -> -deltacp) when we propagate antineutrinos (this is the scheme implemented by the original prob3++ code, first neutrinos, then antineutrinos). Not sure whether there'd be a simple fix; in principle, I think there's the kNuBar
variable in https://github.com/jllanfranchi/pisa/blob/cake/pisa/stages/osc/prob3gpu.py#L110 to discriminate between neutrinos and antineutrinos, but I don't understand how that's being used at the moment.
Hi, I've tried to come up with a simple fix here: https://github.com/thehrh/pisa/tree/prob3cuda_fix (diff: https://github.com/jllanfranchi/pisa/compare/cake...thehrh:prob3cuda_fix?expand=1). The same modification in pisa2 yields agreement at the 10^(-14) level or better (in absolute magnitude) for negative and positive values of deltacp, in all nu and antinu channels (for the set of oscillation parameters used here, i.e. nufit2.1 lid injected). Delta-prob plots: https://seafile.rlp.net/d/98cf381948/
Running jllanfranchi's command at the top, (fractional) differences with FP64 and FP32 are at the 10^(-13) level or smaller. Deltacp differs from zero in these comparisons. So at least both codes seem to be consistently doing either the right or the wrong thing now...
I also just ran comparisons in the basic cake
branch but setting deltacp = 0, and even without the fix you've submitted, the CPU and GPU versions come back the same to within 1e-13. So I think we're at 99.99% certainty that the issue is completely to do with the handling of deltacp (since your fix yielded agreement for non-zero deltacp, @thehrh). And this probably explains why we haven't noticed this before, since CPU-GPU consistency of oscillations code was probably only tested for deltacp=0. I'll merge a change once we determine which consistency is the correct consistency...
Tim used to check it against NuCraft in PISA 2 since that solves the Schrodinger equation directly and is therefore "correct". Any chance you could do that on your PISA 2 branch Thomas? I know NuCraft takes forever, so maybe just for a few important point with non-zero deltacp.
Yes, definitely, and I was about to suggest that. Apart from checking the code against the Barger formulas, we'll want an independent cross-check, and NuCraft seems like a good candidate.
After testing a couple of deltacp values, it looks like there is a sign discrepancy between the two codes (NuCraft vs. Prob3++), meaning that in each case we only reproduce the other code's results (to within the agreement the two yield at deltacp=0, which is better than 0.5%) when switching the sign of deltacp. I hope I can find out what's going on. Plots are again at the link above.
@thehrh is the disagreement even at deltacp=0 expected to be 0.5%? And if so, is this due to the different way nucraft calculates e.g. neutrino height at production (i.e. not a fixed height in the atmosphere as Prob3++ uses)?
The disagreement is of that order even with the same fixed production height in both codes. There are minor differences between the default NuCraft and Prob3++ 59 layer PREM models though (+NuCraft interpolates between adjacent values, while Prob3++ does not), and of course the two approaches differ by design.
Hi, Sorry was absent for some time due to holidays and flu....yay. Anywhow, just to sumarize:
First, i made
example_gpu.cfg
identical toexample.cfg
in all ways exceptorder = ... osc:prob3gpu
. Then I run the following command:Details change a little depending on how you pick binning, but we should be seeing fractional discrepancies near machine precision and we're nowhere close.
This is also surprising considering that
test_consistency_with_pisa2.py
shows worst-case around 1e-7 discrepancies.