icecube / pisa

Monte Carlo-based data analysis
http://icecube.github.io/pisa/
Apache License 2.0
19 stars 49 forks source link

CPU vs GPU code disagrees for non-zero delta-CP #271

Closed jllanfranchi closed 6 years ago

jllanfranchi commented 7 years ago

First, i made example_gpu.cfg identical to example.cfg in all ways except order = ... osc:prob3gpu. Then I run the following command:

OUTDIR=/tmp/compare_cpu_gpu
PISA_FTYPE=float64 $PISA/pisa/scripts/compare.py \
    --ref settings/pipeline/example.cfg \
    --ref-label 'cpu' \
    --test settings/pipeline/example_gpu.cfg \
    --test-label 'gpu' \
    --outdir $OUTDIR \
    --pdf --png -v

Details change a little depending on how you pick binning, but we should be seeing fractional discrepancies near machine precision and we're nowhere close.

This is also surprising considering that test_consistency_with_pisa2.py shows worst-case around 1e-7 discrepancies.

philippeller commented 7 years ago

do both CPU and GPU show good consistency with pisa2?

steven-j-wren commented 7 years ago

I just checked out the latest cake master on our gpu machine and I get the same agreement with PISA 2 as we've always had. Since Maicon added that bit to the test script I double checked and it is testing against the same PISA 2 data in both the cpu and gpu cases. Will look in to the example configs for why Justin is seeing what he is seeing.

steven-j-wren commented 7 years ago

I thought the discrepancy was appearing after the reco stage. I print the first bin of the nue map after each stage and get this:

cpu
flux
[  2.09882704e+00   1.71987388e+00   1.40771449e+00   1.14006532e+00
   9.17159865e-01   7.35755465e-01   5.88205780e-01   4.70807694e-01
   3.72595450e-01   2.92482612e-01   2.30761958e-01   1.79843885e-01
   1.39303989e-01   1.08577465e-01   8.40708444e-02   6.45136609e-02
   4.94369368e-02   3.77739344e-02   2.90064649e-02   2.21941471e-02
   1.69143713e-02   1.28807269e-02   9.73038528e-03   7.38733665e-03
   5.62081937e-03   4.29007762e-03   3.23257646e-03   2.44792846e-03
   1.87001326e-03   1.39838914e-03   1.08051747e-03   8.05709618e-04
   6.17246172e-04   4.68337136e-04   3.55241811e-04   2.68200330e-04
   2.05085665e-04   1.58703815e-04   1.22233781e-04   9.29424196e-05]
osc
[  2.02113078e+00   1.56705903e+00   1.30031219e+00   1.08936699e+00
   8.93131838e-01   7.39454509e-01   5.76095836e-01   4.62123955e-01
   4.08689100e-01   3.87365486e-01   3.36494310e-01   2.45140215e-01
   1.48773685e-01   1.14497916e-01   1.19197842e-01   1.11508916e-01
   8.34928797e-02   5.31478251e-02   3.30334483e-02   2.24866282e-02
   1.70785533e-02   1.36944005e-02   1.08984500e-02   8.51798061e-03
   6.51383697e-03   4.91290075e-03   3.63612175e-03   2.69282871e-03
   2.01168264e-03   1.47762004e-03   1.12322621e-03   8.28817490e-04
   6.29325601e-04   4.74601672e-04   3.58447406e-04   2.69816405e-04
   2.05888872e-04   1.59100227e-04   1.22427774e-04   9.30362713e-05]
aeff
[ 295.07132235  267.54363573  340.96267891  293.25281537  301.02906018
  278.70537718  310.63202669  216.55391402  211.46466437  282.80218456
  270.23184262  196.72697752  170.16464091  128.55953272  244.62049204
  194.40293019  127.46486456  115.52709978   54.94431156   69.9619923
   42.23252043   45.53984994   31.10485101   28.1612233    33.78131601
   24.13755919   26.22744048   16.21988965   12.36853229    8.22660372
    7.70281913   10.7627264     7.05922807    5.08594937    4.52060738
    3.92549701    2.71246275    2.97949198    2.33082386    2.28410188]
reco
[ 236.38286786    0.        ]
gpu
flux
[  2.09882704e+00   1.71987388e+00   1.40771449e+00   1.14006532e+00
   9.17159865e-01   7.35755465e-01   5.88205780e-01   4.70807694e-01
   3.72595450e-01   2.92482612e-01   2.30761958e-01   1.79843885e-01
   1.39303989e-01   1.08577465e-01   8.40708444e-02   6.45136609e-02
   4.94369368e-02   3.77739344e-02   2.90064649e-02   2.21941471e-02
   1.69143713e-02   1.28807269e-02   9.73038528e-03   7.38733665e-03
   5.62081937e-03   4.29007762e-03   3.23257646e-03   2.44792846e-03
   1.87001326e-03   1.39838914e-03   1.08051747e-03   8.05709618e-04
   6.17246172e-04   4.68337136e-04   3.55241811e-04   2.68200330e-04
   2.05085665e-04   1.58703815e-04   1.22233781e-04   9.29424196e-05]
osc
[  2.02113078e+00   1.56705903e+00   1.30031219e+00   1.08936699e+00
   8.93131838e-01   7.39454509e-01   5.76095836e-01   4.62123955e-01
   4.08689100e-01   3.87365486e-01   3.36494310e-01   2.45140215e-01
   1.48773685e-01   1.14497916e-01   1.19197842e-01   1.11508916e-01
   8.34928797e-02   5.31478251e-02   3.30334483e-02   2.24866282e-02
   1.70785533e-02   1.36944005e-02   1.08984500e-02   8.51798061e-03
   6.51383697e-03   4.91290075e-03   3.63612175e-03   2.69282871e-03
   2.01168264e-03   1.47762004e-03   1.12322621e-03   8.28817490e-04
   6.29325601e-04   4.74601672e-04   3.58447406e-04   2.69816405e-04
   2.05888872e-04   1.59100227e-04   1.22427774e-04   9.30362713e-05]
aeff
[ 295.07132235  267.54363573  340.96267891  293.25281537  301.02906018
  278.70537718  310.63202669  216.55391402  211.46466437  282.80218456
  270.23184262  196.72697752  170.16464091  128.55953272  244.62049204
  194.40293019  127.46486456  115.52709978   54.94431156   69.9619923
   42.23252043   45.53984994   31.10485101   28.1612233    33.78131601
   24.13755919   26.22744048   16.21988965   12.36853229    8.22660372
    7.70281913   10.7627264     7.05922807    5.08594937    4.52060738
    3.92549701    2.71246275    2.97949198    2.33082386    2.28410188]
reco
[ 238.076431    0.      ]

Seems to be the reco, right? No. When I ask it to print 'nuebar' I get this:

cpu
flux
[  1.63187628e+00   1.33876334e+00   1.08552133e+00   8.78399246e-01
   7.11354465e-01   5.68767468e-01   4.52383966e-01   3.58928358e-01
   2.83916331e-01   2.23802438e-01   1.75535305e-01   1.37421920e-01
   1.06740658e-01   8.19301735e-02   6.34862802e-02   4.94853248e-02
   3.76351588e-02   2.88228753e-02   2.24795502e-02   1.70782744e-02
   1.28191967e-02   9.80668989e-03   7.44786275e-03   5.64408012e-03
   4.32463764e-03   3.26617592e-03   2.47613048e-03   1.91533768e-03
   1.43819275e-03   1.10168205e-03   8.31388103e-04   6.39169256e-04
   4.84038674e-04   3.70418384e-04   2.85185356e-04   2.18248685e-04
   1.67473036e-04   1.29611047e-04   9.87600502e-05   7.52090558e-05]
osc
[  1.61135279e+00   1.35038135e+00   1.09371927e+00   8.87999631e-01
   7.17137928e-01   5.69325203e-01   4.51749536e-01   3.61930180e-01
   2.87788499e-01   2.26653165e-01   1.77082537e-01   1.41143281e-01
   1.06926587e-01   8.27593712e-02   6.46383271e-02   4.96721325e-02
   3.77164589e-02   2.93055787e-02   2.30225143e-02   1.73934981e-02
   1.29113379e-02   9.81034646e-03   7.46215141e-03   5.69602292e-03
   4.40110265e-03   3.34660892e-03   2.54626231e-03   1.96887638e-03
   1.47648101e-03   1.12733296e-03   8.47750840e-04   6.49273027e-04
   4.90110978e-04   3.73997388e-04   2.87254073e-04   2.19431643e-04
   1.68140707e-04   1.29982948e-04   9.89676238e-05   7.53244496e-05]
aeff
[  82.8632171   102.7325039    53.1192151   110.53003361  157.29102795
   62.88726878   89.71584261   80.48023526   75.06941453  102.81906937
   46.0823253    77.18332159   56.33340666   28.95268751   57.67290429
   37.36741112   24.8847194    34.25608112   19.7640096    18.0982772
   26.93765964   11.30390974   11.45024928   10.38995194    8.20098482
    9.07441403    5.25158276    5.10810177    6.05266168    3.12060271
    3.52282707    1.80463971    2.65494654    2.10460551    2.13296058
    1.32428764    1.31970631    1.72253545    0.80029388    0.71434587]
reco
[ 236.38286786    0.        ]
gpu
flux
[  1.63187628e+00   1.33876334e+00   1.08552133e+00   8.78399246e-01
   7.11354465e-01   5.68767468e-01   4.52383966e-01   3.58928358e-01
   2.83916331e-01   2.23802438e-01   1.75535305e-01   1.37421920e-01
   1.06740658e-01   8.19301735e-02   6.34862802e-02   4.94853248e-02
   3.76351588e-02   2.88228753e-02   2.24795502e-02   1.70782744e-02
   1.28191967e-02   9.80668989e-03   7.44786275e-03   5.64408012e-03
   4.32463764e-03   3.26617592e-03   2.47613048e-03   1.91533768e-03
   1.43819275e-03   1.10168205e-03   8.31388103e-04   6.39169256e-04
   4.84038674e-04   3.70418384e-04   2.85185356e-04   2.18248685e-04
   1.67473036e-04   1.29611047e-04   9.87600502e-05   7.52090558e-05]
osc
[  1.76337891e+00   1.38033016e+00   1.10280508e+00   9.03736097e-01
   7.39865879e-01   5.86366492e-01   4.54028948e-01   3.65650610e-01
   2.91794257e-01   2.28163839e-01   1.78094334e-01   1.42538191e-01
   1.06950029e-01   8.30789318e-02   6.49605520e-02   4.97210654e-02
   3.77256400e-02   2.93741524e-02   2.30980603e-02   1.74361488e-02
   1.29245828e-02   9.81131294e-03   7.46204712e-03   5.69817125e-03
   4.40494662e-03   3.35079939e-03   2.54990146e-03   1.97162520e-03
   1.47838789e-03   1.12857624e-03   8.48519364e-04   6.49734210e-04
   4.90379313e-04   3.74150730e-04   2.87340116e-04   2.19479302e-04
   1.68166770e-04   1.29997035e-04   9.89752139e-05   7.53285176e-05]
aeff
[  90.68110418  105.01090931   53.56048997  112.48876432  162.27598645
   64.7696378    90.16852576   81.30752489   76.1143136   103.50437214
   46.34562595   77.94611943   56.34575673   29.06448316   57.96040628
   37.40422242   24.89077695   34.33623879   19.82886312   18.14265611
   26.96529315   11.30502336   11.45008927   10.39387065    8.20814765
    9.08577661    5.25908839    5.11523337    6.06047872    3.12404428
    3.52602067    1.80592155    2.65640012    2.10546842    2.13359948
    1.32457527    1.31991088    1.72272212    0.80035526    0.71438445]
reco
[ 238.076431    0.      ]

So it's computing the outputs for antineutrinos incorrectly. How it's passing the consistency script I have no idea...

jllanfranchi commented 7 years ago

Philipp and I narrowed this down to the oscillation stage, actually. One thing that masks the problem is that if you put all 1's as inputs for each input map to that stage (as the single-stage test does), the problem is completely masked. We manually set 1s to only e.g. nuebar and the problem shows up; same for 1s only for numubar inputs. For either nue or numu inputs, we see no issues.

philippeller commented 7 years ago

diff__test__ref this is what we get by setting the input numu-bar to ones and all others to zeros and then compare the outputs from prob3cpu (ref) and prob3gpu (test)

philippeller commented 7 years ago

I wonder if that is something that was recently introduced (for example by me when i changed the oscillation stuff) or always there and never noticed....

philippeller commented 7 years ago

AHA! I checked out an older version before i started touching the oscillation stuff (0a1046129d5eb97aa4c3b9f7aec151db0e96d707) and it's the same.

diff__girdprobfp64__prob3cpu

(Note: this was run with both codes in fp64, the picture from the previous comment was fp64 vs fp32 on GPU, so it's nicely showing up to floating point error agreement in the numu-bar map in both cases, and the same significant disagreement in the nue-bar and nutau-bar maps)

So that behavior is not caused by my changes and must be somewhere deeper, probably in the oscillation calculations themselves...

philippeller commented 7 years ago

so the question of the day is: which one is wrong, cpu or gpu, or both? We are taking bets now....

steven-j-wren commented 7 years ago

I went back and found what Tim did on this:

https://www.dropbox.com/s/qajfqbh6u4w83n3/Oscillations_Prob3_EarthModel_TCA.pdf?dl=0

Seems he only presented neutrinos and not antineutrinos...

philippeller commented 7 years ago

yes, saw that....

I thought it's also peculiar that we see only a difference in the anti neutrino appearance mode. So it now tested with a non-zero CP phase (here 30 deg) and then anti-neutrinos start to disagree across the board.

diff__gpu__cpu

I think there is something wrong. I also saw that in prob3cpu the Mixing matrix gets reset before the anti-neutrino run with an inverted phase, but not for the GPU code.

philippeller commented 7 years ago

must be some sort of bug in the complex conjugate operations....but i haven't found anything yet...

philippeller commented 7 years ago

BargerPropagator.cc L.200 delta *= -1.0 ; is definitely a differece to prob3cuda, not sure though which is correct

thehrh commented 7 years ago

Not sure whether this helps in pinning down the origin of these discrepancies, but with deltacp set to zero, no downsampling applied, I don't observe significant differences between the raw cpu and gpu oscillation probabilities in PISA2 for any oscillation channel (absolute agreement better than about 10^(-14) across the board). I can also provide the difference plots if anyone would like to see them.

The plots above show the rate differences if I understand correctly, with uniform unit fluxes as input?

thehrh commented 7 years ago

But same problem in the case of a non-zero deltacp (30 deg), for antineutrino e<->mu, e->tau, mu->tau, mu->mu: https://seafile.rlp.net/d/98cf381948/

thehrh commented 7 years ago

Injecting -deltacp in cpu mode seems to yield antineutrino probabilities agreeing with deltacp antineutrino probabilities in gpu mode. At http://www.phy.duke.edu/~raw22/public/Prob3++/, the authors say "For v1.00 (ie any file before 2012.11.08) the sign of the CP-violating phase delta must also be changed for antineutrino propagation. This is not necessary in later versions of the software." (Because the code handles it automatically as in our case, I presume) ... Let's me think the problem is with the gpu implementation.

thehrh commented 7 years ago

@philippeller The "lines" of vanishing difference you and I see also look suspiciously similar to the "magic" lines where the dependence of the oscillation probabilities on deltacp disappears (see e.g. https://arxiv.org/abs/0804.1466).

thehrh commented 7 years ago

If I'm not mistaken, the imaginary part of each PMNS matrix element, which is either proportional to sin(deltacp) or zero, needs to switch sign (deltacp -> -deltacp) when we propagate antineutrinos (this is the scheme implemented by the original prob3++ code, first neutrinos, then antineutrinos). Not sure whether there'd be a simple fix; in principle, I think there's the kNuBar variable in https://github.com/jllanfranchi/pisa/blob/cake/pisa/stages/osc/prob3gpu.py#L110 to discriminate between neutrinos and antineutrinos, but I don't understand how that's being used at the moment.

thehrh commented 7 years ago

Hi, I've tried to come up with a simple fix here: https://github.com/thehrh/pisa/tree/prob3cuda_fix (diff: https://github.com/jllanfranchi/pisa/compare/cake...thehrh:prob3cuda_fix?expand=1). The same modification in pisa2 yields agreement at the 10^(-14) level or better (in absolute magnitude) for negative and positive values of deltacp, in all nu and antinu channels (for the set of oscillation parameters used here, i.e. nufit2.1 lid injected). Delta-prob plots: https://seafile.rlp.net/d/98cf381948/

thehrh commented 7 years ago

Running jllanfranchi's command at the top, (fractional) differences with FP64 and FP32 are at the 10^(-13) level or smaller. Deltacp differs from zero in these comparisons. So at least both codes seem to be consistently doing either the right or the wrong thing now...

jllanfranchi commented 7 years ago

I also just ran comparisons in the basic cake branch but setting deltacp = 0, and even without the fix you've submitted, the CPU and GPU versions come back the same to within 1e-13. So I think we're at 99.99% certainty that the issue is completely to do with the handling of deltacp (since your fix yielded agreement for non-zero deltacp, @thehrh). And this probably explains why we haven't noticed this before, since CPU-GPU consistency of oscillations code was probably only tested for deltacp=0. I'll merge a change once we determine which consistency is the correct consistency...

steven-j-wren commented 7 years ago

Tim used to check it against NuCraft in PISA 2 since that solves the Schrodinger equation directly and is therefore "correct". Any chance you could do that on your PISA 2 branch Thomas? I know NuCraft takes forever, so maybe just for a few important point with non-zero deltacp.

thehrh commented 7 years ago

Yes, definitely, and I was about to suggest that. Apart from checking the code against the Barger formulas, we'll want an independent cross-check, and NuCraft seems like a good candidate.

thehrh commented 7 years ago

After testing a couple of deltacp values, it looks like there is a sign discrepancy between the two codes (NuCraft vs. Prob3++), meaning that in each case we only reproduce the other code's results (to within the agreement the two yield at deltacp=0, which is better than 0.5%) when switching the sign of deltacp. I hope I can find out what's going on. Plots are again at the link above.

jllanfranchi commented 7 years ago

@thehrh is the disagreement even at deltacp=0 expected to be 0.5%? And if so, is this due to the different way nucraft calculates e.g. neutrino height at production (i.e. not a fixed height in the atmosphere as Prob3++ uses)?

thehrh commented 7 years ago

The disagreement is of that order even with the same fixed production height in both codes. There are minor differences between the default NuCraft and Prob3++ 59 layer PREM models though (+NuCraft interpolates between adjacent values, while Prob3++ does not), and of course the two approaches differ by design.

philippeller commented 7 years ago

Hi, Sorry was absent for some time due to holidays and flu....yay. Anywhow, just to sumarize: