valassi opened this issue 3 months ago
Thanks to @choij1589, see these extra numbers: https://github.com/madgraph5/madgraph4gpu/issues/943#issuecomment-2268366920
I am very surprised, because Jin was quoting
But the new numbers are
My feeling here is that the errors are completely underestimated. I mean, for the various fortran numbers, 2188, 2206, 2228, these should be consistent within errors, right, @oliviermattelaer ?
Otherwise, if the errors are underestimated, I guess that the discrepancy for DY+4 jets is also completely acceptable?...
My feeling here is that the errors are completely underestimated.
This is an estimator of the (one sigma) error assuming that all channels of integration are completely uncorrelated. Due to that assumption of no correlation, these errors are typically slightly underestimated. So if you compare "2188 +- 4" and "2206 +- 4", the difference is 18 +- 8, i.e. a ~2 sigma difference. I typically do not worry about less than a 3 sigma mismatch, so I would say that this sounds compatible. (But there is no guarantee that the two fortran versions will be compatible, and it is impossible to have a bit-by-bit comparison between two fortran versions.)
For the DY+4j, the difference would be 38 +- 0.6, i.e. ~63 sigma, which is too large.
Cheers,
Olivier
PS: Reducing the "285" subprocesses is something that we have to put on our todo list; this is too large and is/will be a blocker for CMS (it should be easy to fix for SIMD, likely more problematic for GPU).
So if you compare "2188 +- 4" and "2206 +- 4 " the difference is 18+-8 so ~2 sigma difference.
Hi Olivier thanks.
I was more comparing "2188+-4" to "2236+-0.5" from an earlier slide by Jin (which I guess was produced in the same setup).
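For reference, a minimal Python sketch of that significance arithmetic (the `n_sigma` helper is just for illustration), assuming the two estimates are uncorrelated so that the quoted errors combine in quadrature:

```python
import math

def n_sigma(x1, e1, x2, e2):
    """Significance of the difference between two uncorrelated estimates."""
    return abs(x1 - x2) / math.sqrt(e1 ** 2 + e2 ** 2)

print(n_sigma(2188, 4, 2236, 0.5))  # ~12 sigma, hard to explain by statistics alone
```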
One idea I had was to try and use different random numbers and get a spread. On x10 runs with the "new" fortran (i.e. split into very many processes), DY+2jets gives me this:
Cross-section : 2.26e+04 +- 25.7 pb
Cross-section : 2.274e+04 +- 26.02 pb
Cross-section : 2.261e+04 +- 25.98 pb
Cross-section : 2.268e+04 +- 30.3 pb
Cross-section : 2.26e+04 +- 29.1 pb
Cross-section : 2.266e+04 +- 28.5 pb
Cross-section : 2.259e+04 +- 25.3 pb
Cross-section : 2.256e+04 +- 27.53 pb
Cross-section : 2.278e+04 +- 24.88 pb
Cross-section : 2.27e+04 +- 24.88 pb
Or more precisely:
more tlau/logs_ppdy012j.mad_fortran/*txt | egrep '(Current est)'
- Current estimate of cross-section: 22604.882597000003 +- 25.69693417269259
- Current estimate of cross-section: 22736.487131999995 +- 26.02223931415431
- Current estimate of cross-section: 22606.672284000004 +- 25.982101016390413
- Current estimate of cross-section: 22680.418818000002 +- 30.296789851771535
- Current estimate of cross-section: 22598.979159 +- 29.095684586947588
- Current estimate of cross-section: 22661.842675000004 +- 28.504426906822836
- Current estimate of cross-section: 22594.760607 +- 25.30150482309723
- Current estimate of cross-section: 22562.885393999994 +- 27.53350228395446
- Current estimate of cross-section: 22783.444705999995 +- 24.879796947884447
- Current estimate of cross-section: 22699.778944 +- 24.883887513199372
At first glance, the error is underestimated.
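To quantify that impression, a minimal Python sketch (the numbers are copied from the ten runs above, rounded) comparing the run-to-run scatter with the quoted per-run errors:

```python
import statistics as st

xsec = [22604.88, 22736.49, 22606.67, 22680.42, 22598.98,
        22661.84, 22594.76, 22562.89, 22783.44, 22699.78]  # pb, from the 10 runs above
errs = [25.70, 26.02, 25.98, 30.30, 29.10,
        28.50, 25.30, 27.53, 24.88, 24.88]                 # quoted per-run errors (pb)

print(f"run-to-run scatter : {st.stdev(xsec):.0f} pb")  # ~70 pb
print(f"mean quoted error  : {st.mean(errs):.0f} pb")   # ~27 pb
# If the quoted errors were correct, the scatter should be of the same order as ~27 pb;
# a ~70 pb scatter suggests the per-run error is underestimated by a factor ~2-3.
```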
But I also do not understand why I get a cross section of about 23000 while Jin gets about 2300 (a factor 10 lower)?
This is the following process; I thought this was consistent?
[avalassi@itgold91 gcc11/usr] /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp> more pp_dy012j.mad/mg5.in
set stdout_level DEBUG
set zerowidth_tchannel F
import model sm-no_b_mass
define p = u d c s b u~ d~ c~ s~ b~ g
define j = p
define ell+ = e+ mu+ ta+
define ell- = e- mu- ta-
define nu = ve vm vt
define nubar = ve~ vm~ vt~
generate p p > ell+ ell- @0
add process p p > ell+ ell- j @1
add process p p > ell+ ell- j j @2
output madevent_simd pp_dy012j.mad --hel_recycling=False --vector_size=32
PS: Just so that I do not forget where this was, it is from WIP PR #946
[avalassi@itscrd90 bash] /data/avalassi/GPU2023/ghav-madgraph4gpu/epochX/cudacpp> git reset --hard f1a9800900c9b9d85f62d03196eed15863d7891d
HEAD is now at f1a980090 [cmsdy] in tlau add the results of x10 ppttdy012j fortran tests (manually fix the directory name)
[avalassi@itscrd90 bash] /data/avalassi/GPU2023/ghav-madgraph4gpu/epochX/cudacpp> more tlau/logs_ppdy012j_fortran/*txt | egrep '(Current est)'
- Current estimate of cross-section: 22604.882597000003 +- 25.69693417269259
- Current estimate of cross-section: 22736.487131999995 +- 26.02223931415431
- Current estimate of cross-section: 22606.672284000004 +- 25.982101016390413
- Current estimate of cross-section: 22680.418818000002 +- 30.296789851771535
- Current estimate of cross-section: 22598.979159 +- 29.095684586947588
- Current estimate of cross-section: 22661.842675000004 +- 28.504426906822836
- Current estimate of cross-section: 22594.760607 +- 25.30150482309723
- Current estimate of cross-section: 22562.885393999994 +- 27.53350228395446
- Current estimate of cross-section: 22783.444705999995 +- 24.879796947884447
- Current estimate of cross-section: 22699.778944 +- 24.883887513199372
Hi @valassi, I think the factor-10 xsec difference comes from the `add process` lines. For my production there is no DY+0j or DY+1j included, so it's like
generate p p > ell+ ell- j j @0
output madevent_gpu DY2j...
Thanks Jin! That explains it, I will try 2j only.
PS: Could it be that you also have cuts? If I just add up 0j, 1j, 2j from your slide I get about 6000+4000+3000, i.e. 13000, while I see 23000.
Hi, here are some fresh tests on DY+3j with mg5amcnlo@3.5.5, aka the current upstream Fortran, using the NNPDF23_lo_as_0119_qed PDF set. I varied the parameters: `sde_strategy` = [1, 2]; `fixed_ren_scale`, `fixed_fac_scale` = [False, True]. The reason I also fixed the scales is that I had recorded that different values of `sde_strategy` may result in different cross-section values. This was a tip from @oliviermattelaer.
**`fixed_ren_scale`, `fixed_fac_scale` = False**

| sde_strategy = 1 | sde_strategy = 2 |
|---|---|
| 1380 +- 3.2 | 1506 +- 4.1 |
| 1385 +- 3.3 | 1512 +- 4.5 |
| 1391 +- 3 | 1519 +- 3.8 |
| 1394 +- 3.4 | 1511 +- 3.7 |
| 1388 +- 3.1 | 1512 +- 3.8 |
| **Average: 1387.6 +- 1.4** | **Average: 1512.0 +- 1.8** |
**`fixed_ren_scale`, `fixed_fac_scale` = True**

| sde_strategy = 1 | sde_strategy = 2 |
|---|---|
| 1477 +- 3.5 | 1542 +- 4.1 |
| 1475 +- 3.5 | 1545 +- 4.5 |
| 1474 +- 3.5 | 1544 +- 4.2 |
| 1475 +- 3.6 | 1545 +- 3.7 |
| 1471 +- 3.4 | 1547 +- 4.3 |
| **Average: 1474.4 +- 1.6** | **Average: 1544 +- 1.9** |
Jin's results (from the slides of 30/07) are:
It seems to me that `sde_strategy = 1` with non-fixed scales reproduces the CUDA results well. However, I'm a bit worried about the difference between the results for the two `sde_strategy` values.
@oliviermattelaer what do you think?
Thanks a lot Daniele, very clear report.
I agree with you that this indicates that DY+jets has some issue with the phase-space integration (likely for sde_strategy=2). Maybe one thing that you can do here (if you have the time) is to force the phase-space point to be (always) the same for all channels of integration (you can do that when the smatrix function is called), then print the multi-channel factor for each channel and check that the sum of those factors is indeed one for each strategy.
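Not the actual MadEvent code, just a hypothetical Python sketch of what that sum-to-one check means, with placeholder per-channel weights standing in for the values one would print at the fixed phase-space point:

```python
# Placeholder per-channel weights |M_i|^2 evaluated at ONE fixed phase-space point
# (hypothetical numbers, not real matrix elements)
amp2 = [0.7, 2.3, 0.1, 1.9]

# Multi-channel factor of each channel: its weight divided by the sum over all channels
factors = [a / sum(amp2) for a in amp2]

print(factors)
print(sum(factors))  # should be 1.0 (up to rounding) if the channel decomposition is consistent
```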
Cheers,
Olivier
This is another follow-up to the meeting with CMS last week and to the meeting with CMS yesterday, https://indico.cern.ch/event/1373473/
@choij1589 presented results where the cross section for Drell-Yan plus 4 jets is different in fortran and in cuda/cpp.
We should understand why CMS sees this cross section discrepancy for DY+4 jets.
NB: IIUC, the fortran version here is the original fortran (no vector_size), not the cudacpp version (with vector_size).