openmc-dev / openmc

OpenMC Monte Carlo Code
https://docs.openmc.org

Incorrect Simulation Results with Specific ICPC Compiler Versions #2683

Open cfichtlscherer opened 10 months ago

cfichtlscherer commented 10 months ago

Bug Description

Sophie, Lukas, and I discovered this bug on our university cluster.

Compiling OpenMC with certain ICPC compiler versions lets you install and run the code without any warnings or error messages, but the simulations return completely wrong results.

For example, the pincell example should give a criticality (k-effective) of around 1.15. When OpenMC is compiled with one of the affected ICPC versions, the criticality during the batches is around 3.5, and the final results are:

 ============================>  RESULTS     <============================

 k-effective (Collision)    = 0.00000 +/- 0.00000
 k-effective (Track-length)  = 0.00000 +/- 0.00000
 k-effective (Absorption)   = 0.00000 +/- 0.00000
 Combined k-effective       = 0.00000 +/- 0.02774
 Leakage Fraction           = 0.00000 +/- 0.00000

The entire output is available in a Google Doc.

We checked all ICPC compiler versions available on our cluster, with the following results:

- intel-compilers/2021.2.0 (C): correct results
- intel-compilers/2021.4.0 (C): wrong results
- intel-compilers/Intel 2021.6.0: wrong results
- intel-compilers/2022.1.0 (C,D): wrong results
- intel-compilers/2022.2.1 (C): wrong results
- intel-compilers/2023.0.0 (L,C): wrong results
- intel-compilers/2023.1.0 (C): correct results

We were able to reproduce this bug with OpenMC 0.13.3 and with the development branch. We have not checked other OpenMC versions.

We were not able to reproduce this bug with other compilers.

Slack discussion

Steps to Reproduce

Install OpenMC from source via

git clone https://github.com/openmc-dev/openmc.git
cd openmc
mkdir build
cd build
module load intel-compilers/2023.0.0
cmake ..
make -j8

or set the compiler explicitly with

CXX=$path_to_ICPC_compiler cmake ..

Run the pincell example.

Environment

So far we have only reproduced this bug on our cluster, and we would be happy if someone else could try to reproduce it.

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  1
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        4
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
Stepping:            4
CPU MHz:             2100.000
CPU max MHz:         3700,0000
CPU min MHz:         1000,0000
BogoMIPS:            4200.00
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            33792K
NUMA node0 CPU(s):   0-2,6-8,12-14,18-20
NUMA node1 CPU(s):   3-5,9-11,15-17,21-23
NUMA node2 CPU(s):   24-26,30-32,36-38,42-44
NUMA node3 CPU(s):   27-29,33-35,39-41,45-47
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities
cfichtlscherer commented 10 months ago

Update:

There actually were some warnings during the make process:


In file included from /home/gf737457/openmc-13.3/openmc/src/finalize.cpp(18):
 /home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(102): warning #858: type qualifier on return type is meaningless
 const int id() const { return id_; }
 ^
In file included from /home/gf737457/openmc-13.3/openmc/src/finalize.cpp(18):
 /home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(103): warning #858: type qualifier on return type is meaningless
 const int level() const { return level_; }
 ^
In file included from /home/gf737457/openmc-13.3/openmc/src/initialize.cpp(26):
 /home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(102): warning #858: type qualifier on return type is meaningless
 const int id() const { return id_; }
 ^
In file included from /home/gf737457/openmc-13.3/openmc/src/initialize.cpp(26):
 /home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(103): warning #858: type qualifier on return type is meaningless
 const int level() const { return level_; }
 ^
In file included from /home/gf737457/openmc-13.3/openmc/src/output.cpp(33):
 /home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(102): warning #858: type qualifier on return type is meaningless
 const int id() const { return id_; }
 ^
In file included from /home/gf737457/openmc-13.3/openmc/src/output.cpp(33):
 /home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(103): warning #858: type qualifier on return type is meaningless
 const int level() const { return level_; }
 ^
In file included from /home/gf737457/openmc-13.3/openmc/src/plot.cpp(1):
 /home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(102): warning #858: type qualifier on return type is meaningless
 const int id() const { return id_; }
 ^
In file included from /home/gf737457/openmc-13.3/openmc/src/plot.cpp(1):
 /home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(103): warning #858: type qualifier on return type is meaningless
 const int level() const { return level_; }
 ^
In file included from /home/gf737457/openmc-13.3/openmc/src/settings.cpp(25):
 /home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(102): warning #858: type qualifier on return type is meaningless
 const int id() const { return id_; }
 ^
In file included from /home/gf737457/openmc-13.3/openmc/src/settings.cpp(25):
 /home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(103): warning #858: type qualifier on return type is meaningless
 const int level() const { return level_; }
 ^
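Warning #858 is most likely harmless here: a top-level `const` on a function that returns a scalar such as `int` by value has no effect, so it cannot change simulation results. A minimal sketch of silencing it, using a hypothetical stand-in class `PlotLike` rather than OpenMC's actual declarations in `plot.h`, is to drop the qualifier from the return type:

```cpp
#include <cassert>

// Hypothetical stand-in for the two accessors flagged in plot.h (lines 102
// and 103); this is not OpenMC's actual class.
class PlotLike {
public:
  PlotLike(int id, int level) : id_(id), level_(level) {}

  // Before: `const int id() const { return id_; }` triggers warning #858,
  // because a top-level const on a scalar returned by value is ignored.
  // After: return a plain int. Callers observe exactly the same behavior.
  int id() const { return id_; }
  int level() const { return level_; }

private:
  int id_;
  int level_;
};

// Small self-check: the caller always receives its own copy of the int,
// so the removed qualifier never protected anything.
inline void check_plot_accessors() {
  PlotLike p(7, 2);
  int id = p.id();
  id += 1;  // modifies only the local copy
  assert(id == 8 && p.id() == 7 && p.level() == 2);
}
```

Since callers always get a copy, removing the `const` changes no behavior; it only quiets the warning.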
cfichtlscherer commented 10 months ago

We have also tried building with

CMAKE_BUILD_TYPE=Debug

which compiles the code without optimizations, and the error persists.

We conclude that the problem is not caused by compiler optimizations.

cfichtlscherer commented 10 months ago

It seems the criticality is about three times as high, and checking the statepoint shows that every source point exists three times:

[Screenshot pasted Sep 12, 2023]
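The two observations fit together. If each source site is banked three times, an estimate of k based on the number of banked fission sites per source particle would plausibly be inflated by the same factor: 3 × 1.15 ≈ 3.45, close to the ≈ 3.5 seen during the batches. A quick way to quantify the duplication is to count how often each distinct site occurs in a bank. The sketch below assumes a hypothetical, simplified `Site` that holds only position and energy (OpenMC's real source site type carries more fields), and it treats two sites as duplicates only if all of those fields compare equal.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical, simplified source site: position (cm) and energy (eV).
struct Site {
  std::array<double, 3> r;
  double E;
};

// Strict ordering over all fields, so identical sites map to the same key.
struct SiteLess {
  bool operator()(const Site& a, const Site& b) const {
    return std::tie(a.r[0], a.r[1], a.r[2], a.E) <
           std::tie(b.r[0], b.r[1], b.r[2], b.E);
  }
};

// Average number of times each distinct site appears in the bank:
// 1.0 for a healthy bank, 3.0 if every site was written three times.
double mean_multiplicity(const std::vector<Site>& bank) {
  if (bank.empty()) return 0.0;
  std::map<Site, std::size_t, SiteLess> counts;
  for (const Site& s : bank) ++counts[s];
  return static_cast<double>(bank.size()) /
         static_cast<double>(counts.size());
}

// Usage sketch: a bank in which each of 4 sites was stored three times.
inline void demo_tripled_bank() {
  std::vector<Site> bank;
  for (int i = 0; i < 4; ++i) {
    const Site s{{0.1 * i, 0.0, 0.0}, 2.0e6};
    for (int copy = 0; copy < 3; ++copy) bank.push_back(s);
  }
  // 12 entries, 4 distinct sites, so the mean multiplicity is 12 / 4 = 3.
  assert(mean_multiplicity(bank) == 3.0);
}
```

Applied to the sites read from a statepoint, a result close to 3.0 would confirm the tripling numerically rather than by eye.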

makeclean commented 10 months ago

It would be interesting to know whether this works when you turn off threading.

cfichtlscherer commented 10 months ago

The criticality values do not change with the number of threads used, so OpenMP does not seem to be the problem.

cfichtlscherer commented 10 months ago

Compiling without OpenMP does not fix the problem either.

paulromano commented 10 months ago

That is very strange. Some issue with threading would have been my guess too, but it sounds like that's ruled out. Given that the most recent compiler version works, it may have been some weird compiler bug that has since been fixed 🤷

gridley commented 10 months ago

> Seems like the criticality is about three times as high. And checking the statepoint we see that every source point exists three times:

It would be nice to track down what's wrong here, or at least the piece of code that leads to erroneous compiler behavior. I fear this is due to undefined behavior in OpenMC.

How do fixed-source calculations look? You could run a fixed-source calculation with create_fission_neutrons set to both true and false. It seems like the problem might lie there.

cfichtlscherer commented 10 months ago

Sorry for the late reply, and thanks for the good idea.

Since create_fission_neutrons only affects fixed-source simulations, I ran the pincell example as a fixed-source simulation and used diff to compare the resulting tallies.out files between builds. With create_fission_neutrons=false there is no difference.

It seems the bug is somewhere in the code that writes newly created particles to the secondary/fission bank.
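One way to narrow down where created particles get written more than once is to check an invariant right after banking: the bank should hold exactly as many entries as particles that were meant to be created, each exactly once. The sketch below is hypothetical: a fixed-capacity bank whose slots are claimed through an atomic size counter (similar in spirit to, but not copied from, OpenMC's shared fission bank). `ParticleBank`, `BankedParticle`, and `bank_is_consistent` are illustrative names, and the check assumes each created particle carries a unique ID.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical banked particle. The ID is assumed to be unique per created
// particle, which makes duplicate writes easy to detect.
struct BankedParticle {
  std::int64_t id;
  double E;  // energy in eV
};

// Hypothetical fixed-capacity bank: each append claims one slot by
// atomically incrementing the size counter, then writes into that slot.
class ParticleBank {
public:
  explicit ParticleBank(std::size_t capacity) : data_(capacity) {}

  // Returns the slot index used, or -1 if the bank is already full.
  // Relaxed ordering is enough to hand out unique slots; visibility of the
  // written data to other threads relies on a later join or barrier.
  std::int64_t append(const BankedParticle& p) {
    const std::size_t idx = size_.fetch_add(1, std::memory_order_relaxed);
    if (idx >= data_.size()) return -1;
    data_[idx] = p;
    return static_cast<std::int64_t>(idx);
  }

  // Number of particles actually stored (the counter may overshoot when full).
  std::size_t size() const {
    const std::size_t n = size_.load(std::memory_order_relaxed);
    return n < data_.size() ? n : data_.size();
  }

  const BankedParticle& operator[](std::size_t i) const { return data_[i]; }

private:
  std::vector<BankedParticle> data_;
  std::atomic<std::size_t> size_{0};
};

// Invariant to check once all threads have finished banking: the bank holds
// exactly `expected` particles and no particle ID appears twice.
bool bank_is_consistent(const ParticleBank& bank, std::size_t expected) {
  std::set<std::int64_t> ids;
  for (std::size_t i = 0; i < bank.size(); ++i) ids.insert(bank[i].id);
  return bank.size() == expected && ids.size() == expected;
}

inline void demo_bank() {
  ParticleBank bank(8);
  for (std::int64_t i = 0; i < 5; ++i) bank.append({i, 1.0e6});
  assert(bank_is_consistent(bank, 5));

  // Simulate the suspected bug: the same particle is written a second time.
  bank.append({4, 1.0e6});
  assert(!bank_is_consistent(bank, 5));
}
```

Asserting such an invariant at each banking site, in a build with an affected compiler, would show which code path first produces the extra copies.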