pearce8 closed this pull request 3 months ago.
@pearce8 I hate to ask this, but does this change in FOM change the results here: https://lanl.github.io/benchmarks/06_umt/umt.html#example-fom-results
Thanks,
Galen
I didn't run this initially or have the output, so I don't know how many iterations there were to normalize it.
@aaroncblack @pearce8 Can you please provide us with the UMT configs for Rocinante / Crossroads? I believe @aaroncblack or @richards12 ran this on Roci; @dmageeLANL did not run this.
For Roci, I believe @richards12 used an Intel compiler build, most likely with "-O2" optimization and no other compiler tweaks. That is what I did on my local LLNL Intel platform.
In the lanl repo, under the umt docs area, I see his graph used data points at 1, 8, 32, 56, 88, and 112 cores for both benchmark runs (the SPP1 and SPP2 problems).
You'll want to target half the node memory on these (128GB per node on Roci? So target 64GB of memory use). The problem size can be adjusted by changing the size of the mesh with "-B global -d x,y,z", where x,y,z is the number of mesh tiles in each axis dimension.
I tested locally at LLNL and found these numbers work best to get at/around 64GB for the problem.
bash-4.4$ srun -n1 ./install/bin/test_driver -B global -d 14,14,14 -b 1
bash-4.4$ srun -n1 ./install/bin/test_driver -B global -d 31,31,31 -b 2
Change the '-n1' to 1, 8, 32, 56, 88, 112 for the runs.
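That sweep could be scripted. A minimal sketch, emitting one srun command per rank count so they can be pasted into a batch script (the binary path and flags are copied from the example commands above; nothing else here is UMT-specific):

```python
# Generate the srun command for each rank count in the sweep described above.
# Flags/paths come from the example commands; tiles/benchmark default to SPP1.
def build_cmd(n, tiles="14,14,14", benchmark=1):
    return ["srun", f"-n{n}", "./install/bin/test_driver",
            "-B", "global", "-d", tiles, "-b", str(benchmark)]

for n in (1, 8, 32, 56, 88, 112):
    print(" ".join(build_cmd(n)))
```

Printing rather than launching keeps the sketch independent of having Slurm available; swap the print for a `subprocess.run` call to actually submit.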
Between each cycle, UMT will output a line like:
Teton driver: CPU MEM USE (rank 0): 581.305MB
If you multiply that by the number of ranks, you should get a rough estimate of total memory usage.
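For example, taking the per-rank figure from the sample line above and assuming rank 0 is representative of all ranks:

```python
# Rough total-memory estimate from the per-rank line UMT prints, e.g.
#   Teton driver: CPU MEM USE (rank 0): 581.305MB
# Assumes rank 0 is representative of every rank.
per_rank_mb = 581.305
ranks = 112
total_gb = per_rank_mb * ranks / 1024.0
print(f"~{total_gb:.1f} GB total")  # prints "~63.6 GB total"
```

Which lands right at the ~64GB (half-node) target for a 112-rank run.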
@dmageeLANL Can you run as @aaroncblack describes above? Thx
Let me know if you need more information about my runs and I will try to dig out the information. Someone might have to remind me how to connect to roci.
Dave
David Richards, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
@richards12 It would be helpful to have your scripts so we can run UMT again the same way you ran it. Do you need help getting onto Roci?
@aaroncblack Those instructions look reasonable, I'll give it a shot later today. I'll let you know if I run into any issues. @gshipman @pearce8 @richards12
I looked in my LLNL accounts and found what appears to be a tar file with all of the “stuff” from my roci runs. It’s too big to attach to an email so I will share a link to it with Daniel, Galen, and Aaron. Let me know if anyone else needs it. I’ll also be happy to schedule a time to look through the contents and try to figure out what I was doing if it isn’t clear.
Dave
I got your package, Dave, but I don't really know what it means. I see there are a lot more packages in umt_workspace (metis, mfem, hypre); does UMT require these? Also, I see that there are results there, which means there's a number of iterations. Does this mean we don't need to re-run it, and the rest of this message is moot?
I've built UMT on Roci against Conduit with the default environment (PrgEnv-intel). I'm using the UMT in the benchmarks repo and the head of the develop branch of Conduit (0.9.2). The build went generally smoothly; I built both with CMake. But at runtime:
~ srun -N 1 -n 1 ./installs/bin/test_driver -B global -d 14,14,14 -b 1
Teton driver: number of MPI ranks: 1
Teton driver: Running predefined benchmark problem UMT SP#1
Teton driver: Threading enabled, max number of threads is 2
Teton driver: Rebuild with Conduit 0.8.9 or later to use tiled meshes.
srun: error: nid001109: task 0: Exited with exit code 1
srun: Terminating StepId=1412488.11
Which is weird, because it is Conduit 0.9.2. I tried setting export MPICH_SMP_SINGLE_COPY_MODE=CMA and MPICH_MAX_THREAD_SAFETY=multiple, but no dice. There's absolutely no information about the error.
An older version of UMT required MFEM, but now we only need conduit.
@dmageeLANL, you said you are using the develop branch. I know some release processes only embed a version number into the build for tagged releases. Maybe UMT is looking for a version number and can't find it because you are on develop.
The head of develop is tagged as 0.9.2.
Daniel,
As Aaron mentioned, back when I did these runs UMT had more dependencies. Now that I think about it, that version of UMT also had a different input format and problem description, so I'm not sure how relevant any of the trials I did will be to the current version, which has a significantly different problem definition.
Probably the most insight you can get from the files is in the input scripts, which give a sense of how I did the testing. For each problem size (each R is a different problem size) I ran scaling across different numbers of MPI ranks. It looks like I also did multiple trials to check the reproducibility of results.
Dave
@dmageeLANL you mentioned you are using the version of UMT in the GitHub.com/lanl/benchmarks repo? That is about 6 months old, I think: https://github.com/LLNL/UMT/tree/ed70b58e77b6dfb29b6b7f01d53bde2a02b7f218 You need a relatively new checkout of UMT to get the changes in FOM, I believe. Here is where the message is coming from in that version; it isn't in newer versions of UMT: https://github.com/LLNL/UMT/blob/ed70b58e77b6dfb29b6b7f01d53bde2a02b7f218/src/teton/driver/test_driver.cc#L1844
@dmageeLANL
I verified that I can build and run on Roci using latest UMT and Conduit.
(base) gshipman@nid001234:/usr/projects/eap/users/gshipman/benchmarks/UMT/install-ro/bin> srun -n1 ./test_driver -B global -d 31,31,31 -b 2
Teton driver: number of MPI ranks: 1
Teton driver: Running predefined benchmark problem UMT SP#2
Detected UMT run, fixing temperature iterations to one and increasing max flux iterations to enable convergence.
Teton driver: Using older GTA kernel, version 1.
Teton: setting verbosity to 1
=================================================================
=================================================================
Test driver starting time steps
=================================================================
Solving for 2928574464 global unknowns.
(5719872 spatial elements * 32 directions (angles) * 16 energy groups)
CPU memory needed per rank (average) for radiation intensity (PSI): 22343.2MB
Current CPU memory use (rank 0): 43555.1MB
Iteration control: relative tolerance set to 1e-07.
=================================================================
>>>>>>>>>>>>>>> End of Radiation Step Report <<<<<<<<<<<<<<<
TIME STEP 1 timerad = 0.0010000000 dtrad = 1.0000000000E-03
FluxIters = 3
TrMax = 0.0479810101 in Zone 238624 on Process 0
TeMax = 0.5000000000 in Zone 686 on Process 0
Energy deposited in material = 0.0000000000E+00 ERad total = 5.5683591379E-08 Energy check = -4.1994305338E-20
Recommended time step for next rad cycle = 5.0000000000E-04
***************** Run Time *****************
Cycle (min) Accumulated (min)
RADTR = 2.72014894 2.72014894
Sweep(CPU) = 2.42665883 2.42665883
Sweep(GPU) = 0.00000000 0.00000000
Initialization = 0.27952584 0.27952584
Finalization = 0.00678847 0.00678847
***************** Convergence *****************
Controlled by = Intensity
ProcessID = 0
Zone = 1
Rel Error = 0.00000000000E+00
Tr = 3.13271659561E-02
Te = 5.00000000000E-01
Rho = 1.31000000000E+00
Cv = 5.01000000000E-01
Source Rate = 0.00000000000E+00
Coordinates = 2.4194E-03 2.4194E-03 1.6129E-02
***************** Time Step Vote *****************
For Cycle = 2
Controlled by = Rad Energy Density
ProcessID = 0
Control Zone = 407680
Recommend Dt = 5.00000000000E-04
Max Change = 8.45899370175E-01
Tr = 3.13271659561E-02
Tr Old = 5.00000000000E-02
Te = 5.00000000000E-01
Te Old = 5.00000000000E-01
Rho = 1.31000000000E+00
Cv = 5.01000000000E-01
Source Rate = 0.00000000000E+00
Coordinates = 9.9758E-01 2.4194E-03 1.6129E-02
Teton driver: CPU MEM USE (rank 0): 44254.1MB
I got it running. Sorry for the confusion, I hadn't noticed that the version of UMT in this repository was older. I used the newest UMT and it worked!
Sweet! Once you have the performance numbers, please update the CSV files for the plots and tables in the GitHub Pages documentation as well.
OK, I have results, but I'm not sure which number is the operative one. Here's the full results CSV (do they look reasonable?):
Problem,nprocs,iterations,memory,wall_time,single_throughput,total_throughput
1,1,15,52276.3,581.864,1.25169e+08,4.17231e+07
2,1,15,48315.3,724.603,6.06244e+07,2.02081e+07
1,8,22,7473.68,100.527,1.06259e+09,2.41498e+08
2,8,24,7140.54,158.118,4.44515e+08,9.26073e+07
1,32,33,1937.47,49.6981,3.22405e+09,4.88492e+08
2,32,33,1625.43,57.3876,1.68404e+09,2.55157e+08
1,56,42,1020.18,38.5321,5.29242e+09,6.30051e+08
2,56,41,1045.1,48.617,2.46974e+09,3.01188e+08
1,88,49,760.32,42.1732,5.6414e+09,5.75653e+08
2,88,47,661.52,35.6454,3.86146e+09,4.10793e+08
1,112,46,530.523,28.5231,7.8305e+09,8.51141e+08
2,112,46,559.891,31.278,4.30701e+09,4.68153e+08
The numbers come from this part of the output; this is from procs=1, problem=1:
Teton driver: CPU MEM USE (rank 0): 52276.3MB
=================================================================
=================================================================
Test driver finished time steps
=================================================================
Average throughput of single iteration of iterative solver was 1.25169e+08 unknowns calculated per second.
Throughput of iterative solver was 4.17231e+07 unknowns calculated per second.
(average throughput of single iteration * # iterations for solver to produce answer
Total number of flux solver iterations for run: 15
Total wall time for run: 581.864 seconds.
=================================================================
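For what it's worth, turning those summary lines into the CSV rows can be done with a throwaway scraper along these lines (only the quoted log lines are real; the dict keys just mirror the CSV header):

```python
import re

# Scrape the end-of-run summary the Teton driver prints into one CSV row.
# The regexes assume exactly the log format quoted above.
SUMMARY = """\
Teton driver: CPU MEM USE (rank 0): 52276.3MB
Average throughput of single iteration of iterative solver was 1.25169e+08 unknowns calculated per second.
Throughput of iterative solver was 4.17231e+07 unknowns calculated per second.
Total number of flux solver iterations for run: 15
Total wall time for run: 581.864 seconds.
"""

def scrape(text):
    grab = lambda pat: re.search(pat, text).group(1)
    return {
        "memory": float(grab(r"CPU MEM USE \(rank 0\): ([\d.]+)MB")),
        # case-sensitive, so this skips the "Average throughput..." line:
        "single_throughput": float(grab(r"single iteration of iterative solver was (\S+)")),
        "total_throughput": float(grab(r"Throughput of iterative solver was (\S+)")),
        "iterations": int(grab(r"iterations for run: (\d+)")),
        "wall_time": float(grab(r"wall time for run: ([\d.]+)")),
    }

print(scrape(SUMMARY))
```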
I just want to make sure I'm looking at the right numbers and running this correctly before I make any changes.
Thanks!