sandialabs / Albany

Sandia National Laboratories' Albany multiphysics code

Optimizing StokesFOResid for LandIce 3D #1048

Closed: OscarAntepara closed this issue 5 months ago

OscarAntepara commented 6 months ago

Optimizing StokesFOResid for LandIce 3D by cleaning up the code, removing if statements inside the kernel, accumulating locally, and using compile-time variables for the loop bounds. For the ant-16km test, the original code timings are:

Phalanx: Evaluator 86: [] StokesFOResid: 0.769688 - 67.6716% [8] {min=0.758983, max=0.798954, std dev=0.0195347}
Phalanx: Evaluator 15: [] StokesFOResid: 0.0437266 - 22.2496% [13] {min=0.0424196, max=0.0470457, std dev=0.00221728}

New code timings are:

Phalanx: Evaluator 86: [] StokesFOResid: 0.284896 - 43.2264% [8] {min=0.283264, max=0.287391, std dev=0.00182733}
Phalanx: Evaluator 15: [] StokesFOResid: 0.024342 - 14.3191% [13] {min=0.0241703, max=0.0244444, std dev=0.000118884}
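For context, a minimal self-contained C++ sketch of the pattern described above (local accumulation in registers plus compile-time loop bounds). The field names (`residual`, `wGradBF`, `flux`) and extents are illustrative stand-ins, not the actual Albany kernel:

```cpp
// Illustrative sketch only, not the Albany code: accumulate into local
// variables inside the quadrature-point loop, with compile-time trip counts
// the compiler can unroll, and write each result to memory once per node.
#include <cstdio>

constexpr int numCells = 2, numNodes = 8, numQPs = 8, numDims = 2;

double residual[numCells][numNodes][numDims];         // output field
double wGradBF [numCells][numNodes][numQPs][numDims]; // weighted basis grads
double flux    [numCells][numQPs][numDims];           // per-QP integrand

void evaluateCell(const int cell) {
  for (int node = 0; node < numNodes; ++node) {
    // Local accumulators stay in registers on a GPU instead of hitting
    // device memory on every quadrature-point iteration.
    double res0 = 0.0, res1 = 0.0;
    for (int qp = 0; qp < numQPs; ++qp) {
      res0 += wGradBF[cell][node][qp][0] * flux[cell][qp][0];
      res1 += wGradBF[cell][node][qp][1] * flux[cell][qp][1];
    }
    // One write per residual component instead of one per iteration.
    residual[cell][node][0] = res0;
    residual[cell][node][1] = res1;
  }
}

int main() {
  for (int c = 0; c < numCells; ++c) evaluateCell(c);
  std::printf("residual(0,0,0) = %g\n", residual[0][0][0]);
}
```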

OscarAntepara commented 6 months ago

Yes, this is on pm-gpu. I haven't tried Frontier.

mperego commented 6 months ago

Thanks @OscarAntepara ! So the issue was that writing to MDFields (e.g., Residual(cell,node,1)) is expensive on GPUs? Is it possible to refactor the code to avoid the duplication of this code? https://github.com/sandialabs/Albany/blob/master/src/landIce/evaluators/LandIce_StokesFOResid_Def.hpp#L193-L209

jewatkins commented 6 months ago

> Thanks @OscarAntepara ! So the issue was that writing to MDFields (e.g., Residual(cell,node,1)) is expensive on GPUs?

Right, it's better to accumulate into local variables in the inner loop and write to device memory once outside it, rather than writing to device memory on every iteration. That might not be the case on CPU... Oscar, could you check CPU performance?

> Is it possible to refactor the code to avoid the duplication of this code? https://github.com/sandialabs/Albany/blob/master/src/landIce/evaluators/LandIce_StokesFOResid_Def.hpp#L193-L209

We can probably do that with an inline device function, but it might get ugly. We thought keeping the old implementation might improve readability (similar to what we do for the optimized gradient).
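A rough sketch of the inline-device-function idea (assumed names and a standalone Kokkos setup, not the actual Albany evaluator): both evaluation types could call one shared KOKKOS_INLINE_FUNCTION body instead of duplicating the loop:

```cpp
// Hedged sketch, not Albany's actual API: a single device-callable helper
// holds the per-(cell,node) accumulation so the Residual and Jacobian
// specializations can share it instead of duplicating the loop body.
#include <Kokkos_Core.hpp>

template <typename ResView, typename GradView>
KOKKOS_INLINE_FUNCTION
void accumulateNode(const ResView& res, const GradView& grad,
                    const int cell, const int node, const int numQPs) {
  double r0 = 0.0, r1 = 0.0;            // local accumulators
  for (int qp = 0; qp < numQPs; ++qp) {
    r0 += grad(cell, node, qp, 0);
    r1 += grad(cell, node, qp, 1);
  }
  res(cell, node, 0) = r0;              // single write per component
  res(cell, node, 1) = r1;
}

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int numCells = 4, numNodes = 8, numQPs = 8;
    Kokkos::View<double****> grad("grad", numCells, numNodes, numQPs, 2);
    Kokkos::View<double***>  res ("res",  numCells, numNodes, 2);
    Kokkos::deep_copy(grad, 1.0);
    // One cell per thread; the helper is inlined into the device kernel.
    Kokkos::parallel_for("eval", numCells, KOKKOS_LAMBDA(const int cell) {
      for (int node = 0; node < numNodes; ++node)
        accumulateNode(res, grad, cell, node, numQPs);
    });
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}
```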

OscarAntepara commented 6 months ago

Yeah, doing the accumulation locally improves data locality on the GPU, which gives better performance. I will check the CPU performance.

mperego commented 6 months ago

> We can probably do that with an inline device function, but it might get ugly. We thought keeping the old implementation might improve readability (similar to what we do for the optimized gradient).

OK, it's just that if we need to modify that computation kernel, we'd have to do it in multiple places.

jewatkins commented 6 months ago

>> We can probably do that with an inline device function, but it might get ugly. We thought keeping the old implementation might improve readability (similar to what we do for the optimized gradient).
>
> OK, it's just that if we need to modify that computation kernel, we'd have to do it in multiple places.

Right, probably okay to turn it into an inline function then. I don't think it will be too complicated in terms of readability.

OscarAntepara commented 6 months ago

> That might not be the case on CPU... Oscar, could you check CPU performance?

For the 16km test on CPU there is not much difference between the original and the new code.

Original:

Phalanx: Evaluator 15: [Residual] StokesFOResid: 0.368748 - 32.183% [13] {min=0.358496, max=0.387898, std dev=0.00415609}
Phalanx: Evaluator 86: [Jacobian] StokesFOResid: 1.13733 - 27.0871% [8] {min=1.11755, max=1.16131, std dev=0.0112813}

New:

Phalanx: Evaluator 15: [Residual] StokesFOResid: 0.368064 - 32.1713% [13] {min=0.359196, max=0.383944, std dev=0.00349569}
Phalanx: Evaluator 86: [Jacobian] StokesFOResid: 1.11119 - 26.5841% [8] {min=1.076, max=1.14491, std dev=0.0124654}

OscarAntepara commented 6 months ago

> Right, probably okay to turn it into an inline function then. I don't think it will be too complicated in terms of readability.

I have modified the code to avoid the code duplication mentioned before.

mperego commented 6 months ago

> I have modified the code to avoid the code duplication mentioned before.

Thanks! I think it's a reasonable approach. Have you tested it already? I'm not sure if we still have some tests using Tets, in which case you would have to add a specialization for numNodes==4.
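If a Tet specialization is needed, a hypothetical sketch of what the compile-time dispatch could look like (names invented for illustration; the real evaluator would key off the element type it already knows):

```cpp
// Illustrative only: compile-time node counts let the inner loop unroll,
// with a runtime switch choosing the instantiation per element type.
#include <cstdio>
#include <stdexcept>

template <int NumNodes>
double cellSum(const double* nodeVals) {
  double s = 0.0;
  for (int n = 0; n < NumNodes; ++n)  // trip count known at compile time
    s += nodeVals[n];
  return s;
}

double dispatch(const double* nodeVals, const int numNodes) {
  switch (numNodes) {
    case 4: return cellSum<4>(nodeVals);  // tetrahedra
    case 6: return cellSum<6>(nodeVals);  // wedges
    case 8: return cellSum<8>(nodeVals);  // hexahedra
    default: throw std::runtime_error("unsupported element type");
  }
}

int main() {
  const double vals[8] = {1, 1, 1, 1, 1, 1, 1, 1};
  std::printf("%g %g\n", dispatch(vals, 4), dispatch(vals, 8));
}
```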

OscarAntepara commented 6 months ago

> Thanks! I think it's a reasonable approach. Have you tested it already? I'm not sure if we still have some tests using Tets, in which case you would have to add a specialization for numNodes==4.

I have only tested it for ant-16km, which uses numNodes=8. Is there a LandIce 3D test with numNodes=4? Would that be one using just tetrahedra?

mperego commented 6 months ago

> I have only tested it for ant-16km, which uses numNodes=8. Is there a LandIce 3D test with numNodes=4? Would that be one using just tetrahedra?

We used to have a lot, but we converted most of them to Wedges. I couldn't find one with a quick search. Can you simply run the Albany LandIce ctests?

OscarAntepara commented 6 months ago

> Albany LandIce ctests

Yeah, I got this:

Test project /pscratch/sd/o/oantepar/fanssie/builds/albany_pm_gpu_nouvm_gnu_sfad16
      Start  1: unit_NullSpaceUtils_UnitTest_Serial
 1/20 Test  #1: unit_NullSpaceUtils_UnitTest_Serial ............................   Passed    9.43 sec
      Start  2: unit_NullSpaceUtils_UnitTest_Parallel
 2/20 Test  #2: unit_NullSpaceUtils_UnitTest_Parallel ..........................   Passed   14.38 sec
      Start  3: unit_StringUtils_UnitTest
 3/20 Test  #3: unit_StringUtils_UnitTest ......................................   Passed   88.81 sec
      Start  4: unit_HessianVecFad_UnitTest
 4/20 Test  #4: unit_HessianVecFad_UnitTest ....................................   Passed   24.46 sec
      Start  5: disc_stk_STKDisc_UnitTest_Serial
 5/20 Test  #5: disc_stk_STKDisc_UnitTest_Serial ...............................   Passed   60.67 sec
      Start  6: disc_stk_STKDisc_UnitTest_Parallel
 6/20 Test  #6: disc_stk_STKDisc_UnitTest_Parallel .............................   Passed   80.63 sec
      Start  7: unit_evaluators_DOFInterpolation_UnitTest_Serial
 7/20 Test  #7: unit_evaluators_DOFInterpolation_UnitTest_Serial ...............   Passed   35.56 sec
      Start  8: unit_evaluators_DOFInterpolation_UnitTest_Parallel
 8/20 Test  #8: unit_evaluators_DOFInterpolation_UnitTest_Parallel .............   Passed   29.89 sec
      Start  9: unit_evaluators_GatherSolution_UnitTest_Serial
 9/20 Test  #9: unit_evaluators_GatherSolution_UnitTest_Serial .................   Passed   29.02 sec
      Start 10: unit_evaluators_GatherSolution_UnitTest_Parallel
10/20 Test #10: unit_evaluators_GatherSolution_UnitTest_Parallel ...............   Passed   31.00 sec
      Start 11: unit_evaluators_GatherDistributedParameter_UnitTest_Serial
11/20 Test #11: unit_evaluators_GatherDistributedParameter_UnitTest_Serial .....   Passed   31.92 sec
      Start 12: unit_evaluators_GatherDistributedParameter_UnitTest_Parallel
12/20 Test #12: unit_evaluators_GatherDistributedParameter_UnitTest_Parallel ...   Passed   10.94 sec
      Start 13: unit_evaluators_ScatterResidual_UnitTest_Serial
13/20 Test #13: unit_evaluators_ScatterResidual_UnitTest_Serial ................***Failed   82.82 sec
      Start 14: unit_evaluators_ScatterResidual_UnitTest_Parallel
14/20 Test #14: unit_evaluators_ScatterResidual_UnitTest_Parallel ..............***Failed   47.41 sec
      Start 15: unit_evaluators_ScatterScalarResponse_UnitTest_Serial
15/20 Test #15: unit_evaluators_ScatterScalarResponse_UnitTest_Serial ..........***Failed   13.53 sec
      Start 16: unit_evaluators_ScatterScalarResponse_UnitTest_Parallel
16/20 Test #16: unit_evaluators_ScatterScalarResponse_UnitTest_Parallel ........***Failed  152.74 sec
      Start 17: LandIce_FO_Dome_Ascii
17/20 Test #17: LandIce_FO_Dome_Ascii ..........................................   Passed   45.23 sec
      Start 18: LandIce_FO_Dome_Restart
18/20 Test #18: LandIce_FO_Dome_Restart ........................................   Passed   58.91 sec
      Start 19: landIce_FO_AIS_16km_decompMesh
19/20 Test #19: landIce_FO_AIS_16km_decompMesh .................................***Failed   43.10 sec
      Start 20: landIce_FO_AIS_16km_MueLuKokkos
Failed test dependencies: landIce_FO_AIS_16km_decompMesh
20/20 Test #20: landIce_FO_AIS_16km_MueLuKokkos ................................***Not Run   0.00 sec

70% tests passed, 6 tests failed out of 20

Label Time Summary:
Forward = 104.14 sec*proc (3 tests)
LandIce = 104.14 sec*proc (3 tests)
Serial  = 104.14 sec*proc (2 tests)
unit    = 743.20 sec*proc (16 tests)

Total Test time (real) = 890.49 sec

The following tests FAILED:
  13 - unit_evaluators_ScatterResidual_UnitTest_Serial (Failed)
  14 - unit_evaluators_ScatterResidual_UnitTest_Parallel (Failed)
  15 - unit_evaluators_ScatterScalarResponse_UnitTest_Serial (Failed)
  16 - unit_evaluators_ScatterScalarResponse_UnitTest_Parallel (Failed)
  19 - landIce_FO_AIS_16km_decompMesh (Failed)
  20 - landIce_FO_AIS_16km_MueLuKokkos (Not Run)
Errors while running CTest

mperego commented 6 months ago

> Yeah, I got this: [...]
>
> The following tests FAILED:
> 13 - unit_evaluators_ScatterResidual_UnitTest_Serial (Failed)
> 14 - unit_evaluators_ScatterResidual_UnitTest_Parallel (Failed)
> 15 - unit_evaluators_ScatterScalarResponse_UnitTest_Serial (Failed)
> 16 - unit_evaluators_ScatterScalarResponse_UnitTest_Parallel (Failed)
> 19 - landIce_FO_AIS_16km_decompMesh (Failed)
> 20 - landIce_FO_AIS_16km_MueLuKokkos (Not Run)

For 19 and 20, you probably need to put the Trilinos libs in your LD_LIBRARY_PATH.

OscarAntepara commented 6 months ago

> For 19 and 20, you probably need to put the Trilinos libs in your LD_LIBRARY_PATH.

Trueee, now I got this:

Test project /pscratch/sd/o/oantepar/fanssie/builds/albany_pm_gpu_nouvm_gnu_sfad16
      Start  1: unit_NullSpaceUtils_UnitTest_Serial
 1/20 Test  #1: unit_NullSpaceUtils_UnitTest_Serial ............................   Passed   13.21 sec
      Start  2: unit_NullSpaceUtils_UnitTest_Parallel
 2/20 Test  #2: unit_NullSpaceUtils_UnitTest_Parallel ..........................   Passed   32.46 sec
      Start  3: unit_StringUtils_UnitTest
 3/20 Test  #3: unit_StringUtils_UnitTest ......................................   Passed   55.65 sec
      Start  4: unit_HessianVecFad_UnitTest
 4/20 Test  #4: unit_HessianVecFad_UnitTest ....................................   Passed   39.80 sec
      Start  5: disc_stk_STKDisc_UnitTest_Serial
 5/20 Test  #5: disc_stk_STKDisc_UnitTest_Serial ...............................   Passed   11.90 sec
      Start  6: disc_stk_STKDisc_UnitTest_Parallel
 6/20 Test  #6: disc_stk_STKDisc_UnitTest_Parallel .............................   Passed    5.44 sec
      Start  7: unit_evaluators_DOFInterpolation_UnitTest_Serial
 7/20 Test  #7: unit_evaluators_DOFInterpolation_UnitTest_Serial ...............   Passed   15.41 sec
      Start  8: unit_evaluators_DOFInterpolation_UnitTest_Parallel
 8/20 Test  #8: unit_evaluators_DOFInterpolation_UnitTest_Parallel .............   Passed   16.10 sec
      Start  9: unit_evaluators_GatherSolution_UnitTest_Serial
 9/20 Test  #9: unit_evaluators_GatherSolution_UnitTest_Serial .................   Passed    7.26 sec
      Start 10: unit_evaluators_GatherSolution_UnitTest_Parallel
10/20 Test #10: unit_evaluators_GatherSolution_UnitTest_Parallel ...............   Passed   62.92 sec
      Start 11: unit_evaluators_GatherDistributedParameter_UnitTest_Serial
11/20 Test #11: unit_evaluators_GatherDistributedParameter_UnitTest_Serial .....   Passed   60.16 sec
      Start 12: unit_evaluators_GatherDistributedParameter_UnitTest_Parallel
12/20 Test #12: unit_evaluators_GatherDistributedParameter_UnitTest_Parallel ...   Passed   41.62 sec
      Start 13: unit_evaluators_ScatterResidual_UnitTest_Serial
13/20 Test #13: unit_evaluators_ScatterResidual_UnitTest_Serial ................***Failed   12.05 sec
      Start 14: unit_evaluators_ScatterResidual_UnitTest_Parallel
14/20 Test #14: unit_evaluators_ScatterResidual_UnitTest_Parallel ..............***Failed   29.00 sec
      Start 15: unit_evaluators_ScatterScalarResponse_UnitTest_Serial
15/20 Test #15: unit_evaluators_ScatterScalarResponse_UnitTest_Serial ..........***Failed   51.83 sec
      Start 16: unit_evaluators_ScatterScalarResponse_UnitTest_Parallel
16/20 Test #16: unit_evaluators_ScatterScalarResponse_UnitTest_Parallel ........***Failed    8.81 sec
      Start 17: LandIce_FO_Dome_Ascii
17/20 Test #17: LandIce_FO_Dome_Ascii ..........................................   Passed   11.40 sec
      Start 18: LandIce_FO_Dome_Restart
18/20 Test #18: LandIce_FO_Dome_Restart ........................................   Passed    9.23 sec
      Start 19: landIce_FO_AIS_16km_decompMesh
19/20 Test #19: landIce_FO_AIS_16km_decompMesh .................................   Passed   22.59 sec
      Start 20: landIce_FO_AIS_16km_MueLuKokkos
20/20 Test #20: landIce_FO_AIS_16km_MueLuKokkos ................................   Passed   14.41 sec

80% tests passed, 4 tests failed out of 20

Label Time Summary:
Forward = 35.04 sec*proc (3 tests)
LandIce = 35.04 sec*proc (3 tests)
Serial  = 20.63 sec*proc (2 tests)
unit    = 463.64 sec*proc (16 tests)

Total Test time (real) = 521.31 sec

The following tests FAILED:
  13 - unit_evaluators_ScatterResidual_UnitTest_Serial (Failed)
  14 - unit_evaluators_ScatterResidual_UnitTest_Parallel (Failed)
  15 - unit_evaluators_ScatterScalarResponse_UnitTest_Serial (Failed)
  16 - unit_evaluators_ScatterScalarResponse_UnitTest_Parallel (Failed)
Errors while running CTest

mperego commented 6 months ago

OK, I don't think your changes affect the unit tests, so I think you are good to go.

In case you want to understand what's going on, you can run a single test with verbose output:

ctest -VV -R unit_evaluators_ScatterResidual_UnitTest_Serial

OscarAntepara commented 6 months ago

If people are curious, I'm seeing the same errors as the ones reported here: https://my.cdash.org/viewTest.php?buildid=2560428

mperego commented 6 months ago

Thanks, we should open an issue about those tests failing.

mcarlson801 commented 6 months ago

> Thanks, we should open an issue about those tests failing.

Those tests are currently not expected to pass since these are UVM-free builds. If you want, I can start an issue that tracks the status of UVM-free tests and which ones are currently known to fail.

mperego commented 6 months ago

>> Thanks, we should open an issue about those tests failing.
>
> Those tests are currently not expected to pass since these are UVM-free builds. If you want, I can start an issue that tracks the status of UVM-free tests and which ones are currently known to fail.

Oh, OK. Should we disable them in UVM-free builds? Anyway, I'm fine either way, and it's OK to do nothing if you plan to make these tests work in UVM-free builds.

jewatkins commented 5 months ago

Frontier numbers from Oscar:

Original:

Residual: 4ms
Jacobian: 71ms

New:

Residual: 1ms
Jacobian: 44ms