idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

Thread failures on Mac testing with 12.3 Monterey #20680

Open cticenhour opened 2 years ago

cticenhour commented 2 years ago

Bug Description

Tests in the framework and in BISON fail when threads are used with the newest conda packages. A failing run typically has seven or more failures, but the failures are intermittent: which tests fail, and whether they fail at all, varies from run to run.

Steps to Reproduce

  1. Install mambaforge3 and the moose packages on a macOS machine running 12.3 (Monterey) with the newest Xcode (13.3)
  2. Run the following:
    cd ~/projects/moose/modules/heat_conduction
    make -j8
    ./run_tests -j8 --n-threads=2

Sample test failure summary for the heat conduction module:

    FAIL FAILED (CRASH) test:view_factors.unnormalized
    FAIL FAILED (CRASH) test:radiation_transfer_action.external_boundary_analytical
    FAIL FAILED (CRASH) test:gray_lambert_radiator.gray_lambert_cavity_automatic_vf
    FAIL FAILED (CRASH) test:gray_lambert_radiator.gray_lambert_cavity_automatic_vf_3D
    FAIL FAILED (CRASH) test:gap_heat_transfer_htonly.cyl3D
    DIFF FAILED (EXODIFF) test:recover.recover_1
    DIFF FAILED (EXODIFF) test:recover.ad_recover_1
    DIFF FAILED (CSVDIFF) test:gap_heat_transfer_mortar.large_gap_heat_transfer_test_cylinder_mortar
    DIFF FAILED (EXODIFF) test:gap_heat_transfer_htonly.sphere3D
    DIFF FAILED (CSVDIFF) test:gap_heat_transfer_mortar.gap_heat_transfer_3d
    DIFF FAILED (CSVDIFF) test:gap_heat_transfer_mortar.gap_heat_transfer_3d_hex20
    --------------------------------------------------------------------------------------------------------------
    Ran 177 tests in 43.5 seconds. Average test time 0.7 seconds, maximum test time 4.4 seconds.
    166 passed, 18 skipped, 0 pending, 11 FAILED

See an example of the BISON failure here: https://moosebuild.hpc.inl.gov/job/431642/

Impact

This breaks BISON devel testing (since they test explicitly with threads on Mac) and can leave users who rely on threads in the framework and apps in a potentially broken or ill-functioning state.

cticenhour commented 2 years ago

While it seemed as though increasing the version for the minimum supported macOS fixed this problem for me locally, this doesn't seem to be the case in production (see https://moosebuild.hpc.inl.gov/job/434676/). Further, when building the entire conda stack locally today (Monterey 12.3, Xcode 13.3 [13E113]) and testing on my local machine, I am still seeing failures. It looks like my local build last week was a lucky fluke (which makes no sense to me) given the extreme intermittency of these failures. Still work to be done here.
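Since a single clean run evidently proves little here, one way to judge a candidate fix is to repeat the run and record the FAILED count each time. The following is a minimal sketch, not part of MOOSE: the helper names `failed_count` and `repeat_runs` are hypothetical, and it assumes the captured output contains a plain-text summary line in the form shown in this issue (`166 passed, 18 skipped, 0 pending, 11 FAILED`).

```python
import re
import subprocess

# Summary line layout as quoted in this issue, e.g.
# "166 passed, 18 skipped, 0 pending, 11 FAILED".
SUMMARY_RE = re.compile(r"(\d+) passed, (\d+) skipped, (\d+) pending, (\d+) FAILED")


def failed_count(output):
    """Return the FAILED count from a run's output, or None if no summary line is found."""
    match = SUMMARY_RE.search(output)
    return int(match.group(4)) if match else None


def repeat_runs(command, runs, cwd=None):
    """Run `command` `runs` times and return the FAILED count of each run."""
    counts = []
    for _ in range(runs):
        # Exit status is ignored on purpose; only the printed summary is inspected.
        result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
        counts.append(failed_count(result.stdout + result.stderr))
    return counts


# Example call (assumption: executed from modules/heat_conduction, using the
# command from "Steps to Reproduce"):
#   repeat_runs(["./run_tests", "-j8", "--n-threads=2"], runs=5)
```

A list such as five zeros in a row would be more convincing than one passing run, though with failures this intermittent it still would not be proof.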

milljm commented 1 year ago

Perusing through our issues, ran across this. Still a thing 😢

    DIFF test:recover.recover_1 FAILED (EXODIFF)
    DIFF test:recover.ad_recover_1 FAILED (EXODIFF)
    DIFF test:gap_heat_transfer_htonly.sphere3D FAILED (EXODIFF)
    DIFF test:gap_heat_transfer_mortar.large_gap_heat_transfer_test_cylinder_mortar_auto FAILED (CSVDIFF)
    DIFF test:gap_heat_transfer_mortar.large_gap_heat_transfer_test_2d_sphere_mortar FAILED (CSVDIFF)
    DIFF test:gap_heat_transfer_mortar.gap_heat_transfer_3d FAILED (CSVDIFF)
    DIFF test:gap_heat_transfer_mortar.gap_heat_transfer_3d_hex20 FAILED (CSVDIFF)
    DIFF test:gap_heat_transfer_mortar.gap_heat_transfer_3d_mortar_hex20 FAILED (CSVDIFF)
    DIFF test:gap_heat_transfer_mortar.gap_heat_transfer_sphere3d FAILED (CSVDIFF)
    FAIL test:gap_heat_transfer_htonly.cyl3D FAILED (CRASH)
    --------------------------------------------------------------------------------------------------------------
    Ran 233 tests in 52.0 seconds. Average test time 0.6 seconds, maximum test time 3.9 seconds.
    223 passed, 10 skipped, 0 pending, 10 FAILED
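To see which failures recur between runs rather than eyeballing two summaries, the status lines can be parsed and intersected. This is a rough sketch under stated assumptions: the function name `failed_tests` is hypothetical, and the two regular expressions cover only the two line layouts that appear in this thread (status before the test name in the original report, after it in the later one).

```python
import re

# Two layouts of status line appear in this thread:
#   "FAIL FAILED (CRASH) test:view_factors.unnormalized"
#   "DIFF test:recover.recover_1 FAILED (EXODIFF)"
STATUS_FIRST_RE = re.compile(r"FAILED \((\w+)\)\s+test:(\S+)")
NAME_FIRST_RE = re.compile(r"test:(\S+)\s+FAILED \((\w+)\)")


def failed_tests(summary):
    """Map each failed test name to its reason (CRASH, EXODIFF, CSVDIFF)."""
    failures = {}
    for line in summary.splitlines():
        status_first = STATUS_FIRST_RE.search(line)
        name_first = NAME_FIRST_RE.search(line)
        if status_first:
            failures[status_first.group(2)] = status_first.group(1)
        elif name_first:
            failures[name_first.group(1)] = name_first.group(2)
    return failures


# A few lines copied from each of the two summaries in this issue.
first_run = """\
FAIL FAILED (CRASH) test:gap_heat_transfer_htonly.cyl3D
DIFF FAILED (EXODIFF) test:recover.recover_1
DIFF FAILED (CSVDIFF) test:gap_heat_transfer_mortar.large_gap_heat_transfer_test_cylinder_mortar
"""
second_run = """\
DIFF test:recover.recover_1 FAILED (EXODIFF)
DIFF test:gap_heat_transfer_mortar.gap_heat_transfer_sphere3d FAILED (CSVDIFF)
FAIL test:gap_heat_transfer_htonly.cyl3D FAILED (CRASH)
"""

# Tests that failed in both excerpts.
common = sorted(set(failed_tests(first_run)) & set(failed_tests(second_run)))
print(common)  # ['gap_heat_transfer_htonly.cyl3D', 'recover.recover_1']
```

Tests that show up in every failing run (here `recover.recover_1` and `gap_heat_transfer_htonly.cyl3D`) are probably the better starting point for debugging than ones that fail only occasionally.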