sstsimulator / sst-elements

SST Architectural Simulation Components and Libraries
http://www.sst-simulator.org

Ember OTF2 linking and Bcast fix (with an added test) #2392

Closed freund closed 1 month ago

freund commented 3 months ago

Closes #2390

Fixes the issue with dynamic linking to libotf2.so caused by a minor typo in ember's Makefile.am.

Closes #2391

Fixes the simulation hang that occurs when the Ember OTF2 motif encounters an MPI_Bcast while replaying a trace. When the OTF2 library reads a trace and encounters a collective operation, it returns the sizes of both the sent and received messages. The motif then replicates this operation in Ember. However, it was issuing the Bcast with the number of sent bytes, which is incorrect because the root rank is the only one sending anything. (For non-root ranks, the number of sent bytes is zero.) It should use the number of received bytes, which is the same for all ranks. I think this discrepancy caused the hang because the Bcast operations did not match across ranks.
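To illustrate the mismatch, here is a minimal Python sketch (not the motif's actual C++ code). The `CollectiveEnd` class, its field names, and the helper functions are hypothetical stand-ins for the sizes reported with a collective record; the root's sent value is an arbitrary nonzero number chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CollectiveEnd:
    """Hypothetical stand-in for the sizes reported with a collective record."""
    rank: int
    root: int
    size_sent: int
    size_received: int


def bcast_bytes_old(record):
    # Previous behavior as described above: use the sent size.
    return record.size_sent


def bcast_bytes_fixed(record):
    # Fixed behavior: use the received size, which is the same on every rank.
    return record.size_received


def records_for_bcast(num_ranks, root, nbytes):
    # Assumption for illustration: only the root reports a nonzero sent size.
    return [
        CollectiveEnd(rank=r, root=root,
                      size_sent=nbytes if r == root else 0,
                      size_received=nbytes)
        for r in range(num_ranks)
    ]


records = records_for_bcast(num_ranks=4, root=0, nbytes=1024)
old_sizes = [bcast_bytes_old(r) for r in records]
new_sizes = [bcast_bytes_fixed(r) for r in records]

# With the old choice the ranks disagree on the Bcast size;
# with the fixed choice every rank issues the same size.
print("old:", old_sizes)
print("new:", new_sizes)
```

In this sketch the old sizes differ between the root and the other ranks, while the fixed sizes are identical on all four ranks.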

Added Ember OTF2 test

I also added a simple test to make sure that the Ember OTF2 motif works correctly. It reads in a small trace (just containing a Bcast across four ranks) and passes as long as the simulation completes within 20 seconds with the correct simulation time.

To skip this test when sst-elements was not built with OTF2 support, I added a couple of lines to ember's Makefile.am so that sst-register adds a flag to the configuration file indicating whether it was built with OTF2. The test first checks for that flag (which depends on sstsimulator/sst-core#1121). Please let me know if it's not advisable to pollute the config file in this way! The test is skipped if, according to the sst-elements config include file, sst-elements was not built with OTF2 support.
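As a rough sketch of the skip check, assuming an INI-style configuration file: the section name `ember`, the key `HAVE_OTF2`, and the helper functions below are hypothetical and are not taken from the PR or from the SST test framework.

```python
import configparser

# Hypothetical sample of what the registered configuration might contain.
SAMPLE_CONF = """
[ember]
HAVE_OTF2 = 1
"""


def built_with_otf2(conf_text, section="ember", key="HAVE_OTF2"):
    """Return True only if the (assumed) flag is present and set to 1."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys case-sensitive
    parser.read_string(conf_text)
    # A missing section or key falls back to "0", meaning "no OTF2 support".
    return parser.get(section, key, fallback="0").strip() == "1"


def skip_reason(conf_text):
    """Return a skip message, or None if the test should run."""
    if built_with_otf2(conf_text):
        return None
    return "sst-elements was not built with OTF2 support"


print(skip_reason(SAMPLE_CONF))
print(skip_reason("[ember]\nHAVE_OTF2 = 0\n"))
```

Treating a missing flag as "not built with OTF2" keeps the test from running, and failing, on builds that predate the flag.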

This test also depends on being able to read in a trace, which comprises 10 small binary files that I've included here. I see that the sst-downloads repo has a release for test support files, so if it's preferable, I'd be happy to adjust my test to download an archive from there instead (it looks like cramSim does something similar). I just wasn't sure what the process of adding files to sst-downloads looked like.

Added support for missing collectives

I added support for Scatter, Allgather, and Alltoall collective operations. They were previously ignored when encountered in an OTF2 trace.

Added parameter for optional compute time

I added a new parameter, addCompute. It is false by default, but when set to true, the motif adds ember compute events to emulate compute time spent outside of communication. The motif tracks event timestamps in the OTF2 trace: at the start of the trace, at the start and end of each MPI call, and at the end of the trace. Using this information along with the time resolution read from the trace's global definitions, the motif can add compute events to the queue that roughly match the elapsed time outside of MPI calls.
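The timestamp bookkeeping can be sketched in Python as follows. This is an illustration, not the motif's C++ implementation: the function names are hypothetical, and it assumes the trace's timer resolution is expressed in ticks per second and that MPI calls are listed in time order without overlap.

```python
def ticks_to_ns(ticks, ticks_per_second):
    # Convert a tick count to whole nanoseconds using the timer resolution.
    return ticks * 1_000_000_000 // ticks_per_second


def compute_gaps_ns(trace_start, trace_end, mpi_calls, ticks_per_second):
    """Return the compute time (ns) before, between, and after MPI calls.

    mpi_calls is a list of (enter_tick, leave_tick) pairs in time order.
    """
    gaps = []
    cursor = trace_start  # end of the previous MPI call, or the trace start
    for enter, leave in mpi_calls:
        gaps.append(ticks_to_ns(enter - cursor, ticks_per_second))
        cursor = leave
    # Time between the last MPI call and the end of the trace.
    gaps.append(ticks_to_ns(trace_end - cursor, ticks_per_second))
    return gaps


# Example: 1 tick = 1 microsecond (1_000_000 ticks per second).
gaps = compute_gaps_ns(
    trace_start=1000,
    trace_end=10000,
    mpi_calls=[(3000, 4000), (7000, 9000)],
    ticks_per_second=1_000_000,
)
print(gaps)  # [2000000, 3000000, 1000000]
```

Each returned gap would correspond to one compute event queued around the replayed MPI call.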

sst-autotester commented 3 months ago

Status Flag 'Pre-Test Inspection' - This Pull Request Requires Inspection. The code must be inspected by a member of the Team before Testing/Merging. NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

sst-autotester commented 1 month ago

Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; this inspection will remain valid until a new commit to source branch is performed.

sst-autotester commented 1 month ago

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements

  • Build Num: 1628
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_Make-Dist

  • Build Num: 1065
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2

  • Build Num: 1609
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2

  • Build Num: 1609
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements

  • Build Num: 181
  • Status: STARTED

Using Repos:
Repo: ELEMENTS (freund/sst-elements)
  • Branch: ember-otf2-lib-and-bcast-fix
  • SHA: 7c766ce06216452a04c01378658c41116499da34
  • Mode: TEST_REPO
Repo: SQE (sstsimulator/sst-sqe)
  • Branch: devel
  • SHA: 2574c98896598820227190149834172b962dc3fc
  • Mode: SUPPORT_REPO
Repo: CORE (sstsimulator/sst-core)
  • Branch: devel
  • SHA: 1cc35cc85ae17a2f9a38c669bf7f00d2ed7f9a93
  • Mode: SUPPORT_REPO
Repo: MACRO (sstsimulator/sst-macro)
  • Branch: devel
  • SHA: 50a62170b3681ea20cc2f56abd2eb3911053f1fc
  • Mode: SUPPORT_REPO
Pull Request Author: freund

sst-autotester commented 1 month ago

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 4 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED

Job: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements

  • Result: PASSED
  • Build #: 1628
  • URL: Jenkins server at https://sst-jenkins.sandia.gov/view/SST/job/SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements/1628/consoleFull

Job: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_Make-Dist

  • Result: PASSED
  • Build #: 1065
  • URL: Jenkins server at https://sst-jenkins.sandia.gov/view/SST/job/SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_Make-Dist/1065/consoleFull

Job: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2

  • Result: PASSED
  • Build #: 1609
  • URL: Jenkins server at https://sst-jenkins.sandia.gov/view/SST/job/SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2/1609/consoleFull

Job: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2

  • Result: FAILED
  • Build #: 1609
  • URL: Jenkins server at https://sst-jenkins.sandia.gov/view/SST/job/SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2/1609/consoleFull
  • Job: - Status: FAILURE

Test Results

Test Name Status
test_Ariel_test_snb SKIPPED
test_Ariel_test_snb_mlm SKIPPED
test_Checkpoint_Module SKIPPED
test_MemPool_overflow SKIPPED
test_RealTime_SIGINT SKIPPED
test_RealTime_SIGTERM SKIPPED
test_RealTime_SIGUSR1_heartbeat FAILED
test_Sirius_Zodiac_27 SKIPPED
test_cacheTracer_2 SKIPPED
test_Ember_OTF2 SKIPPED
test_miranda_randomgen SKIPPED
test_miranda_singlestream SKIPPED
test_rdmaNic_short_tests_001_app_rdma_msg SKIPPED
test_rdmaNic_short_tests_002_app_mpi_IMB_MPI1 SKIPPED
test_sstexternalelement_001 SKIPPED
test_vanadis_short_tests_001_small_basic_io_lseek_riscv64_ SKIPPED
test_vanadis_short_tests_002_small_basic_io_hello_world_mipsel_ SKIPPED
test_vanadis_short_tests_003_small_basic_io_hello_world_riscv64_ SKIPPED
test_vanadis_short_tests_004_small_basic_io_hello_world_cpp_mipsel_ SKIPPED
test_vanadis_short_tests_005_small_basic_io_hello_world_cpp_riscv64_ SKIPPED
test_vanadis_short_tests_006_small_basic_io_printf_check_mipsel_ SKIPPED
test_vanadis_short_tests_007_small_basic_io_printf_check_riscv64_ SKIPPED
test_vanadis_short_tests_008_small_basic_io_openat_mipsel_ SKIPPED
test_vanadis_short_tests_009_small_basic_io_openat_riscv64_ SKIPPED
test_vanadis_short_tests_010_small_basic_io_read_write_mipsel_ SKIPPED
test_vanadis_short_tests_011_small_basic_io_read_write_riscv64_ SKIPPED
test_vanadis_short_tests_012_small_basic_io_unlink_mipsel_ SKIPPED
test_vanadis_short_tests_013_small_basic_io_unlink_riscv64_ SKIPPED
test_vanadis_short_tests_014_small_basic_io_unlinkat_mipsel_ SKIPPED
test_vanadis_short_tests_015_small_basic_io_unlinkat_riscv64_ SKIPPED
test_vanadis_short_tests_016_small_basic_io_fread_fwrite_mipsel_ SKIPPED
test_vanadis_short_tests_017_small_basic_io_fread_fwrite_riscv64_ SKIPPED
test_vanadis_short_tests_018_small_basic_math_sqrt_double_mipsel_ SKIPPED
test_vanadis_short_tests_019_small_basic_math_sqrt_double_riscv64_ SKIPPED
test_vanadis_short_tests_020_small_basic_math_sqrt_float_mipsel_ SKIPPED
test_vanadis_short_tests_021_small_basic_math_sqrt_float_riscv64_ SKIPPED
test_vanadis_short_tests_022_small_basic_ops_test_branch_mipsel_ SKIPPED
test_vanadis_short_tests_023_small_basic_ops_test_branch_riscv64_ SKIPPED
test_vanadis_short_tests_024_small_basic_ops_test_shift_mipsel_ SKIPPED
test_vanadis_short_tests_025_small_basic_ops_test_shift_riscv64_ SKIPPED
test_vanadis_short_tests_026_small_misc_stream_mipsel_ SKIPPED
test_vanadis_short_tests_027_small_misc_stream_riscv64_ SKIPPED
test_vanadis_short_tests_028_small_misc_gettime_mipsel_ SKIPPED
test_vanadis_short_tests_029_small_misc_gettime_riscv64_ SKIPPED
test_vanadis_short_tests_030_small_misc_splitLoad_mipsel_ SKIPPED
test_vanadis_short_tests_031_small_misc_splitLoad_riscv64_ SKIPPED
test_vanadis_short_tests_032_small_misc_mt_dgemm_mipsel_ SKIPPED
test_vanadis_short_tests_033_small_misc_mt_dgemm_riscv64_ SKIPPED
test_vanadis_short_tests_034_small_misc_stream_fortran_mipsel_ SKIPPED
test_vanadis_short_tests_035_small_misc_stream_fortran_riscv64_ SKIPPED
test_vanadis_short_tests_036_small_misc_uname_mipsel_ SKIPPED
test_vanadis_short_tests_037_small_misc_uname_riscv64_ SKIPPED
test_vanadis_short_tests_038_small_misc_fork_mipsel_gold1 SKIPPED
test_vanadis_short_tests_039_small_misc_fork_mipsel_gold2 SKIPPED
test_vanadis_short_tests_040_small_misc_fork_riscv64_gold1 SKIPPED
test_vanadis_short_tests_041_small_misc_fork_riscv64_gold2 SKIPPED
test_vanadis_short_tests_042_small_misc_clone_mipsel_gold1 SKIPPED
test_vanadis_short_tests_043_small_misc_clone_mipsel_gold2 SKIPPED
test_vanadis_short_tests_044_small_misc_clone_riscv64_gold1 SKIPPED
test_vanadis_short_tests_045_small_misc_clone_riscv64_gold2 SKIPPED
test_vanadis_short_tests_046_small_misc_pthread_mipsel_gold1 SKIPPED
test_vanadis_short_tests_047_small_misc_pthread_mipsel_gold2 SKIPPED
test_vanadis_short_tests_048_small_misc_pthread_riscv64_gold1 SKIPPED
test_vanadis_short_tests_049_small_misc_pthread_riscv64_gold2 SKIPPED
test_vanadis_short_tests_050_small_misc_openmp_mipsel_4core SKIPPED
test_vanadis_short_tests_051_small_misc_openmp_mipsel_4thread SKIPPED
test_vanadis_short_tests_052_small_misc_openmp_mipsel_2core_2thread SKIPPED
test_vanadis_short_tests_053_small_misc_openmp_riscv64_4core SKIPPED
test_vanadis_short_tests_054_small_misc_openmp_riscv64_4thread SKIPPED
test_vanadis_short_tests_055_small_misc_openmp_riscv64_2core_2thread SKIPPED
test_vanadis_short_tests_056_small_misc_openmp2_riscv64_16core SKIPPED
test_vanadis_short_tests_057_small_misc_openmp2_riscv64_32thread SKIPPED
test_vanadis_short_tests_058_small_misc_openmp2_riscv64_4core_8thread SKIPPED

Job: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements

  • Result: FAILED
  • Build #: 181
  • URL: Jenkins server at https://sst-jenkins.sandia.gov/view/SST/job/SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements/181/consoleFull
  • Job: - Status: FAILURE

Test Results

Test Name Status
test_Ember_OTF2 SKIPPED
test_memHSieve SKIPPED
test_merlin_polarfly_455 SKIPPED
test_merlin_polarstar_504 SKIPPED
test_prospero_binary_using_PIN_traces SKIPPED
test_prospero_binary_withtimingdram_using_PIN_traces SKIPPED
test_prospero_text_using_PIN_traces SKIPPED
test_prospero_text_withtimingdram_using_PIN_traces SKIPPED
test_rdmaNic_short_tests_001_app_rdma_msg SKIPPED
test_rdmaNic_short_tests_002_app_mpi_IMB_MPI1 SKIPPED
test_sstinfo_coretestelement FAILED
test_sstinfo_interactive SKIPPED

sst-autotester commented 1 month ago

Status Flag 'Pull Request AutoTester' - Failure: Timed out waiting for job SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_Make-Dist to start: Total Wait = 303

berquist commented 1 month ago

Marking as WIP until we clean out the Autotester queue.

sst-autotester commented 1 month ago

Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; this inspection will remain valid until a new commit to source branch is performed.

sst-autotester commented 1 month ago

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements

  • Build Num: 1675
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_Make-Dist

  • Build Num: 1078
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2

  • Build Num: 1633
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2

  • Build Num: 1633
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements

  • Build Num: 205
  • Status: STARTED

Using Repos:
Repo: ELEMENTS (freund/sst-elements)
  • Branch: ember-otf2-lib-and-bcast-fix
  • SHA: 6b5143c3901b4349a43c2a3f545a3805f52733e4
  • Mode: TEST_REPO
Repo: SQE (sstsimulator/sst-sqe)
  • Branch: devel
  • SHA: 2574c98896598820227190149834172b962dc3fc
  • Mode: SUPPORT_REPO
Repo: CORE (sstsimulator/sst-core)
  • Branch: devel
  • SHA: 593017605bfef08f493ce894449843ef8c5d56a8
  • Mode: SUPPORT_REPO
Repo: MACRO (sstsimulator/sst-macro)
  • Branch: devel
  • SHA: 50a62170b3681ea20cc2f56abd2eb3911053f1fc
  • Mode: SUPPORT_REPO
Pull Request Author: freund

sst-autotester commented 1 month ago

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements

  • Build Num: 1675
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_Make-Dist

  • Build Num: 1078
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2

  • Build Num: 1633
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2

  • Build Num: 1633
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements

  • Build Num: 205
  • Status: PASSED

sst-autotester commented 1 month ago

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ feldergast ]!

sst-autotester commented 1 month ago

Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged

sst-autotester commented 1 month ago

Merge on Pull Request# 2392: IS A SUCCESS - Pull Request successfully merged