openpmix / pmix-tests

OpenPMIx Community Testing Infrastructure

PMIx Fence: single-job partial barrier #72

Open artpol84 opened 3 years ago

artpol84 commented 3 years ago

Test description

Verifies that a partial Fence (a Fence over a subset of the processes in a single job) works properly.

Test sketch

#include "pmix.h"

/* Sketch only: timestamp(), max(), rank, Ratio, "~" and the
 * PMIx_Fence arguments are placeholders, not real API. */

double max_fence_time()
{
    double fence_time = 0, ts1, ts2;
    int i;

    /* Measure the typical fence execution time */
    for(i = 0; i < 100; i++) {
        ts1 = timestamp();
        PMIx_Fence(without_data_collection);
        ts2 = timestamp();
        fence_time = max(fence_time, ts2 - ts1);
    }
    return fence_time;
}

int main()
{
    double T, fence_time, ts1, ts2;

    PMIx_Init();

    fence_time = max_fence_time();
    T = Ratio * fence_time; // Ratio might be 100, should be selected for the particular system

    // The delayed rank must be even so that it does not take part in the odd-only Fence
    if( rank == 0 ){
        sleep(T); // T is in seconds; a real test needs a sub-second sleep
    }
    if( rank % 2 ){
        ts1 = timestamp();
        PMIx_Fence(without_data_collection, only-odd-procs);
        ts2 = timestamp();
        // Odd ranks should not be affected by the rank = 0 delay
        assert( (ts2 - ts1) ~ fence_time);
    }
    ts1 = timestamp();
    PMIx_Fence(without_data_collection);
    ts2 = timestamp();

    if(rank != 0) {
        // Everyone else waits for the delayed rank
        assert( (ts2 - ts1) ~ T);
    } else {
        // The delayed rank arrives last and leaves immediately
        assert( (ts2 - ts1) ~ fence_time);
    }
    PMIx_Finalize();
    return 0;
}
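The sketch relies on timestamp() and on the "~" comparison without defining either. A minimal way to fill them in could look like the following; both helper names and the relative-tolerance parameter are assumptions made for illustration, not part of PMIx or of the existing test suite.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() under strict -std= modes */
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical helper: monotonic time in seconds, as a double. */
static double timestamp(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for "measured ~ expected": true when the two
 * values differ by at most rel_tol * |expected|. The tolerance is an
 * assumption and has to be tuned for the particular system. */
static int approx_equal(double measured, double expected, double rel_tol)
{
    double diff = measured - expected;
    double bound = rel_tol * (expected < 0 ? -expected : expected);
    if (diff < 0) {
        diff = -diff;
    }
    return diff <= bound;
}
```

With these, assert( (ts2 - ts1) ~ T) would become something like assert(approx_equal(ts2 - ts1, T, 0.1)). For the checks against fence_time, a one-sided upper bound may be more robust than a symmetric tolerance, since fence_time is a measured maximum and an individual Fence can complete faster.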

Execution details

Client-side expectations:

  1. All PMIx calls return PMIX_SUCCESS
  2. All ranks except rank=0 (the delayed rank) wait about T in the final full-job Fence.

Server-side expectations:

  1. N invocations of:
    • client_connected
    • client_finalized
  2. Verify that the proc structure was set for the individual ranks.
  3. 2 Fence callback invocations with WILDCARD.
  4. The interval between Fences on node0 is > T.
  5. Starting from "https://github.com/openpmix/openpmix/pull/1135", the size of the Fence data should be 0 B.
  6. No other callbacks are called (no direct modex requests) (? Any event-related activity?)

Reference implementation:

TBD

Notes

The test suite's RTE component needs to support multiple in-flight Fences. This is currently not supported.

jjhursey commented 3 years ago

Do you need an if( !(rank % 2) ){ after the second fence to account for the additional delay that the processes not participating in the first fence will see at the start of synchronization in the second fence? Something like:

if( rank % 2 ){
  // Shouldn't see any additional synchronization delay in second fence
} else {
  // Account for synchronization delay from other ranks participating in the first fence.
}

artpol84 commented 3 years ago

The idea is that fence_time is negligible compared to T, so the extra delay is absorbed by the tolerance of the check: O(fence_time + T) ~ O(T).
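To put a number on "negligible": since T = Ratio * fence_time, treating T + fence_time as T introduces a relative error of 1/Ratio. The small sketch below computes that error; the function name is hypothetical and the values in the usage note are only an illustration.

```c
#include <assert.h>

/* Relative error made by approximating (T + fence_time) with T,
 * where T = ratio * fence_time as in the test sketch. */
static double relative_overhead(double fence_time, double ratio)
{
    double T = ratio * fence_time;
    return ((T + fence_time) - T) / T;
}
```

For example, with fence_time = 0.001 s and Ratio = 100, T is 0.1 s and the extra Fence adds about 1% to the expected wait, so any comparison tolerance of a few percent covers it.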

cpshereda commented 3 years ago

See https://github.com/openpmix/openpmix/pull/2327.