martinruefenacht / lemonspotter

MPI Test Generation Framework
MIT License

Start-End-Generator producing tests that fail runtime success #19

Closed martinruefenacht closed 4 years ago

martinruefenacht commented 5 years ago

The error code returned is "139".

MPI_Init_MPI_Finalize tests succeed as expected (sometimes, see #23).

MPI_Init_thread_MPI_Finalize tests, for the most part, do not.

carsonwoods commented 5 years ago

After investigating, the bug appears to originate in the parameters being passed into the MPI_Init_thread() call.

I picked a random failing test to resolve manually. It was failing on OpenMPI 3.1.2. The code of the failing test was:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argument_count, char **argument_list)
{
    int* argument_count_arg_NULL = NULL;
    char*** argument_list_arg_NULL = NULL;
    int required_arg_MPI_THREAD_SINGLE = MPI_THREAD_SINGLE;

    int* provided_out;

    // start point for start-end test
    int return_MPI_Init_thread = MPI_Init_thread(argument_count_arg_NULL, argument_list_arg_NULL, required_arg_MPI_THREAD_SINGLE, provided_out);

    printf("return_MPI_Init_thread %i\n", return_MPI_Init_thread);
    printf("argument_count_arg_NULL %p\n", argument_count_arg_NULL);
    printf("argument_list_arg_NULL %p\n", argument_list_arg_NULL);
    printf("provided_out %p\n", provided_out);

    if(return_MPI_Init_thread != MPI_SUCCESS)
    {
        exit(return_MPI_Init_thread);
    }

    // end point for start-end test
    int return_MPI_Finalize = MPI_Finalize();

    printf("return_MPI_Finalize %i\n", return_MPI_Finalize);

    if(return_MPI_Finalize != MPI_SUCCESS)
    {
        exit(return_MPI_Finalize);
    }

    return 0;
}

The program would compile, but running mpiexec on the executable manually created the following error:

-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node whitwell exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

After manually changing the declaration int* provided_out to int provided_out, and the function call to int return_MPI_Init_thread = MPI_Init_thread(argument_count_arg_NULL, argument_list_arg_NULL, required_arg_MPI_THREAD_SINGLE, &provided_out);, the test compiled and ran without errors.

I believe this can be fixed by changing the parameter for the MPI_Init_thread function in the MPI database; however, before doing that I wanted to validate my fix on a different system. When I tested this on my MacBook Pro running MPICH 3.2.1, the original error never occurred: the original generated test passed without issue. For the sake of testing, I manually applied the same fix to the test on my MacBook, and it still compiled and ran.

Summary

The specified type for the provided_out parameter in MPI_Init_thread is causing issues on some MPI implementations. It can be fixed by removing the pointer from the variable declaration and passing the variable's address (&provided_out) in the function call, as previously shown. Either way, this is a discrepancy between MPI implementations. If OpenMPI is correct, then we have a bug and should fix it by modifying our database of MPI functions. If MPICH is correct, then we might have found a bug in OpenMPI and might not have to make a change on our end.

Code Changes

Existing Code

    int* provided_out;

    // start point for start-end test
    int return_MPI_Init_thread = MPI_Init_thread(argument_count_arg_NULL, argument_list_arg_NULL, required_arg_MPI_THREAD_SINGLE, provided_out);

Bug Fix

    int provided_out;

    // start point for start-end test
    int return_MPI_Init_thread = MPI_Init_thread(argument_count_arg_NULL, argument_list_arg_NULL, required_arg_MPI_THREAD_SINGLE, &provided_out);

carsonwoods commented 5 years ago

Upon further investigation, it appears that MPI generally expects the address of an existing variable (rather than an uninitialized pointer) for parameters that have an out direction. I no longer think this requires a database change, but rather a change in the sampler.

martinruefenacht commented 5 years ago

Yes, I think the actual database is correct, but the way we generate a variable is not. We would need to create an int, take a pointer to it, and pass that pointer, or use the shorthand of & directly in the argument list. For explicitness I would prefer the two-variable approach (one variable being a pointer). (Explicit is better than implicit.)

Does this solve the 136 error we used to get?

carsonwoods commented 5 years ago

Yes, this did solve the error on my machine.

carsonwoods commented 5 years ago

OK, after testing manually, your proposed explicit fix does seem to resolve the problem. I'll start implementing a fix.

carsonwoods commented 4 years ago

The fix is ready; however, it might not actually be our bug.