ornladios / ADIOS

The old ADIOS 1.x code repository. Look for ADIOS2 for new repo
https://csmd.ornl.gov/adios

zfp: Cannot allocate memory in buffer_write #180

Open jkelling opened 6 years ago

jkelling commented 6 years ago

I encountered a problem when attempting to use the zfp transform in Adios 1.13.1. When trying to write larger amounts of data, an error is printed and the program crashes with a SIGSEGV immediately afterwards.

Below is a minimal example, which works if the "identity" transform is used instead of zfp.

Expected behavior

The program runs to completion and writes to test.bp.

Encountered behavior

Depending on the total amount of data written and the size of the written variables, the following error message is printed:

Cannot allocate memory in buffer_write.  Requested: 36783836, Maximum: 36777876
ERROR: Cannot allocate shared buffer of 21875024 bytes for ZFP transform for variable data/95

The point at which it fails depends on the size of the variables, but the relation is not monotonic:

$ zfpExample 5000000
[...]
ERROR: Cannot allocate shared buffer of 21875024 bytes for ZFP transform for variable data/95
$ zfpExample 4000000
[...]
ERROR: Cannot allocate shared buffer of 17500024 bytes for ZFP transform for variable data/0

However, with smaller variables, more variables can usually be written before the crash, and the program may even run to completion, for example with zfpExample 400000.

After this message the program segfaults with the following backtrace:
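For reference, the two buffer sizes in the error messages can be compared against the raw size of one variable. The sketch below is only arithmetic on the numbers reported above: the helper names are made up for illustration, and the 35/32 factor plus 24 bytes is a pattern read off these two data points, not something taken from the ADIOS or zfp sources.

```cpp
#include <cstdint>

// Raw size in bytes of one variable of `count` 4-byte floats (adios_real).
constexpr std::uint64_t rawBytes(std::uint64_t count) {
    return count * 4;
}

// Raw size of all variables the example tries to write before adios_close().
constexpr std::uint64_t totalRawBytes(std::uint64_t count, std::uint64_t numVars) {
    return rawBytes(count) * numVars;
}

// Hypothetical check: does `requested` equal raw * 35/32 + 24 bytes?
// This relation is inferred from the two error messages only.
constexpr bool matchesObservedPattern(std::uint64_t count, std::uint64_t requested) {
    return (requested - 24) * 32 == rawBytes(count) * 35;
}
```

Both reported requests (21875024 bytes for 5000000 floats, 17500024 bytes for 4000000 floats) fit this pattern, so the shared buffer requested for the transform appears to be somewhat larger than the uncompressed variable. With 100 variables of 5000000 floats, the raw data alone comes to 2000000000 bytes.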

#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:37
#1  0x0000000000483230 in adios_transform_zfp_apply ()
#2  0x0000000000481a06 in adios_transform_apply ()
#3  0x0000000000480853 in adios_transform_variable_data ()
#4  0x0000000000420c53 in common_adios_write_transform_helper ()
#5  0x000000000042121c in common_adios_write ()
#6  0x00000000004219a3 in common_adios_write_byid ()
#7  0x000000000041e482 in adios_write_byid ()
#8  0x0000000000417fe8 in main (argc=2, argv=0x7fffffffc878) at zfpExample.cpp:57

Example Code:

This assumes Adios was built with MPI, but does not use parallel write; run without MPI or with mpirun -n 1.

#include <iostream>
#include <cstdlib>  // exit, atoi
#include <string>
#include <cstring>
#include <sstream>
#include <vector>
#include <random>

#include <mpi.h>
#include <adios.h>

inline void exitOnError(const char* msg, int err) {
    if(err)
    {
        std::cerr << "[EE]" << msg << "\tAdios error code: " << err << '\n';
        exit(1);
    }
}

const char* TRANSFORM = "zfp:accuracy=0.0001";
// const char* TRANSFORM = "identity";

int main(int argc, char* argv[])
{
    std::vector<float> data;
    if(argc != 2)
        data.resize(10000000, 0.f);
    else
        data.resize(atoi(argv[1]), 0.f);

    MPI_Init(0,0);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    adios_init_noxml(MPI_COMM_WORLD);

    //adios_set_max_buffer_size(100); // no effect

    int64_t grpid;
    exitOnError("Failed to declare group."
            , adios_declare_group(&grpid, "data", "idx", adios_stat_no));
    exitOnError("Failed to select method MPI."
            , adios_select_method(grpid, "MPI", "", ""));

    int64_t adiosHandle;
    exitOnError("Failed to open file."
            , adios_open(&adiosHandle, "data", "test.bp", "w", MPI_COMM_WORLD));

    std::ostringstream size;
    size << data.size();
    for(int a = 0; a < 100; ++a)
    {
        std::ostringstream oname;
        oname << "data/" << a;
        std::cerr << oname.str() << ' ' << size.str() << '\n';
        auto var = adios_define_var(grpid, oname.str().c_str(), "", adios_real, size.str().c_str(), size.str().c_str(),"");
        exitOnError("Failed to set transform", adios_set_transform(var, TRANSFORM));
        exitOnError("Failed to write", adios_write_byid(adiosHandle, var, data.data()));
    }

    exitOnError("Failed to close", adios_close(adiosHandle));

    adios_finalize(rank);
    MPI_Finalize();
}
pnorbert commented 6 years ago

Hi, thanks for the report. There is indeed a bug in how the transformations handle buffering. We have not yet figured out how to fix it; I just want to let you know we are looking at this issue.

On another note, however, your example runs out of memory at some point anyway, even without transformations. ADIOS buffers all the writes into one buffer, which is flushed in adios_close(). The only way to avoid running out of memory is to call adios_close() at regular intervals and re-open with "u" (update mode), so that all writes end up in one timestep.

Another way is to use the POSIX method and set a maximum buffer size. The POSIX transport method can handle writing more data than the buffer allows, but the other transports cannot.
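A minimal sketch of how the close and re-open pattern could be planned, assuming a fixed byte budget per open/close cycle. The Batch struct and planBatches function are hypothetical helpers, not part of ADIOS; only the "w" and "u" modes and the adios_open/adios_close calls mentioned in the comments come from this thread. With a transform enabled, the buffered size per variable is not known in advance, so bytesPerVar is a caller-supplied estimate.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One open/close cycle: variables [first, last) are written between
// adios_open(..., mode, ...) and adios_close().
struct Batch {
    std::size_t first;
    std::size_t last;   // one past the last variable in the batch
    std::string mode;   // "w" for the first cycle, "u" (update) afterwards
};

// Split numVars variables into batches whose estimated size stays within
// budgetBytes. A batch always holds at least one variable, even if that
// single variable exceeds the budget.
std::vector<Batch> planBatches(std::size_t numVars,
                               std::uint64_t bytesPerVar,
                               std::uint64_t budgetBytes) {
    std::uint64_t fit = (bytesPerVar == 0) ? numVars : budgetBytes / bytesPerVar;
    std::size_t perBatch = static_cast<std::size_t>(
        std::min<std::uint64_t>(fit, numVars));
    if (perBatch == 0) perBatch = 1;

    std::vector<Batch> batches;
    for (std::size_t first = 0; first < numVars; first += perBatch) {
        std::size_t last = std::min(numVars, first + perBatch);
        batches.push_back({first, last, batches.empty() ? "w" : "u"});
    }
    return batches;
}
```

In the example program, the loop over the 100 variables would then run once per batch: open the file with the batch's mode, define and write variables first to last, and close before starting the next batch.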

Thanks
