ornladios / ADIOS

The old ADIOS 1.x code repository. Look for ADIOS2 for new repo
https://csmd.ornl.gov/adios

buffer increase for buffer_write() causes POSIX overflow #90

Closed seanzig closed 7 years ago

seanzig commented 8 years ago

I think commit 263e7b0 (part of the changes for v1.6.0) breaks adios_posix_close() at moderate MPI rank counts. The commit changes the realloc size in buffer_write() from 1000 to 1000000 bytes. It also updates adios_mpi_close() and adios_mpi_lustre_close() to use buffer_offset instead of buffer_size when doing the Gather/Gatherv of those buffers to rank 0. That change was not made for adios_posix_close(), so rank 0 attempts to Gatherv ~1 MB from every rank, which causes an overflow when there are more than ~2K ranks. Even if it didn't crash, that's a heck of a lot of data to Gatherv at those core counts.

Is there any reason not to change adios_posix.c line 1022 (current master branch) from

    int i_buffer_size = (int) buffer_size;

to

    int i_buffer_size = (int) buffer_offset;

I haven't checked to see why MPI-Aggregate or some of the other transport layers don't break despite only MPI and MPI_Lustre being appropriately adapted.

seanzig commented 8 years ago

The fix appears to work, but one must also change adios_posix.c line 1027 to:

    MPI_Gatherv (buffer, i_buffer_size, MPI_BYTE

pnorbert commented 7 years ago

Thanks for the report. Fixed in commit 6b0d99b