ornladios / ADIOS

The old ADIOS 1.x code repository. Look for ADIOS2 for new repo
https://csmd.ornl.gov/adios

adios_selection_writeblock crash in MPI/BP read #154

Open burlen opened 6 years ago

burlen commented 6 years ago

In my code I am using adios_selection_writeblock(rank) in an M-M scenario. This runs correctly with the FLEXPATH method but crashes with the MPI/BP method on the read side. With BP, the first time step is processed correctly, but during the second time step the second rank (rank 1) always crashes.

Here is the backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004303af in get_req_datasize (fp=0xa3e690, r=0xa76f60, v=0xa67cb0) at ../../ADIOS/src/read/read_bp.c:3158
3158                    datasize *= v->characteristics[pgidx].dims.dims[i * 3];
(gdb) where
#0  0x00000000004303af in get_req_datasize (fp=0xa3e690, r=0xa76f60, v=0xa67cb0) at ../../ADIOS/src/read/read_bp.c:3158
#1  0x00000000004306a0 in adios_read_bp_schedule_read_byid (fp=0xa3e690, sel=0xa76f30, varid=34, from_steps=0, nsteps=1, data=0xa793d0)
    at ../../ADIOS/src/read/read_bp.c:3254
#2  0x0000000000420ac5 in common_read_schedule_read_byid (fp=0xa3e690, sel=0xa76f30, varid=34, from_steps=0, nsteps=1, param=0x0, data=0xa793d0)
    at ../../ADIOS/src/core/common_read.c:3698
#3  0x00000000004206fd in common_read_schedule_read (fp=0xa3e690, sel=0xa76f30, varname=0xa76f00 "dataset_2/array_0/data", from_steps=0, nsteps=1, 
    param=0x0, data=0xa793d0) at ../../ADIOS/src/core/common_read.c:3621
#4  0x00000000004131a2 in adios_schedule_read (fp=0xa3e690, sel=0xa76f30, varname=0xa76f00 "dataset_2/array_0/data", from_steps=0, nsteps=1, 
    data=0xa793d0) at ../../ADIOS/src/core/adios_read.c:130
#5  0x0000000000411074 in read_array<char> (dataset_id=2, array_id=0, fp=0xa3e690) at get.cpp:85
#6  0x000000000040cbb4 in main (argc=3, argv=0x7fffffffd708) at get.cpp:149

Valgrind reports many invalid read errors.

Code to reproduce can be found here: https://github.com/burlen/test_dataspaces/archive/master.zip

unzip
make issue_146
mpiexec -np 2 ./put test.bp MPI 10 2 3
mpiexec -np 2 ./get test.bp BP

pnorbert commented 6 years ago

In this example, the second process selects the second writeblock of each array, but there is only one block per variable per step. That should produce an error, not a segfault. Nevertheless, your example is written incorrectly. I am not sure that running it with FLEXPATH shows the example works there; FLEXPATH may simply ignore the block ID and return the one and only block's data. To fix the example, use block ID 0 in all selections (get.cpp line 84):

ADIOS_SELECTION *sel = adios_selection_writeblock(0);
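Until the reader rejects an out-of-range block itself, the caller can validate the index before building the selection. The following is a minimal sketch, not ADIOS code: `checked_writeblock` and `NO_SUCH_BLOCK` are hypothetical names, and the sketch assumes the per-step block counts are available to the caller (in ADIOS 1.x they would presumably come from the `nblocks` array of the `ADIOS_VARINFO` returned by `adios_inq_var`).

```c
#include <stddef.h>

/* Hypothetical sentinel, not part of ADIOS: no valid block for this request. */
enum { NO_SUCH_BLOCK = -1 };

/*
 * Hypothetical helper, not part of ADIOS.
 *
 * nblocks_per_step  number of write blocks recorded for each step
 *                   (assumed to mirror ADIOS_VARINFO.nblocks)
 * nsteps            length of nblocks_per_step
 * step              step the caller wants to read
 * wanted            block index the caller wants, e.g. its MPI rank
 *
 * Returns `wanted` if that block exists in `step`, otherwise NO_SUCH_BLOCK.
 */
static int checked_writeblock(const int *nblocks_per_step, int nsteps,
                              int step, int wanted)
{
    if (nblocks_per_step == NULL)
        return NO_SUCH_BLOCK;          /* no block metadata available */
    if (step < 0 || step >= nsteps)
        return NO_SUCH_BLOCK;          /* step is out of range */
    if (wanted < 0 || wanted >= nblocks_per_step[step])
        return NO_SUCH_BLOCK;          /* block index is out of range */
    return wanted;
}
```

In the scenario described here (one block per variable per step), rank 1 asking for block 1 would get `NO_SUCH_BLOCK` back, so the caller can report an error instead of passing the index to `adios_selection_writeblock`.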

burlen commented 6 years ago

Yes, I noticed that using write block 0 works with MPI/BP. However, when using the FLEXPATH method, the program spits out an error message

ERROR: Flexpath error:  Variable "dataset_0/array_0/number_of_elements" not found.

and deadlocks. When I use the rank to select the write block then it works and produces the correct output.

There's definitely an inconsistency between FLEXPATH and MPI/BP methods in how write block selections work. It sounds like FLEXPATH is implemented differently than expected.

I also tried adios_inq_var/adios_inq_var_blockinfo to probe what was there. With MPI/BP this worked, but with FLEXPATH the blockinfo field was always NULL.