open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org
Other
2.16k stars 859 forks source link

romio ADIOI_GEN_WriteStrided with mixed contig/non-contig types #4144

Open markalle opened 7 years ago

markalle commented 7 years ago

I'm able to hit the below bug with master, if I run on a filesystem that takes the ADIOI_GEN_WriteStrided path for MPI_File_write(). (I was using gpfs, and offhand the users of that path appear to be gpfs, hfs, ntfs, panfs, pfs, sfs, ufs, xfs.)

Testcase: https://gist.github.com/markalle/d7da240c19e57f095c5d1b13240dae24 % mpicc -o x romio_write_timing.c % mpirun -mca io romio314 -np 4 ./x

The test has one of the ranks using a small contiguous datatype, eg

// for R0 : [ . . . x x . . . ]

while the other ranks use non-contiguous types that surround the above, eg

// for R1 : [ . . x . . x . . ]
// for R2 : [ . x . . . . x . ]
// for R3 : [ x . . . . . . x ]

The ADIOI_Gen_writestrided() in ad_write_str.c has the contiguous rank fall into this code:

            ADIO_WriteContig(fd, buf, count, datatype, ADIO_EXPLICIT_OFFSET,
                             offset, status, error_code);

around line 294, at which point it's not holding any ADIO_WRITE_LOCK.

While the non-contiguous ranks end up around line 356 in code that goes

/* contiguous in memory, noncontiguous in file. should be the most
   common case. */
...
                    ADIOI_BUFFERED_WRITE
(that macro boils down to ADIOI_WRITE_LOCK and read of a contiguous region like "x . . . . x")
...
        /* write the buffer out finally */
..
            ADIO_WriteContig(...);
            if (!(fd->atomicity))
                ADIOI_UNLOCK(fd, writebuf_off, SEEK_SET, writebuf_len);

If all the ranks had non-contiguous data and took the { lock, read, write, unlock } path it would work, but having some ranks do a direct contiguous write with no lock creates a race condition.

I prototyped a fix that added ADIOI_WRITE_LOCK/UNLOCK around the code the contiguous rank uses and that worked. But I'd like thoughts from someone with romio knowledge about whether there's a better way. I assume my fix would have performance implications.

roblatham00 commented 7 years ago

Ugh, wrapping LOCK/UNLOCK around contiguous writes gets you correct behavior but at significant performance cost. Factor of 5x on our GPFS systems, easy.

Much prefer @ggouaillardet 's suggestion to consult the file type at set_view time and then decide what path to take based on that.

I'm surprised ROMIO is taking the contig path here... the "is_contiguous" check is not much more complicated than "is this a built-in type?". I'll have to investigate a bit more.