plfs / plfs-core

LANL no longer develops PLFS. Feel free to fork and develop as you wish.

problem with mpi set view #83

Open brettkettering opened 11 years ago

brettkettering commented 11 years ago

I was testing PLFS with MPI-TILE-IO (a benchmark) and found the data droppings were bigger than I expected. To understand the problem, I took a tiny program I had written before and ran it against POSIX, MPI-IO, and FUSE. The file contents produced through MPI-IO differ from the other two.

What MPI-TILE-IO and my tiny program do is set a subarray file view; each process then writes to its own subarray. The problem may come from overlapping writes: I looked at the mlog output and found the processes were in fact writing overlapping regions.

An illustration of the tiny program's 10x10 array (each digit is the rank that owns that cell):

```
0 0 0 0 0 1 1 1 1 1
0 0 0 0 0 1 1 1 1 1
0 0 0 0 0 1 1 1 1 1
0 0 0 0 0 1 1 1 1 1
0 0 0 0 0 1 1 1 1 1
2 2 2 2 2 3 3 3 3 3
2 2 2 2 2 3 3 3 3 3
2 2 2 2 2 3 3 3 3 3
2 2 2 2 2 3 3 3 3 3
2 2 2 2 2 3 3 3 3 3
```

Each process writes a subarray.

Below is some log output (the file contents were marked in red in the original). I was using the latest PLFS from the svn trunk, but the MPI patch was older (maybe from the start of June).

```
manio@ubuntu:~/tmp/learnmpiio$ mpicc 10_file_view.c -o fview.x
10_file_view.c: In function ‘main’:
10_file_view.c:53:5: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘MPI_Offset’ [-Wformat]
manio@ubuntu:~/tmp/learnmpiio$ mpirun -np 4 ./fview.x /mnt/plfs28/plfs008fuse
starts = (0, 5).
starts = (5, 0).
starts = (0, 0).
starts = (5, 5).
Write at 0
Write at 0
Write at 0
Write at 0
manio@ubuntu:~/tmp/learnmpiio$ mpirun -np 4 ./fview.x plfs:/mnt/plfs28/plfs008
starts = (0, 5).
starts = (5, 5).
starts = (0, 0).
starts = (5, 0).
Write at 0
Write at 0
Write at 0
Write at 0
manio@ubuntu:~/tmp/learnmpiio$ mpirun -np 4 ./fview.x posix008
starts = (5, 5).
starts = (5, 0).
starts = (0, 5).
starts = (0, 0).
Write at 0
Write at 0
Write at 0
Write at 0
manio@ubuntu:~/tmp/learnmpiio$ cat /mnt/plfs28/plfs008fuse
0000011111000001111100000111110000011111000001111122222333332222233333222223333322222333332222233333
manio@ubuntu:~/tmp/learnmpiio$ cat /mnt/plfs28/plfs008
0000011111zzzzz11111zzzzz11111zzzzz11111zzzzz111112222233333zzzzz33333zzzzz33333zzzzz33333zzzzz33333
manio@ubuntu:~/tmp/learnmpiio$ cat posix008
0000011111000001111100000111110000011111000001111122222333332222233333222223333322222333332222233333
```

```
manio@ubuntu:~/workdir/plfs/codeworkdir/plfs/ad-patches$ ls /mnt/.plfs_store/plfs008fuse/hostdir.0/ -l
total 20
-rwxrwxrwx 1 manio manio  45 Aug  7 13:35 dropping.data.1344371727.768217.ubuntu.14086
-rwxrwxrwx 1 manio manio  45 Aug  7 13:35 dropping.data.1344371727.768217.ubuntu.14087
-rwxrwxrwx 1 manio manio  45 Aug  7 13:35 dropping.data.1344371727.768217.ubuntu.14088
-rwxrwxrwx 1 manio manio  45 Aug  7 13:35 dropping.data.1344371727.768217.ubuntu.14089
-rwxrwxrwx 1 manio manio 160 Aug  7 13:35 dropping.index.1344371727.768217.ubuntu.14086
manio@ubuntu:~/workdir/plfs/codeworkdir/plfs/ad-patches$ ls /mnt/.plfs_store/plfs008/hostdir.0/ -l
total 32
-rwxrwxrwx 1 manio manio 45 Aug  7 13:35 dropping.data.1344371738.748436.ubuntu.0
-rwxrwxrwx 1 manio manio 45 Aug  7 13:35 dropping.data.1344371738.751331.ubuntu.3
-rwxrwxrwx 1 manio manio 45 Aug  7 13:35 dropping.data.1344371738.752370.ubuntu.1
-rwxrwxrwx 1 manio manio 45 Aug  7 13:35 dropping.data.1344371738.754699.ubuntu.2
-rwxrwxrwx 1 manio manio 40 Aug  7 13:35 dropping.index.1344371738.748436.ubuntu.0
-rwxrwxrwx 1 manio manio 40 Aug  7 13:35 dropping.index.1344371738.751331.ubuntu.3
-rwxrwxrwx 1 manio manio 40 Aug  7 13:35 dropping.index.1344371738.752370.ubuntu.1
-rwxrwxrwx 1 manio manio 40 Aug  7 13:35 dropping.index.1344371738.754699.ubuntu.2
manio@ubuntu:~/tmp/learnmpiio$ ls posix008 -l
-rw-rw-r-- 1 manio manio 100 Aug  7 13:36 posix008
```

There are 45 bytes in each data dropping, but the total size of the file is only 100 bytes (4 droppings x 45 bytes = 180 bytes of dropping data for a 100-byte file).

map file:

```
manio@ubuntu:~/tmp/learnmpiio$ plfs_map /mnt/plfs28/plfs008
Index of /mnt/.plfs_store//plfs008
Data Droppings
0 /mnt/.plfs_store//plfs008/hostdir.0//dropping.data.1344371738.748436.ubuntu.0
1 /mnt/.plfs_store//plfs008/hostdir.0//dropping.data.1344371738.754699.ubuntu.2
2 /mnt/.plfs_store//plfs008/hostdir.0//dropping.data.1344371738.751331.ubuntu.3
3 /mnt/.plfs_store//plfs008/hostdir.0//dropping.data.1344371738.752370.ubuntu.1
Entry Count: 6
ID Logical_offset Length Begin_timestamp End_timestamp Logical_tail ID.Chunk_offset
0 w  0  5 1344371738.8041160106658936 1344371738.8041679859161377  4 [0. 0]
3 w  5 40 1344371738.8175530433654785 1344371738.8176159858703613 44 [3. 0]
3 w 45  5 1344371738.8175530433654785 1344371738.8176159858703613 49 [3. 40]
1 w 50  5 1344371738.8013739585876465 1344371738.8014359474182129 54 [1. 0]
2 w 55 40 1344371738.8070330619812012 1344371738.8070850372314453 94 [2. 0]
2 w 95  5 1344371738.8070330619812012 1344371738.8070850372314453 99 [2. 40]
```

The processes actually do:

```
rank  off  len
0       0   45
1       5   45
2      50   45
3      55   45
```

MPI tried to fill the holes between the columns of each subarray, so every rank wrote a single contiguous 45-byte span instead of its 25 bytes of actual data.

Is this a bug that has already been fixed, or is there something wrong with my tiny program? The program source code is attached.

To compile: `mpicc 10_file_view.c -o fview.x`
To run: `mpirun -np 4 ./fview.x file-path`

(from Jun He)

brettkettering commented 11 years ago

```c
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define ROW_S 10
#define COL_S 10

int rank, size;

int main(int argc, char *argv[])
{
    MPI_File fh;
    MPI_Status status;
    int i;
    MPI_Datatype sb_arr;

    MPI_Init(&argc, &argv);                 /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get number of processes */

    /* This is a collective call: every process opens the same file. */
    MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_CREATE | MPI_MODE_RDWR,
                  MPI_INFO_NULL, &fh);

    MPI_Barrier(MPI_COMM_WORLD);

    int sizes[2]    = {ROW_S, COL_S}; /* big array: sizes[0] rows, sizes[1] columns */
    int subsizes[2] = {5, 5};         /* subarray: subsizes[0] rows, subsizes[1] columns */
    int starts[2]   = {0, 0};         /* subarray starts at row starts[0], column starts[1] */
    starts[0] = (rank / 2) * 5;
    starts[1] = (rank % 2) * 5;
    printf("starts = (%d, %d).\n", starts[0], starts[1]);

    MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_C,
                             MPI_CHAR, &sb_arr);
    MPI_Type_commit(&sb_arr);

    MPI_File_set_view(fh, 0, MPI_CHAR, sb_arr, "native", MPI_INFO_NULL);

    char *p = malloc(100);
    for (i = 0; i < 100; i++) {
        p[i] = '0' + rank;
    }

    MPI_Offset off;
    MPI_File_get_position(fh, &off);
    /* The cast silences the -Wformat warning seen in the log above. */
    printf("Write at %lld\n", (long long)off);

    MPI_File_write(fh, p, 1, sb_arr, &status);

    free(p);
    MPI_Type_free(&sb_arr);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```