hpc / ior

IOR and mdtest

Incompatible branching and stonewalling #163

Closed johnbent closed 3 years ago

johnbent commented 5 years ago

There is the following code in mdtest:

if (((stone_wall_timer_seconds > 0) && (branch_factor > 1)) || ! barriers) {
    FAIL( "Error, stone wall timer does only work with a branch factor <= 1 and with barriers\n");
}

This code prohibits using branching with stonewall unless barriers are set. By the way, it also appears to fail even when barriers are set:

/h/j/t/i/src$ \rm -rf out; mpirun -np 2 ./mdtest -W 3 -b 2 -n 1000 -u -L -N 0 -B 1
-- started at 07/27/2019 16:02:29 --

mdtest-3.3.0+dev was launched with 2 total task(s) on 1 node(s)
Command line used: ./mdtest "-W" "3" "-b" "2" "-n" "1000" "-u" "-L" "-N" "0" "-B" "1"
Path: /home/johnbent/temp/ior-1/src
FS: 235.5 GiB   Used FS: 87.6%   Inodes: 0.0 Mi   Used Inodes: -100000.1%

Nodemap: 11
2 tasks, 2000 files/directories
07/27/2019 16:02:32: Process 0: FAILED in mdtest_stat, unable to stat file ./out/#test-dir.0-0/mdtest_tree.0.0/file.mdtest.0.1: No such file or directory
07/27/2019 16:02:32: Process 1: FAILED in mdtest_stat, unable to stat file ./out/#test-dir.0-0/mdtest_tree.1.0/file.mdtest.1.1: No such file or directory
application called MPI_Abort(comm=0x84000004, 1) - process 0
application called MPI_Abort(comm=0x84000002, 1) - process 1

I was thinking about trying to debug this one. To help me out, I think it was @JulianKunkel who added this check. Julian, do you remember what the issue is here? Any guidance to help with debugging this?

JulianKunkel commented 5 years ago

IMHO that was the default behavior... I dislike it, but it is there. Use -L to create files only in leaf directories, for example.

On Sat, 27 July 2019 at 23:13, John Bent <notifications@github.com> wrote:

By the way, are we sure that branching even works? I just ran this command:

mpirun -np 4 ./mdtest -W 3 -b 400 -n 3 -N 0 -C -u

And here is what the tree looks like:

out
└── #test-dir.0-0
    ├── mdtest_tree.0.0
    │   ├── dir.mdtest.0.0
    │   ├── dir.mdtest.0.1
    │   ├── dir.mdtest.0.2
    │   ├── file.mdtest.0.0
    │   ├── file.mdtest.0.1
    │   └── file.mdtest.0.2
    ├── mdtest_tree.1.0
    │   ├── dir.mdtest.1.0
    │   ├── dir.mdtest.1.1
    │   ├── dir.mdtest.1.2
    │   ├── file.mdtest.1.0
    │   ├── file.mdtest.1.1
    │   └── file.mdtest.1.2
    ├── mdtest_tree.2.0
    │   ├── dir.mdtest.2.0
    │   ├── dir.mdtest.2.1
    │   ├── dir.mdtest.2.2
    │   ├── file.mdtest.2.0
    │   ├── file.mdtest.2.1
    │   └── file.mdtest.2.2
    └── mdtest_tree.3.0
        ├── dir.mdtest.3.0
        ├── dir.mdtest.3.1
        ├── dir.mdtest.3.2
        ├── file.mdtest.3.0
        ├── file.mdtest.3.1
        └── file.mdtest.3.2



reflectored commented 4 years ago

@johnbent it might be related to my ticket #241

JulianKunkel commented 3 years ago

If there is an issue with closing this, feel free to reopen. At the moment, it does not seem to be relevant for users.