hpc / ior


aio reads broken? #461

Open paf0186 opened 1 year ago

paf0186 commented 1 year ago

Hi all,

We've just been trying out the aio option in IOR, and a trivial "write 4K, read 4K back" with aio fails with corruption on the read; this happens on XFS on RHEL 8 and on Lustre (also on RHEL 8). If you keep the file and read it back in POSIX mode, the read succeeds, so the write itself is OK. (AIO reads fail even when the run is just a read of an existing file.)

So AIO read mode seems to be totally broken, at least on RHEL 8. (This is with tip of tree as of today: acd3a154be765083d2d2543885aee983ac3ae18f, "ior: improve error for existing output file (#459)".)

Here's the relevant output for aio:

```
mpirun --allow-run-as-root -np 1 ior -C -Q 1 -g -G=-1385218502 -t 4096 -b 4096 -s 1 -k -e -w -r -R -a aio -o /tmp/iorfile
[...]
Results:

access  bw(MiB/s)  IOPS    Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
write   0.761126   196.54  0.005088    4.00        4.00       0.000037  0.005088  0.000004  0.005132  0
WARNING: Incorrect data on read (1 errors found).
Used Time Stamp 2909748794 (0xad6f3e3a) for Data Signature
read    277.69     127100  0.000008    4.00        4.00       0.000002  0.000008  0.000004  0.000014  0
```

Reading the file back with POSIX:

```
mpirun --allow-run-as-root -np 1 ior -C -Q 1 -g -G=-1385218502 -t 4096 -b 4096 -s 1 -k -e -r -R -o /tmp/iorfile
[...]
Results:

access  bw(MiB/s)  IOPS   Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
read    34.57      37118  0.000027    4.00        4.00       0.000064  0.000027  0.000022  0.000113  0

Summary of all tests:

Operation  Max(MiB)  Min(MiB)  Mean(MiB)  StdDev  Max(OPs)  Min(OPs)  Mean(OPs)  StdDev  Mean(s)  Stonewall(s)
```

JulianKunkel commented 1 year ago

I can reproduce this error. It appears the data is correctly written, i.e. writing with aio and then reading with posix works:

```
$ mpirun --allow-run-as-root -np 1 ior -C -Q 1 -g -G=-1385218502 -t 4096 -b 4096 -s 1 -k -e -w -a aio -o /tmp/iorfile
$ mpirun --allow-run-as-root -np 1 ior -C -Q 1 -g -G=-1385218502 -t 4096 -b 4096 -s 1 -k -e -R -o /tmp/iorfile
```

JulianKunkel commented 1 year ago

Since I wrote this AIORI version, I realized this is not well documented: "Collect ops until granularity is reached, then submit pending IOs. Synchronize latest on close. Doesn't work with data verification and reuses the existing buffer. The implementation shows the potential AIO may have." I think this clarifies it: you cannot do the verification with this implementation.

The alternative would be to synchronize each read, but that would prevent showing the potential of AIO... I guess the documentation should be improved.

adilger commented 1 year ago

It seems like the right answer is to allocate new buffers for each AIO read, until AIO reads start completing, then re-use the buffers in an allocation pool. Possibly if there is a maximum AIO submission count, the number of read buffers could be preallocated based on that.

The data verification should be done after the AIO read completion, before the buffer is returned to the pool.

Alternately, just allocate and free the read buffer each time, but that may be expensive?

JulianKunkel commented 1 year ago

That would be correct, but it would change the current behavior where the buffer is reused... I suppose we could add a new semantic to IOR in general that creates a unique buffer per request. Alternatively, we could allocate a buffer and copy the data in the AIO function, but that sucks, too.