littledan / linux-aio

How to use the Linux AIO feature

Odd return value from aio #1

Closed jml9904 closed 6 years ago

jml9904 commented 7 years ago

Hello...

Have a weird situation here. I'm using spdk to present a few devices to a host. Of those devices, all but the AIO device are working beautifully. The AIO device is choking on read I/Os, and drilling into bdev_aio.c reveals that the event returned by io_getevents() seems to have a status code of 0xEA. I cannot seem to find documentation on this error anywhere, so I added a little logging to reveal what's being returned:

SPDK_ERRLOG("-----[ xxx ]----- %s, (0x%02X), entry %d of %d, request len %lu, offset %llu (des %d, opcode %d)\n", strerror((int8_t)events[i].res & 0xFF), (int8_t)events[i].res & 0xFF, i, nr, aio_task->len, aio_task->iocb.u.v.offset, aio_task->iocb.aio_fildes, aio_task->iocb.aio_lio_opcode);

and something to dump the data return. Here's the typical response:

bdev_aio.c: 239:bdev_aio_poll: ERROR: -----[ xxx ]----- Unknown error 234, (0xEA), entry 0 of 1, request len 4096, offset 0 (des 42, opcode 7)
bdev_aio.c: 269:bdev_aio_poll: ERROR: [00] 30 23 e1 1c d2 7f 00 00 : 0.......
bdev_aio.c: 269:bdev_aio_poll: ERROR: [08] 00 00 00 00 00 00 00 00 : ........
bdev_aio.c: 269:bdev_aio_poll: ERROR: [10] 07 00 00 00 2a 00 00 00 : ........
bdev_aio.c: 269:bdev_aio_poll: ERROR: [18] 98 22 e1 1c d2 7f 00 00 : ........
bdev_aio.c: 269:bdev_aio_poll: ERROR: [20] 01 00 00 00 00 00 00 00 : ........
bdev_aio.c: 269:bdev_aio_poll: ERROR: [28] 00 00 00 00 00 00 00 00 : ........
bdev_aio.c: 269:bdev_aio_poll: ERROR: [30] 00 00 00 00 00 00 00 00 : ........
bdev_aio.c: 269:bdev_aio_poll: ERROR: [38] 00 00 00 00 00 00 00 00 : ........

Any idea what could consistently, but intermittently, return a 0xEA value? It turns out that the writes (and possibly the reads) look good at the disk itself, btrace looks fine, but things get weird when the data gets up into AIO.

Thanks!!!!

littledan commented 7 years ago

IIRC these error codes tend to come from the underlying file system or device driver, rather than the AIO infrastructure itself. Have you tried grepping your device driver source for this error code? Anyway, I'm a bit removed from this area these days; I'll ask for help from others.
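One possible reading of the 0xEA value, offered as a sketch rather than a confirmed diagnosis: Linux AIO reports a failed request by storing a negative errno in the `res` field of the completion event, and libaio declares that field as `unsigned long`. Casting it to `int8_t` and masking with `0xFF`, as the logging line in the question does, leaves only the low byte of the two's-complement value. If `res` was -22, that low byte is 0xEA (234), and errno 22 on Linux is EINVAL. The helpers below are hypothetical names, not part of SPDK or libaio; they only show how `res` could be decoded without truncation.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an SPDK or libaio function).
 * Takes the raw `res` value of an AIO completion event, which libaio
 * exposes as unsigned long, and returns the positive errno it encodes,
 * or 0 if the value is a byte count (success). */
int aio_res_to_errno(unsigned long res)
{
    long sres = (long)res;            /* reinterpret as signed: failures are -errno */
    return sres < 0 ? (int)-sres : 0;
}

/* Hypothetical helper: log one completion using the full-width value
 * instead of its low byte. */
void log_aio_result(unsigned long res)
{
    int err = aio_res_to_errno(res);

    if (err != 0)
        fprintf(stderr, "aio failed: res=%ld (%s)\n", (long)res, strerror(err));
    else
        fprintf(stderr, "aio ok: %lu bytes\n", res);
}
```

Under these assumptions, calling `log_aio_result(events[i].res)` in place of the masked `strerror()` call would print the errno name directly, which gives something concrete to grep for in the driver or file system source.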

jml9904 commented 7 years ago

Thanks for responding!

blktrace on the underlying device shows no errors. And oddly, writes tend to succeed (as observed by od) - it's reads that are choking. Plus, this seems to be somewhat intermittent but not hardware related; I can drive I/O from the local host via /dev/sdb all day long. Nothing's logged in syslog, either, and I see no evidence of complaints from the device driver.

It's really odd. I'm running out of ideas... if there's someone I should contact directly or another venue I should use, please let me know...

sitsofe commented 6 years ago

@jml9904 - it's a long shot but you might have some joy over on the Linux AIO mailing list: http://www.kvack.org/~bcrl/aio/ . The Seastar folks (https://groups.google.com/forum/#!forum/seastar-dev ) might also know, as they are heavy AIO users, but you're not using their product so help there might be even more limited.

jml9904 commented 6 years ago

Hey, I'll take a shot at it. Right now that project's checkpointed but I really appreciate the pointer -- maybe they'll have some wisdom!

lsjk commented 5 years ago

> Hey, I'll take a shot at it. Right now that project's checkpointed but I really appreciate the pointer -- maybe they'll have some wisdom!

Hello, I am running spdk and also seeing this aio error return value. Any idea what happened? Thanks.