As experimentation showed, the issue with NVMe is not faulty handling of block I/O events but simply too few block_bio_* events firing for the expected workload, especially for reads but partially also for writes.
As this presentation puts it: there are now "Dedicated read queues" for NVMe devices, which sounds to me like NVMes are bypassing parts of the block layer (which would be sensible, because the block layer is big, slow, and designed with spinning platters in mind).
There are nvme_setup_cmd and nvme_complete_rq tracepoints specifically for NVMe, but one might want to look for a unified solution.
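For reference, here is a minimal sketch of how one could enable those NVMe tracepoints via tracefs. It assumes tracefs is mounted at /sys/kernel/tracing (the usual location on recent kernels) and that the running kernel exposes the nvme trace event group:

```shell
# List the NVMe-specific trace events the kernel exposes
sudo ls /sys/kernel/tracing/events/nvme/

# Enable the setup/complete pair mentioned above
echo 1 | sudo tee /sys/kernel/tracing/events/nvme/nvme_setup_cmd/enable
echo 1 | sudo tee /sys/kernel/tracing/events/nvme/nvme_complete_rq/enable

# Stream events as they happen (Ctrl-C to stop)
sudo cat /sys/kernel/tracing/trace_pipe
```

On older systems the same hierarchy may live under /sys/kernel/debug/tracing instead; the event names themselves are unchanged.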