Closed by pefar 1 year ago
I guess the timeout may occur because the host NVMe driver cannot find the correct CQ entry. I suggest checking the DMA address used to post the CQ entry, either in simulation or with an ILA.
Thanks for the reply! Do you mean the DDR module caused a timeout or a CQ error directly? Certainly, the DDR module does cause timeouts in I/O command processing. But in NVMeCHA, Admin commands are processed by the MicroBlaze without accessing the I/O controller, which is where we added the DDR module. And the timeout occurred during Linux's initialization, during which, according to the NVMe spec, no I/O commands are processed. So I don't think the DDR module caused the timeout in Admin command processing. However, in rare cases the SQID of the timed-out command, read from dw10 of the Abort command, is not 0 (the admin queues have ID 0), which indicates that it is an I/O command. For now we suspect that the kernel NVMe driver triggered an unintended DMA transfer that uses the I/O pipe, or that something else is going on.
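For anyone reproducing the check on the Abort command, here is a minimal sketch of how dw10 can be split into its two fields. The field positions (SQID in bits 15:0, CID in bits 31:16) follow the NVMe specification's definition of the Abort command. The function names and the use of `printf` are assumptions made for illustration and are not taken from the NVMeCHA firmware.

```c
#include <stdint.h>
#include <stdio.h>

/* NVMe Abort command, Command Dword 10:
 *   bits 15:0  = SQID of the command to abort
 *   bits 31:16 = CID of the command to abort
 * Helper names below are hypothetical, not from the NVMeCHA sources. */
static inline uint16_t abort_target_sqid(uint32_t cdw10)
{
    return (uint16_t)(cdw10 & 0xFFFFu);
}

static inline uint16_t abort_target_cid(uint32_t cdw10)
{
    return (uint16_t)(cdw10 >> 16);
}

/* Print which queue the timed-out command belongs to.
 * SQID 0 is the admin submission queue; any other value is an I/O queue. */
static inline void log_abort_target(uint32_t cdw10)
{
    uint16_t sqid = abort_target_sqid(cdw10);
    uint16_t cid = abort_target_cid(cdw10);

    printf("Abort: target SQID=%u (%s queue), CID=0x%04X\n",
           (unsigned)sqid, sqid == 0 ? "admin" : "I/O", (unsigned)cid);
}
```

Calling `log_abort_target()` from the Abort handler on every Abort would show how often the timed-out command really comes from an I/O queue rather than the admin queue.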
I mean the CQ entry may be posted to the wrong address, so the NVMe driver cannot find it. I think you had better print some information from the firmware and use an ILA to check the waveforms while debugging.
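To make the check concrete, the sketch below computes where the host expects the next completion entry and which phase tag it must carry. It relies only on the NVMe specification: completion queue entries are 16 bytes, the phase tag is bit 16 of dword 3, the tag is 1 on the first pass through the queue, and it is inverted each time the queue wraps. All type and function names are hypothetical; the real firmware's bookkeeping may look different.

```c
#include <stdint.h>

#define NVME_CQE_SIZE 16u /* each completion queue entry is 16 bytes */

/* Hypothetical per-queue state kept by the controller firmware. */
typedef struct {
    uint64_t base;  /* host address of the CQ, as given at queue creation */
    uint16_t size;  /* number of entries in the queue */
    uint16_t tail;  /* index of the next entry the controller will write */
    uint8_t  phase; /* phase tag for the current pass (1 on the first pass) */
} cq_state_t;

/* Host address at which the next completion entry must be written. */
static inline uint64_t cq_next_entry_addr(const cq_state_t *cq)
{
    return cq->base + (uint64_t)cq->tail * NVME_CQE_SIZE;
}

/* Dword 3 of a completion entry:
 *   bits 15:0 = CID, bit 16 = phase tag, bits 31:17 = status field. */
static inline uint32_t cqe_dw3(uint16_t cid, uint8_t phase, uint16_t status)
{
    return (uint32_t)cid
         | ((uint32_t)(phase & 1u) << 16)
         | ((uint32_t)(status & 0x7FFFu) << 17);
}

/* Advance the tail after posting an entry; flip the phase on wrap-around. */
static inline void cq_advance(cq_state_t *cq)
{
    cq->tail++;
    if (cq->tail == cq->size) {
        cq->tail = 0;
        cq->phase ^= 1u;
    }
}
```

Printing `cq_next_entry_addr()` and the phase from the firmware, and comparing them with the DMA write address and data captured by the ILA, would show whether an entry lands where the driver polls. An entry written to a different address, or with a stale phase tag, is never seen by the host, and the command eventually times out.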
Along with the lane width modification, I changed the AXI data width from 128 to 256 in the XDMA IP. The controller now works fine (except for some I/O timeout messages from the kernel after I replaced the user_app with the DDR UI). Thanks for your reply!
You are welcome!
Hello, firstly I wanna say NVMeCHA is so useful & valuable!
Recently, we have been working on a project in which we are trying to add a DDR4 module to NVMeCHA and run it on our Zynq UltraScale+ card. After combining them, errors occur while Linux boots:
First, we checked the UART output and found that the Admin commands were aborted after Identify Namespace completed and the Admin command Abort was processed. The host then seemed to try restarting the controller, but in the end the controller was dead, and the errors above were printed to the screen.
Then we checked the kernel.log recorded during system startup and found this:
This suggests that a timeout caused the abort. Perhaps the Identify Namespace command was processed incorrectly? Maybe we have to reconfigure some features in the firmware code where Identify Namespace is defined. We are now reading the Linux kernel driver source to understand the order of calls made when probing the NVMe device, hoping to get a hint.
Has anyone met the same problem? Thanks for any help.