intel / qatlib

cpaDcCompressData failing with CPA_DC_EP_HARDWARE_ERR #31

Closed iomartin closed 1 year ago

iomartin commented 1 year ago

Hello,

I'm working on a project that uses SPDK and qatlib with stateless compression. I started by copying the init and compression functions from the stateless_sample and it worked.

But I need to make some changes to better integrate with my application. One of them is to change how the source/dest buffers (pFlatBuffer->pData) are allocated.

When I stop using qaeMemAllocNUMA() and start using another allocation function (spdk_dma_malloc()), my compression requests (cpaDcCompressData()) start to fail with CPA_DC_EP_HARDWARE_ERR. I'd appreciate some help figuring out what the problem is.

Some other information that might be relevant:

iomartin commented 1 year ago

I brought this issue up with the SPDK community. The conclusion was that the IOMMU is the issue: the SPDK functions map memory in the IOMMU for the devices SPDK controls, but since the QAT device isn't one of them, its accesses to that memory fail translation.

I see that qaeMemAllocNUMA() maps the IOMMU, so it seems like we would need a way to do the same with externally allocated memory, i.e. some way of calling something like dma_map_slab().

I'm digging through the code to see if I can find a way of doing something like this, but if there is an easier way, please let me know.

iomartin commented 1 year ago

I couldn't find any functions in the API to do the mapping. Is there one that I missed?

dma_map_slab() is so simple that I could just write my own, but for that I'd need the vfio_container_fd. I could maybe trace through the code to see how it is opened and replicate that, but it seems there is a better solution.
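For readers unfamiliar with what such a mapping involves, here is a minimal sketch of mapping a buffer through the Linux VFIO type1 IOMMU interface, given an already-open container file descriptor. This is not qatlib's code: `map_for_dma()` is a hypothetical helper name, and the sketch assumes the caller already knows which IOVA the device should use for the buffer, which depends on how the library translates addresses and is not covered here.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper (not part of the qatlib API): ask the VFIO
 * container to map [vaddr, vaddr + size) at the given IOVA so the
 * device can read and write it.
 *
 * Returns 0 on success or a negative errno value on failure.
 * vaddr, iova and size are expected to be page-aligned.
 */
static int map_for_dma(int container_fd, void *vaddr,
                       uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uint64_t)(uintptr_t)vaddr; /* process virtual address */
    map.iova  = iova;                       /* address the device will use */
    map.size  = size;

    if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) != 0)
        return -errno;
    return 0;
}
```

The hard part is exactly what the comment above describes: the ioctl is simple, but it needs the container fd that the library opened internally.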

get_vfio_fd() already exists and does just what the name implies, but it is unfortunately not being exported.

Would it be possible to export that function? It seems like this was the original intent, as this function is not used by anyone else. That said, it probably makes more sense to export dma_map_slab() (or some wrapper around it) directly.

gcabiddu commented 1 year ago

Hi @iomartin - what you reported is correct.

QATlib uses vfio to provide DMA isolation between processes. When running your application you are getting an error since the device is not able to access the buffer provided (the IOMMU is blocking the write).

I guess you are compiling qatlib from source. If so, can you try to expose get_vfio_fd() (or dma_map_slab()) and see if you can get it to work with your application?

Here is documentation that explains how to use the VFIO APIs.

I'm going to check internally if we can expose one of those functions in a future release of QATlib.

iomartin commented 1 year ago

Thanks @gcabiddu!

I exported dma_map_slab() and then called it for every allocation I did with SPDK. And it worked! For now I'll use my fork of qatlib but would appreciate it if this can get upstream.

For anyone reading this in the future: One issue I initially had was that I was allocating many small buffers, and when I tried to map those, it would fail. But of course the mappings need to have a size that is a multiple of the page size (this is not a qatlib issue). As a workaround, I always passed 2MB to dma_map_slab() (I'm using hugepages). Naturally this is the wrong thing to do in any serious deployment but it was sufficient for a POC.
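As an illustration of the alignment point above, here is a small sketch that expands an arbitrary buffer to the page-aligned range that encloses it, which is the range one would pass to a mapping call instead of a fixed 2MB. The names `dma_range`, `page_align_range()` and `HUGEPAGE_SIZE` are hypothetical and exist only for this example; the page size is assumed to be a power of two.

```c
#include <stdint.h>

/* 2MB hugepage size, as used in the comment above (assumption). */
#define HUGEPAGE_SIZE ((uint64_t)2 * 1024 * 1024)

/* A page-aligned address range: start address and length in bytes. */
struct dma_range {
    uint64_t start;
    uint64_t len;
};

/*
 * Round [addr, addr + len) outward to page boundaries.
 * page_size must be a power of two.
 */
static struct dma_range page_align_range(uint64_t addr, uint64_t len,
                                         uint64_t page_size)
{
    uint64_t mask  = page_size - 1;
    uint64_t start = addr & ~mask;                /* round start down */
    uint64_t end   = (addr + len + mask) & ~mask; /* round end up */
    struct dma_range r = { start, end - start };

    return r;
}
```

For example, a 64-byte buffer at 0x200100 with 2MB pages yields the range starting at 0x200000 with length 0x200000. Several small buffers inside the same page round to the same range, so a real implementation would likely need to track which ranges have already been mapped.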

gcabiddu commented 1 year ago

Feature request replicated internally - closing ticket.