In UMD, "sysmem" refers to 1G hugepages used as shared buffers between device and host. The sysmem APIs include the tt_SiliconDevice constructor (which specifies the number of "channels") and channel-based accessor methods (read, write, get pointer to buffer).
While preserving the API, write an alternate implementation that does not use hugepages, but relies on the KMD's ability to map physically noncontiguous buffers to a contiguous IO virtual address (IOVA) space seen by the device.
The challenges at the system level:
WH hardware can only access the address range 0x0-0xffff'ffff without iATU programming.
KMD IOVA allocation does not seem to support complete coverage of this address space.
With iATU we can work around this. However, iATU is limited to 16 regions with a constraint that the IOVA must be naturally aligned with the region size.
Unlike VFIO, the KMD API provides no userspace control over the IOVA, so some over-allocation may be necessary to achieve alignment.
KMD lacks an API to unmap a buffer once it has been mapped.
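
The over-allocation idea can be sketched as follows. This is a minimal illustration, not KMD or UMD code: `AlignedWindow`, `find_aligned_window`, and `worst_case_mapping_size` are hypothetical names, and the sketch assumes that iATU region sizes are powers of two and that the KMD returns page-aligned IOVAs. Given the IOVA the KMD happened to assign, it finds the first naturally aligned window of the region size inside the mapping.

```cpp
#include <cstdint>

// Hypothetical result type: where a naturally aligned window starts.
struct AlignedWindow {
    bool ok;          // false if no aligned window fits inside the mapping
    uint64_t iova;    // aligned IOVA that could serve as an iATU region target
    uint64_t offset;  // byte offset of that IOVA from the start of the mapping
};

constexpr bool is_pow2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Assumption: the IOVA is at least page aligned, so the alignment padding
// is at most (region_size - page_size) bytes.
constexpr uint64_t worst_case_mapping_size(uint64_t region_size, uint64_t page_size) {
    return 2 * region_size - page_size;
}

// Find the first region_size-aligned window inside
// [mapped_iova, mapped_iova + mapped_size).
constexpr AlignedWindow find_aligned_window(uint64_t mapped_iova,
                                            uint64_t mapped_size,
                                            uint64_t region_size) {
    if (!is_pow2(region_size)) return {false, 0, 0};
    const uint64_t rem = mapped_iova & (region_size - 1);
    const uint64_t offset = (rem == 0) ? 0 : region_size - rem;
    // The window must fit entirely within the over-allocated mapping.
    if (offset > mapped_size || mapped_size - offset < region_size) {
        return {false, 0, 0};
    }
    return {true, mapped_iova + offset, offset};
}
```

Because there is no unmap API, the padding before and after the aligned window stays mapped for the lifetime of the process, so the worst-case size is worth keeping in mind when choosing region sizes.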
The challenges at the user driver level:
Maintain the illusion of one to four channels, even if there are up to 16 sub-allocations.
At runtime, determine whether to back sysmem with IOMMU mappings or with hugepages.
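
One way to keep the channel illusion is to translate each (channel, offset) pair into a sub-allocation index and an offset within that sub-allocation. The sketch below is only an illustration under stated assumptions: `SubAllocRef` and `locate` are hypothetical names, each channel is assumed to be 1 GiB (mirroring the 1G hugepage it replaces), and every channel is assumed to be split into the same number of equally sized chunks.

```cpp
#include <cstdint>

constexpr uint64_t kChannelSize = 1ULL << 30;  // assumed: 1 GiB per channel
constexpr uint32_t kMaxRegions = 16;           // iATU region limit from the text

// Hypothetical lookup result: which sub-allocation backs a channel offset.
struct SubAllocRef {
    bool ok;          // false if the request or the layout is invalid
    uint32_t index;   // sub-allocation (iATU region) index, 0..15
    uint64_t offset;  // byte offset within that sub-allocation
};

// Assumption: each channel is split into chunks_per_channel equal chunks,
// stored channel by channel in one flat table of sub-allocations.
constexpr SubAllocRef locate(uint32_t num_channels, uint32_t chunks_per_channel,
                             uint32_t channel, uint64_t offset) {
    if (num_channels == 0 || chunks_per_channel == 0) return {false, 0, 0};
    if (static_cast<uint64_t>(num_channels) * chunks_per_channel > kMaxRegions) {
        return {false, 0, 0};  // layout would need more than 16 regions
    }
    if (kChannelSize % chunks_per_channel != 0) return {false, 0, 0};
    if (channel >= num_channels || offset >= kChannelSize) return {false, 0, 0};
    const uint64_t chunk_size = kChannelSize / chunks_per_channel;
    return {true,
            channel * chunks_per_channel + static_cast<uint32_t>(offset / chunk_size),
            offset % chunk_size};
}
```

With four channels and four chunks per channel, all 16 regions are used and each chunk is 256 MiB. A read or write that straddles a chunk boundary would have to be split by the caller, since adjacent chunks need not be contiguous in host virtual memory.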