This paper focuses on FPGAs in embedded scenarios. Mainstream vendors' SoCs already include an IOMMU, but there is no infrastructure that lets applications use SVM through that IOMMU; the authors argue a new abstraction is needed.
In many cases accelerator traffic is more bandwidth-sensitive than latency-sensitive
In our previous works, we have explored lightweight SVM support for PMCAs based on a software-managed IOTLB considering applications based on regular [20] and irregular (pointer-rich) [21] memory access patterns, and exploring PMCA-local IOTLB management [22].
At the heart of any IOMMU design sits an input/output translation lookaside buffer (IOTLB).
Instead, a separate and empty I/O page table is generated at setup time. The first TLB miss to every page then generates a costly page fault that must be handled in software by the host by mapping the corresponding page to the I/O page table. The hardware management only helps for subsequent TLB misses on pages already mapped. Alternatively, all pages holding the shared data must be mapped at offload time, which is impracticable when operating on pointer-rich data structures. Finally, due to the decoupling of the I/O and the process’ page table, the only way to ensure that the IOMMU does not use stale page-table data at any time is to prevent the mapped pages from being moved by page pinning, which further aggravates the cost for mapping and page fault handling.
Why the authors implement a soft IOMMU on the FPGA instead of using the host IOMMU:
My feeling is that pinning memory for I/O devices may work today for devices such as RDMA NICs and VFIO passthrough, but for the many heterogeneous devices to come, which must work on a task together with the CPU, it is impossible to determine before execution which memory they will access, so pinning memory is not a sustainable approach. Implementing I/O page faults (IOPF) is necessary.
In the case of a TLB miss or page fault, the interrupt handler inside the driver module simply triggers the execution of the worker thread in normal process context. Once this worker thread gets scheduled, it first reads the address and transaction attributes from the IOMMU hardware and pins the requested userspace page in memory using get_user_pages(). Then, it maps the pinned page to the I/O page table in case the hard-macro IOMMU is used, or performs virtual-to-physical address translation and sets up a new entry in the TLB if the soft IOMMU is used.
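A minimal kernel-side sketch of that worker-thread path, assuming a layout of my own (this is not the paper's driver code; read_miss_fifo(), soft_iommu_tlb_insert(), use_hard_macro and the domain pointer are placeholders, and the exact get_user_pages()/iommu_map() signatures vary across kernel versions):

```c
/* Hypothetical sketch of the miss/fault worker described above; locking and
 * error handling are elided, placeholder declarations are marked as such. */
#include <linux/mm.h>
#include <linux/iommu.h>
#include <linux/workqueue.h>

struct miss_entry { unsigned long va; u32 id; u32 axi_user; };

extern struct iommu_domain *domain;     /* hypothetical: hard-macro IOMMU domain */
extern bool use_hard_macro;             /* hypothetical: which IOMMU variant is active */
extern struct miss_entry read_miss_fifo(void);  /* hypothetical MMIO read of the miss FIFO */
extern void soft_iommu_tlb_insert(unsigned long va, phys_addr_t pa, u32 id); /* hypothetical */

static void svm_fault_worker(struct work_struct *work)
{
	struct miss_entry m = read_miss_fifo();   /* VA + transaction attributes */
	struct page *page;

	/* Pin the requested userspace page (GUP signature is kernel-dependent). */
	if (get_user_pages(m.va & PAGE_MASK, 1, FOLL_WRITE, &page) != 1)
		return;   /* a real driver would report an unrecoverable fault here */

	if (use_hard_macro)
		/* Hard-macro IOMMU: mirror the mapping into the separate I/O page table. */
		iommu_map(domain, m.va & PAGE_MASK, page_to_phys(page),
			  PAGE_SIZE, IOMMU_READ | IOMMU_WRITE, GFP_KERNEL);
	else
		/* Soft IOMMU: write the VA->PA translation directly into the FPGA IOTLB. */
		soft_iommu_tlb_insert(m.va & PAGE_MASK, page_to_phys(page), m.id);
}
```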
Question: for a hard-macro IOMMU, isn't an I/O page fault non-recoverable without PRI? How can it be handled here?
If a transaction misses in the TLB, its VA, ID and the AXI User Signals are stored inside the miss first-in, first-out buffers (FIFOs), and an interrupt is sent to the host CPU.
If the translation misses in the IOTLB, the host is asked to establish the mapping; this resembles the PRI and ATS concepts defined for PCIe devices.
In parallel, the IOMMU drops the transaction and signals a slave error in the AXI Read/Write Response back to the wrapper core inside the FPGA. The IOMMU does not block and can continue to handle address translations from other transactions to shared memory issued by the accelerator.
After a page fault, the IOMMU on the FPGA aborts the current transaction and keeps serving other transactions.
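A behavioral sketch of that non-blocking miss path (plain C model under my own naming, not the actual RTL; miss_fifo_t, AXI_RESP_SLVERR, tlb_lookup() and raise_host_irq() are illustrative):

```c
/* Behavioral model of the soft IOMMU's miss path; names and types are illustrative. */
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t va; uint32_t id; uint32_t axi_user; } miss_entry_t;
typedef struct { miss_entry_t slots[16]; unsigned head, tail; } miss_fifo_t;

enum axi_resp { AXI_RESP_OKAY = 0, AXI_RESP_SLVERR = 2 };

extern bool tlb_lookup(uint64_t va, uint64_t *pa);  /* hit -> translated physical address */
extern void raise_host_irq(void);                   /* interrupt line to the host CPU */

/* Translate one AXI transaction. On a miss the transaction is dropped with a
 * slave error and queued for the host; the IOMMU does not block. */
enum axi_resp translate(miss_fifo_t *f, uint64_t va, uint32_t id,
                        uint32_t axi_user, uint64_t *pa_out)
{
	if (tlb_lookup(va, pa_out))
		return AXI_RESP_OKAY;

	/* Record VA + transaction attributes for the host's fault worker. */
	f->slots[f->tail++ % 16] = (miss_entry_t){ .va = va, .id = id, .axi_user = axi_user };
	raise_host_irq();
	return AXI_RESP_SLVERR;   /* transaction aborted; others keep translating */
}
```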
Our results show that, due to limitations in the low-level drivers and kernel APIs, the performance of hard-macro IOMMUs can be dominated by handling page faults.
Main idea
With SVM, the IOMMU page table used by the FPGA accelerator and the MMU page table used by the CPU describe the same address space, so the FPGA can access memory directly through virtual addresses instead of relying on memory copies. The authors argue that for today's FPGA accelerators to share virtual addresses with the CPU, an SVM framework must be designed.
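A small user-space sketch of what this buys, under an assumed offload API (svm_offload() is hypothetical, not the framework's real interface): the host hands the accelerator the virtual address of a pointer-rich structure directly, with no staging copy.

```c
/* Hypothetical user-space view of an SVM offload. */
#include <stdlib.h>

struct node { int key; struct node *next; };   /* pointer-rich shared data */

extern int svm_offload(const char *kernel, void *arg);  /* hypothetical runtime call */

int main(void)
{
	struct node *head = malloc(sizeof *head);
	head->key = 42;
	head->next = NULL;

	/* With SVM the accelerator walks the list through the same virtual
	 * addresses the CPU uses; no marshalling into a contiguous DMA buffer. */
	return svm_offload("traverse_list", head);
}
```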
The framework provided by this work consists of two parts:
The authors argue that the IOMMU in existing SoCs can neither handle page faults on demand nor let the kernel handle them (it takes too long), so memory must be pinned. They therefore implement a soft IOMMU (essentially an IOTLB) on the FPGA and handle page faults through it, bypassing the host IOMMU: on an IOTLB miss, a page-fault interrupt is sent to the host, the host resolves it, and the transaction being translated is aborted. The authors explore several IOTLB designs for different scenarios (latency-sensitive, bandwidth-sensitive, or hybrid) and evaluate them.
The SVM framework they implement can use either their own soft IOMMU on the FPGA or the host's hard-macro IOMMU.
Key insight
An IOTLB implemented on the FPGA handles I/O page faults.
My comments
Implementing an IOTLB on the device side to handle I/O page faults achieves much the same effect as PRI.