Open 55-AA opened 4 years ago
Hi,
I know that for small payloads the transfer overhead is very large and the performance is really low, but I am not currently aware of any faster way to transfer data from PS to PL than AXI DMA while still using standard interfaces such as the kernel crypto API and the Xilinx AXI DMA soft controller. Any suggestions are appreciated :)
Also, I think the overhead mostly comes from the software side (Linux kernel crypto API + AXI DMA controller driver + interrupts from PL to PS + Linux scheduling non-determinism). I used to have an HDL version clocked at 150 MHz, and it showed no performance improvement over the 100 MHz design, so the HDL engine does not seem to be the bottleneck.
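To illustrate the reasoning above, here is a rough cost model in C. It is only a sketch: `request_us`, `clock_speedup`, and every parameter value you might feed them (fixed software cost, engine bytes per cycle) are hypothetical and not measurements from this design. It shows that when a fixed per-request software cost dominates, raising the PL clock from 100 MHz to 150 MHz barely changes the total time.

```c
#include <assert.h>

/*
 * Estimated time in microseconds for one crypto request.
 *   fixed_us        - per-request software cost (driver, IRQ, scheduling);
 *                     hypothetical value, not measured
 *   bytes           - payload size in bytes
 *   bytes_per_cycle - engine throughput per PL clock cycle (hypothetical)
 *   clk_mhz         - PL clock in MHz, i.e. cycles per microsecond
 */
static double request_us(double fixed_us, double bytes,
                         double bytes_per_cycle, double clk_mhz)
{
    assert(bytes_per_cycle > 0.0 && clk_mhz > 0.0);
    return fixed_us + bytes / (bytes_per_cycle * clk_mhz);
}

/* Ratio of request time at the slow clock to request time at the fast clock. */
static double clock_speedup(double fixed_us, double bytes,
                            double bytes_per_cycle,
                            double slow_mhz, double fast_mhz)
{
    return request_us(fixed_us, bytes, bytes_per_cycle, slow_mhz) /
           request_us(fixed_us, bytes, bytes_per_cycle, fast_mhz);
}
```

With a zero fixed cost the model gives the full 1.5x gain from the clock ratio; with any fixed cost much larger than the engine time for a small payload, the ratio stays close to 1.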
Hi, I've done a lot of experiments recently and found that triggering a DMA transfer costs a large number of cycles. So I think that if the hardware could process multiple packets in one DMA transfer, efficiency would improve greatly. For this purpose, the cmd DWORD should carry the packet length in its high 2 bytes, so that a soft TLAST can be set for the controller, which then continues with the next packet. In addition, the Linux kernel module would then need only a queue, which fits the Linux crypto engine framework. The SG list can easily be extended for multiple packets.
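A minimal user-space sketch of that framing idea in C, under assumptions: the length sits in bits 31..16 of the cmd DWORD, the low 16 bits are treated as opaque command flags, and each packet's payload follows its cmd word directly in the buffer. The names `cmd_pack`, `batch_append`, and `batch_count` are hypothetical and are not part of the existing driver or HDL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: bits 31..16 = packet length in bytes, bits 15..0 = flags. */
#define CMD_LEN_SHIFT  16
#define CMD_FLAGS_MASK 0xFFFFu

static uint32_t cmd_pack(uint16_t flags, uint16_t len)
{
    return ((uint32_t)len << CMD_LEN_SHIFT) | flags;
}

static uint16_t cmd_len(uint32_t cmd)   { return (uint16_t)(cmd >> CMD_LEN_SHIFT); }
static uint16_t cmd_flags(uint32_t cmd) { return (uint16_t)(cmd & CMD_FLAGS_MASK); }

/*
 * Append one packet (cmd word followed by payload) to a batch buffer that
 * would be sent as a single DMA transfer. Returns the new write offset,
 * or 0 if the packet does not fit.
 */
static size_t batch_append(uint8_t *buf, size_t cap, size_t off,
                           uint16_t flags, const uint8_t *payload, uint16_t len)
{
    uint32_t cmd = cmd_pack(flags, len);

    if (off > cap || cap - off < sizeof(cmd) + len)
        return 0;
    memcpy(buf + off, &cmd, sizeof(cmd));
    memcpy(buf + off + sizeof(cmd), payload, len);
    return off + sizeof(cmd) + len;
}

/*
 * Walk the batch the way the hardware would: read a cmd word, skip 'len'
 * payload bytes (the point where a soft TLAST would be raised), repeat.
 * Returns the number of complete packets found.
 */
static size_t batch_count(const uint8_t *buf, size_t used)
{
    size_t off = 0, n = 0;

    while (used - off >= sizeof(uint32_t)) {
        uint32_t cmd;
        size_t next;

        memcpy(&cmd, buf + off, sizeof(cmd));
        next = off + sizeof(cmd) + cmd_len(cmd);
        if (next > used)
            break;          /* truncated packet, stop here */
        off = next;
        n++;
    }
    return n;
}
```

A 16-bit length field limits each packet to 65535 bytes, which is enough for the small packets where the per-transfer cost hurts most. In a kernel driver the payloads would stay in the SG list rather than being copied into one buffer; the copy here only keeps the sketch self-contained.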
Hi, I tested the hardware algorithm on 4.14.0-xilinx and compared it with the software algorithm. From the following results, I think that AXI communication consumes too many cycles. Maybe they can be reduced, especially for small packets of less than 512 bytes.
"ecb(aes)": software algorithm; "ecb(AES)": hardware algorithm. The test module was built from 'xilinx-linux/drivers/crypto/tcrypt.c'.