Open meton-robean opened 5 years ago
该论文比较早了。但是他说他们比较创新使用sratchpad替代vector register files来存取向量数据
1. 灵活性
该结构没有使用vector register files 来存储从memory读进来的向量数据,而是将向量数据存在sractchpad 中,论文论述使用srachpad更加灵活,可以存储不定长的向量。
2. 访存效率
The traditional vector register file requires explicit load/store operations to transfer the data from/to memory before any vector operations can be performed. These data transfers can be time consuming.
data vectors are stored in the scratchpad memory and accessed via address registers. Most vector instructions specify 3 address registers: 2 source, and 1 destination. Each register specifies the starting location of a vector in the scratchpad. The number of elements in thevector is determined by a separate dedicated vector lengthregister, VL.
When data needs to be loaded from the external DDR2 memory, the DMA engine will transfer the data into the vector scratchpad memory. This is done in parallel with vector operations, usually in a double-buffered fashion, so memory transfer latency is often completely hidden from the vector core. The programmer must explicitly ensure the DMA transfer is complete by polling or blocking before issuing any dependant vector instructions.
使用vector register files的话每次进行运算前需要显式从memory中取数放入vector register files中,这样访存耗时。使用sractchpad的话可以利用ping-pong机制实现运算同时preload数据,从sractchpad取出数据给运算单元也比较灵活,有专门的源/目标数 地址寄存器(Address Registers分别指向 向量数据在sractchpad 中的开始位置, vector length register设定要取的长度,这样就可以从sractchpad取出数据了。
地址寄存器(Address Registers)可以设定每次递增的步长
标量数据 论文中使用一个 scalar data queue来从放标量数
VEGAS: Soft Vector Processor with Scratchpad Memory (ACM SIGDA 2011 CCF B类)