nvdla / hw

RTL, Cmodel, and testbench for NVDLA

Why does SDP have two X modules? #351

Open FengJungle opened 2 years ago

FengJungle commented 2 years ago

[image] As the picture shows, X1 and X2 are identical. Why are there two X modules in SDP?

FengJungle commented 2 years ago

One more question: compared with X1/X2, the only difference in Y is the LUT. Why is SDP built as X1 + X2 + Y (with LUT) rather than a single X (with LUT)?
Could anyone share the design rationale? Thanks a lot!

bg193 commented 2 years ago

From the design, it looks like this is probably there to support operator fusion.

FengJungle commented 2 years ago

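Do you mean that two adjacent operators in the compute graph that both use SDP can be fused and completed in a single SDP pass? Looking at the NVDLA compiler source, it does indeed do this.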

bg193 commented 2 years ago

In theory, one path can do the bias and the other can do the norm.
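
A toy sketch of that idea, in Python rather than RTL. It is not taken from the NVDLA source: `x_stage` and `fused_sdp_pass` are hypothetical names, and the assumption that each X stage applies an optional per-channel add, then an optional multiply, then an optional ReLU is a simplification for illustration only. It shows a bias add in the first stage and a normalization in the second, chained without writing the intermediate result back to memory.

```python
from typing import List, Optional


def x_stage(values: List[float],
            add: Optional[List[float]] = None,
            mul: Optional[List[float]] = None,
            relu: bool = False) -> List[float]:
    """Hypothetical model of one X stage: per-channel add, then mul, then ReLU.

    Each step is optional, so one stage can act as a plain bias add,
    a plain scale, or a combined add-and-scale.
    """
    out = []
    for i, v in enumerate(values):
        if add is not None:
            v = v + add[i]
        if mul is not None:
            v = v * mul[i]
        if relu:
            v = max(v, 0.0)
        out.append(v)
    return out


def fused_sdp_pass(conv_out: List[float],
                   bias: List[float],
                   mean: List[float],
                   scale: List[float]) -> List[float]:
    """Assumed fusion: first stage does bias, second does norm plus ReLU."""
    # First stage (standing in for X1): bias add only.
    after_x1 = x_stage(conv_out, add=bias)
    # Second stage (standing in for X2): (x - mean) * scale, then ReLU.
    neg_mean = [-m for m in mean]
    return x_stage(after_x1, add=neg_mean, mul=scale, relu=True)


if __name__ == "__main__":
    result = fused_sdp_pass(
        conv_out=[1.0, -2.0, 3.0],
        bias=[1.0, 1.0, 1.0],
        mean=[1.0, 0.0, 2.0],
        scale=[2.0, 0.5, 1.0],
    )
    print(result)
```

With only one stage of this shape, the bias and the normalization would either need their constants folded together ahead of time or take two separate passes. That is one possible reading of why there is more than one X stage, not a confirmed design rationale.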

FengJungle commented 2 years ago

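Yes, the source code does in fact include this kind of fusion optimization. But a single X + Y module could also perform operator fusion. I just find it odd that there are two X modules rather than one or three. The documentation gives no explanation either, so I wanted to ask what everyone thinks.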