marcfedorow opened this issue 3 years ago (status: Open)
This is why appendix A shouldn't ever appear in the spec.
Non-accumulating instructions can be made with lower latency, but sometimes it is cheaper to just reuse the datapath and accumulate with an implicit zero.
With ternary encoding, that could be a single instruction.
I see no real reason to implement PBSAD (latency 2) when PBSADA (latency 2) already exists. PBSAD can easily be replaced by moving zero into rd and then executing PBSADA. In the worst case that takes 3 cycles instead of 2, but IMO that does not make a significant difference for the use cases of PBSAD[A]. I think that either PBSAD's latency should be 1 or the PBSAD insn should not be implemented at all. If there are any use cases that prove me wrong, they are very welcome.
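To make the replacement concrete, here is a rough software model in C. It is only a sketch under assumptions: it assumes XLEN = 32, and it assumes PBSAD[A] sums the absolute differences of the four unsigned byte lanes, with PBSADA adding that sum to the old value of rd. The function names `pbsada_model`, `pbsad_model` and `pbsad_equivalence_check` are made up for illustration and come from neither the spec nor any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of PBSADA (assumed semantics, XLEN = 32):
 * rd += sum over the 4 unsigned byte lanes of |rs1[i] - rs2[i]|. */
static uint32_t pbsada_model(uint32_t rd, uint32_t rs1, uint32_t rs2) {
    for (int i = 0; i < 4; i++) {
        uint32_t a = (rs1 >> (8 * i)) & 0xFFu; /* byte lane i of rs1 */
        uint32_t b = (rs2 >> (8 * i)) & 0xFFu; /* byte lane i of rs2 */
        rd += (a > b) ? (a - b) : (b - a);     /* absolute difference */
    }
    return rd;
}

/* Hypothetical model of PBSAD, written as "move zero to rd, then PBSADA".
 * This is the replacement sequence proposed above. */
static uint32_t pbsad_model(uint32_t rs1, uint32_t rs2) {
    return pbsada_model(0u, rs1, rs2);
}

/* Small self-check: accumulating onto an existing rd must equal
 * the old rd plus the non-accumulating result. */
static void pbsad_equivalence_check(uint32_t rd, uint32_t rs1, uint32_t rs2) {
    assert(pbsada_model(rd, rs1, rs2) == rd + pbsad_model(rs1, rs2));
}
```

This model only shows functional equivalence. It says nothing about latency, which is the actual cost being argued about here.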