mattkretz closed 7 years ago
On the 3rd point: After reviewing the AMD Zen architecture a bit, it seems "most efficient data parallel execution for the element type T" means 16-byte vectors for Zen, even though the ISA supports AVX2. Ultimately, this is up to the implementation, but the intent of the wording is not to require the widest usable vector register size.
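To make the distinction concrete, here is a minimal sketch of how an implementation might pick the "native" width per element type instead of taking the widest register. Everything in it is hypothetical and invented for illustration: the `Uarch` enum, `widest_register_bytes`, `preferred_bytes`, and `native_lanes` are not part of any proposal or library, and the per-architecture numbers only encode the cases discussed in this thread (16 bytes on Zen, and `int` on SandyBridge limited to xmm operations). It also assumes `sizeof(int) == 4` and `sizeof(float) == 4`, as on x86.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical microarchitecture tags, invented for this sketch.
enum class Uarch { SandyBridge, Haswell, Zen };

// Widest vector register the ISA exposes, in bytes.
// All three examples have 32-byte ymm registers.
constexpr std::size_t widest_register_bytes(Uarch) { return 32; }

// Width an implementation might consider "most efficient" for element
// type T. These choices are assumptions taken from the discussion, not
// measured data.
template <class T>
constexpr std::size_t preferred_bytes(Uarch u) {
    switch (u) {
    case Uarch::Zen:
        // AVX2 is supported, but 16-byte vectors are assumed to be
        // the efficient choice.
        return 16;
    case Uarch::SandyBridge:
        // AVX without AVX2: ymm can hold int vectors, but integer
        // operations are only available on xmm registers.
        return std::is_floating_point<T>::value ? 32 : 16;
    case Uarch::Haswell:
        return 32;
    }
    return 16; // conservative fallback
}

// Number of elements of type T in the preferred vector.
template <class T>
constexpr std::size_t native_lanes(Uarch u) {
    return preferred_bytes<T>(u) / sizeof(T);
}

// The preferred width may be narrower than the widest register.
static_assert(preferred_bytes<float>(Uarch::Zen) < widest_register_bytes(Uarch::Zen),
              "Zen: native width is not the widest register");
```

With these assumptions, `native_lanes<float>(Uarch::SandyBridge)` is 8 while `native_lanes<int>(Uarch::SandyBridge)` is 4, so the element count of the native ABI can differ between element types on the same machine.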
I added several margin notes with alternative text avoiding "target architecture" and "target system" in normative wording. Better?
`sse`, `avx`, etc. Maybe that suggestion goes a little too far. Maybe no example names?
`int` on SandyBridge where ymm registers can store `int` vectors, but operations are only available on xmm registers.)