riscvarchive / riscv-v-spec

Working draft of the proposed RISC-V V vector extension
https://jira.riscv.org/browse/RVG-122
Creative Commons Attribution 4.0 International

Need a simple way to tell if Tail/Mask Agnostic is implemented #710

Open tony-cole opened 3 years ago

tony-cole commented 3 years ago

The programmer needs a simple way to detect if the underlying hardware produces a different result if an Agnostic bit is set (because of VPU implementation performance optimisations).

I suggest: Set Agnostic bit(s) and then read them back to detect if Agnostic performance hardware is present, as follows:
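A minimal sketch of how such a probe might be checked in software, assuming the read-back behaviour proposed here (hardware without agnostic optimisations would return vta/vma as 0). The specification does not define this behaviour; `agnostic_probe` and `probe_agnostic` are hypothetical names, and only the bit positions (vta = bit 6, vma = bit 7 of `vtype`) come from the RVV 1.0 `vtype` layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* vtype field positions in the RVV 1.0 layout. */
#define VTYPE_VTA (UINT64_C(1) << 6) /* tail agnostic */
#define VTYPE_VMA (UINT64_C(1) << 7) /* mask agnostic */

/* Hypothetical result type, not part of any specification. */
typedef struct {
    bool tail_agnostic_kept; /* vta was requested and read back as 1 */
    bool mask_agnostic_kept; /* vma was requested and read back as 1 */
} agnostic_probe;

/*
 * Compare the vtype value that was requested with the value read back.
 * ASSUMPTION (the proposal in this issue, not specified behaviour):
 * an implementation that always behaves as "undisturbed" would clear
 * the vta/vma bits, so a bit that survives signals agnostic hardware.
 */
static agnostic_probe probe_agnostic(uint64_t requested, uint64_t readback)
{
    agnostic_probe p;
    p.tail_agnostic_kept = (requested & VTYPE_VTA) && (readback & VTYPE_VTA);
    p.mask_agnostic_kept = (requested & VTYPE_VMA) && (readback & VTYPE_VMA);
    return p;
}
```

On a real target, `requested` would be the vtype setting passed to `vsetvli` with `ta, ma`, and `readback` the value obtained by reading the `vtype` CSR afterwards.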

And/or have information registers showing useful information such as this and other things, e.g. Vector Revision, Profile, VLMAX, etc. This way the programmer has the option for run-time switches as well as compile time switches.

kasanovic commented 3 years ago

Software cannot rely on deterministic agnostic behavior, even on the same core. Big-little thread migration will also change behavior dynamically. In general, high-performance code will need to do microarchitecture-specific (auto)tuning, and ta/ma behavior is only a small component of a much larger problem (e.g., how many ALU functional units versus memory functional units).

The other things mentioned belong in a discovery mechanism appropriate for the platform (e.g., probably API calls for a Linux process).

tony-cole commented 3 years ago

I'm not suggesting relying on deterministic agnostic behaviour, but rather knowing at run-time whether the underlying VPU implementation supports Agnostic behaviour. Knowing this may be useful for specific run-time optimisations if the behaviour is always, say, undisturbed. Not everything runs Linux and not everything is an Application level processor. The difficulty is to create one specification that covers most implementations and is useful from a programmer’s perspective.

In the past programmers had to do all sorts of tricks to discover things about the underlying hardware, for instance big/little endian detection. I expect programmers will have to do this sort of thing to discover the Tail policy on some RVV systems.
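For comparison, the classic run-time endianness probe looks roughly like the sketch below. It is only an illustration of the kind of trick meant here; the function name `is_little_endian` is an assumption, not taken from any library.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Store a known 32-bit pattern and inspect the byte at the lowest
 * address. On a little-endian machine that byte is the least
 * significant one (0x04); on a big-endian machine it is 0x01.
 */
static bool is_little_endian(void)
{
    const uint32_t pattern = 0x01020304u;
    unsigned char first_byte;

    /* memcpy avoids any aliasing or alignment concerns. */
    memcpy(&first_byte, &pattern, 1);
    return first_byte == 0x04;
}
```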

The above suggestion for vtype register vta/vma bit behaviour is trivial, but could be useful for programmers in the future. Why not do this?

kasanovic commented 3 years ago

This is a subset of the more general microarchitecture tuning problem, which can have much larger effects on performance (e.g., balance of ALU to memory pipes). A platform's discovery mechanism can allow the implementation microarchitecture to be determined for the purposes of run-time binding of appropriate optimized library code. But this can be supported without requiring additional CSRs.
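As a rough sketch of what run-time binding of optimized library code could look like, assuming the platform's discovery mechanism yields some microarchitecture identifier: the `uarch_id` values, the `select_saxpy` function, and both kernels below are hypothetical and only illustrate the dispatch pattern, not any defined interface.

```c
#include <stddef.h>

/* Hypothetical identifiers a platform discovery mechanism might report. */
typedef enum { UARCH_GENERIC, UARCH_WIDE_ALU } uarch_id;

typedef void (*saxpy_fn)(size_t n, float a, const float *x, float *y);

/* Baseline kernel: y[i] += a * x[i]. */
static void saxpy_generic(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Variant unrolled by four, standing in for a tuned implementation. */
static void saxpy_unrolled(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    for (; i < n; ++i) /* remainder elements */
        y[i] += a * x[i];
}

/* Bind once at start-up; callers then go through the returned pointer. */
static saxpy_fn select_saxpy(uarch_id id)
{
    switch (id) {
    case UARCH_WIDE_ALU:
        return saxpy_unrolled;
    default:
        return saxpy_generic;
    }
}
```

Both kernels compute the same result, so the choice affects only performance, which is the property that lets the binding happen without any knowledge of ta/ma behaviour.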

We are in the process of defining a general discovery mechanism as opposed to building in extension-specific discovery mechanisms.