riscv-non-isa / riscv-c-api-doc

Documentation of the RISC-V C API
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International
68 stars 38 forks source link

[FMV] Runtime Resolver Function #74

Open BeMg opened 4 months ago

BeMg commented 4 months ago

This PR proposes a runtime resolver function that retrieves the environment information. Since this resolver function is expected to be available and interchangeable for both libgcc and compiler-rt, a formal specification for the resolver function interface is necessary.


When generating the resolver function for function multiversioning, a mechanism is necessary to obtain the environment information.

To achieve this goal, several steps need to be taken:

  1. Collect the required extensions for a particular function.
  2. Transform these required extensions into a platform-dependent form.
  3. Query whether the environment fulfills these requirements during runtime.

Step 1 is handled by the compiler, while step 3 must follow the necessary steps from the platform during runtime.

This RFC aims to propose how the compiler and runtime function can tackle step 2.

Here is a example

__attribute__((target_clones("default", "arch=rv64gcv"))) int bar() {
    return 1;
}

In this example, there are two versions of function bar. One for default, another for "rv64gcv".

If the environment meets the requirements, then bar can utilize the arch=rv64gcv version. Otherwise, it will invoke the default version.

This process be controlled by the ifunc resolver function.

ptr bar.resolver() {
   if (isFulFill(...))
      return "bar.arch=rv64gcv";
   return bar.default;
}

The isFulFill should available during the program runtime.

The version arch=rv64gcv require

i, m, a, f, d, c, v, zicsr, zifencei, zve32f, zve32x, zve64d, zve64f, zve64x, zvl128b, zvl32b, zvl64b,

The problem 2 is about where to maintain the relationship between extension names and platform-dependent probe forms.

Here are three possible approach to achieve goal.

  1. Encode all required extensions into a string format, then let the platform implement its own probe approach based on the string inside the runtime function. This approach maintains the relationship between extension names and platform-dependent probe forms inside the runtime function.
ptr bar.resolver() {
   if (isFulFill("i_m_a_f_d_c_v_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b"))
      return bar.arch=rv64gcv;
   return bar.default;
}

bool isFulFill(char *ReqExts) {
    if (isLinux())
       return doLinuxRISCVExtensionProbe(ReqExts);
    if (isFreeBSD())
       return doFreeBSDRISCVExtensionProbe(ReqExts);
    // Other platform
    ....
    return false;
}
  1. Encode all required extensions into a compiler-defined key, then let the platform implement its own probe approach inside the runtime. This approach maintains the relationship between the compiler-defined key for extensions and the platform-dependent probe form inside the runtime function.
// Assume compiler define
// i -> 1
// m -> 2
...

ptr bar.resolver() {
   if (isFulFill([1, 2, 3, 8, ...], length))
      return bar.arch=rv64gcv;
   return bar.default;
}

bool isFulFill(int *ReqExts, length) {
    if (isLinux())
       return doLinuxRISCVExtensionProbe(ReqExts, length);
    if (isFreeBSD())
       return doFreeBSDRISCVExtensionProbe(ReqExts, length);
    // Other platform
    ....
    return false;
}
  1. Define a different runtime function for each platform and construct any necessary information during compilation time if necessary for the platform. This approach maintains the relationship between extension names and platform-dependent probe forms inside the compiler.
// If compiler compile for linux, then use bar.resolver.linux
ptr bar.resolver.linux() {
   if (isFulFillLinux(LinuxProbeObject))
      return bar.arch=rv64gcv;
   return bar.default;
}

ptr bar.resolver.freebsd() {
   if (isFulFillFreeBSD(FreeBSDProbeObject))
      return bar.arch=rv64gcv;
   return bar.default;
}

// Other platform bar.resolver
...

bool isFulFillLinux(LinuxProbeObject Obj) {
   return doLinuxProbe(Obj);
}

bool isFulFillFreeBSD(FreeBSDProbeObject Obj) {
   return doFreeBSDProbe(Obj);
}

// Other platform isFulFill
...
BeMg commented 4 months ago

Relate patch:

https://github.com/riscv-non-isa/riscv-c-api-doc/pull/48 https://github.com/llvm/llvm-project/pull/85790 https://github.com/llvm/llvm-project/pull/85786

BeMg commented 4 months ago

cc @kito-cheng

topperc commented 4 months ago

Is two word "FullFill" supposed to be the single word "Fulfill"?

topperc commented 4 months ago

Do we intend to support __builtin_cpu_supports which is built on the same interface as function multiversioning on other targets like X86? That will require a reasonably fast query mechanism. String processing may be too much for that.

BeMg commented 4 months ago

Is two word "FullFill" supposed to be the single word "Fulfill"?

Oops, I think there is a typo here. Updated.

BeMg commented 4 months ago

Do we intend to support __builtin_cpu_supports which is built on the same interface as function multiversioning on other targets like X86? That will require a reasonably fast query mechanism. String processing may be too much for that.

If we only allow one extension each time. Does it provide a reasonably fast query mechanism? Or must it be some kind of bit operation to determine support?

For example, compiler generate this resolver function base on __builtin_cpu_supports. And compiler-rt/libgcc use the method 1 to implement __builtin_cpu_supports.

ptr bar.resolver() {
   if (__builtin_cpu_supports("i") && 
       __builtin_cpu_supports("m") && 
       __builtin_cpu_supports("a") && 
       __builtin_cpu_supports("f") && 
       __builtin_cpu_supports("d") && 
       __builtin_cpu_supports("c") && 
       __builtin_cpu_supports("v") && 
       __builtin_cpu_supports("zicsr") && 
...
       __builtin_cpu_supports("zvl64b"))
      return bar.arch=rv64gcv;
   return bar.default;
}
topperc commented 4 months ago

Do we intend to support __builtin_cpu_supports which is built on the same interface as function multiversioning on other targets like X86? That will require a reasonably fast query mechanism. String processing may be too much for that.

If we only allow one extension each time. Does it provide a reasonably fast query mechanism? Or must it be some kind of bit operation to determine support?

For example, compiler generate this resolver function base on __builtin_cpu_supports. And compiler-rt/libgcc use the method 1 to implement __builtin_cpu_supports.


ptr bar.resolver() {

   if (__builtin_cpu_supports("i") && 

       __builtin_cpu_supports("m") && 

       __builtin_cpu_supports("a") && 

       __builtin_cpu_supports("f") && 

       __builtin_cpu_supports("d") && 

       __builtin_cpu_supports("c") && 

       __builtin_cpu_supports("v") && 

       __builtin_cpu_supports("zicsr") && 

...

       __builtin_cpu_supports("zvl64b"))

      return bar.arch=rv64gcv;

   return bar.default;

}

My concern is that each time you pass a string into the compiler-rt interface, it will need to execute multiple strcmps to compare the input string against every extension name the library knows about to figure out which extension is being asked for. That gets expensive if called very often.

On x86, builtin_cpu_supports calls the library the first time to update some global variables. After the first time it is a load and a bit test

jrtc27 commented 4 months ago

If you use a sensible data structure like a trie you can do it linearly in the length of the input string

BeMg commented 3 months ago

To enhance both the performance(compare to string base) and portability(compare to hwprobe base), I have updated the runtime interface with a new layer for each queryable extension. This approach is similar to approach 2 described in the PR's description. This comment aims to explain it with a concrete example using the IFUNC resolver function and __builtin_cpu_supports.

Two structures are defined in the runtime library to store the status of hardware-enabled extensions:

Each queryable extension has a unique position inside the structure bit to represent whether it is enabled. For example: extension m enable bit could be stored inside __riscv_feature_bit.features[0] & (1 << 5)

struct {
    unsigned length;
    unsigned long long features[MAXLENGTH];
} __riscv_feature_bit;

struct {
    unsigned vendorID;
    unsigned length;
    unsigned long long features[MAXLENGTH];
} __riscv_vendor_feature_bit;

Additionally, there is a function to initialize these two structures using a system-provided mechanism:

void __init_riscv_features_bit();

In summary, this approach uses __riscv_feature_bit and __riscv_vendor_feature_bit to represent whether an extension is enabled. They are initialized by __init_riscv_features_bit. Both structures are defined in compiler-rt/libgcc.


When the compiler emits the IFUNC resolver function, it can use these structures to check whether all extension requirements are fulfilled.

Here is a simple example for a resolver:

; -target-feature +i
__attribute__((target_clones("default", "arch=rv64im"))) int foo1(void) {
  return 1;
}
func_ptr foo1.resolver() {
    __init_riscv_features_bit();
    if (MAX_QUERY_LENGTH > __riscv_feature_bits.length)
        raise_error();

    // Try arch=rv64im
    unsigned long long rv64im_require_feature_0 = constant_build_during_compiation_time();
    unsigned long long rv64im_require_feature_1 = constant_build_during_compiation_time();
    ...
    if (
    ((rv64im_require_feature_0 & __riscv_feature_bits.features[0]) == rv64im_require_feature_0) &&
    ((rv64im_require_feature_1 & __riscv_feature_bits.features[1]) == rv64im_require_feature_1) &&
    ...)
        return foo1.rv64im;

    return foo1.default;
}
jrtc27 commented 3 months ago

Who's specifying which bit is what?

BeMg commented 3 months ago

My idea is that bit is only meaningful for runtime function and compiler that using __riscv_feature_bits. For function multiversioning, I will allocate non-colliding bits for extensions and remain unchanged. If there is new extension, allocate the available bit or extend the __riscv_feature_bits.features size when it be used by function multiversioning. Vendor extension is guarded by vendorID, so it can be allocated by vendor itself without collosion with other vendor extension.

The remaining problem is how to synchronize the extension bitmask across LLVM, compiler-rt, GCC, and libgcc. I don't have a solution for this yet.

@kito-cheng Any ideas on how we can achieve this synchronization?

BeMg commented 3 months ago

Update: add the extension groupid/bitmask definitions for synchronization across LLVM, compiler-rt, GCC, and libgcc.


cc @kito-cheng @topperc

kito-cheng commented 3 months ago

This proposal got positive feedback from RISC-V GNU community :)

palmer-dabbelt commented 2 months ago

IMO it's way simpler to just have the resolver call hwprobe directly, rather than trying to introduce this intermediate format and the associated library helper functions. We don't even need to specify anything here: the compiler could just generate the hwprobe calls directly and then call into the VDSO via the provided argument to the IFUNC resolver.

That said: this is essentially just duplicating one of the early hwprobe designs, and thus has a bunch of design flaws we spent a few versions sorting out. So if you want to go with it, probably best to sort out things like:

So I'd recommend doing basically nothing here: we already have all the tools we need to implement FMV at the binary/library level, we just need to mark the multi-target attributes as legal so we can implement them.

topperc commented 2 months ago

IMO it's way simpler to just have the resolver call hwprobe directly, rather than trying to introduce this intermediate format and the associated library helper functions. We don't even need to specify anything here: the compiler could just generate the hwprobe calls directly and then call into the VDSO via the provided argument to the IFUNC resolver.

The resolver isn't the only use of this. I'm assuming we should support __builtin_cpu_supports like other targets?

topperc commented 2 months ago
  • What's going to call that initialization function?

On X86, it's called by the resolver function. Only the first call does anything real, the other calls early out if its already been done.

I suggested we should cache the information rather than doing a syscall of hwprobe for every multiversion function.

kito-cheng commented 2 months ago

I am not sure if compiler can generate code to invoke vDSO direct, but this part is like optimization on reducing the overhead of query the capability of host machine, I am kinda less concern around this since current proposal can cache that when first call __init_riscv_features_bit.

For other concern:

We intend to add extension first, and we believe bit mask is enough for now, and our goal is reach same capability as IFUNC in glibc, which we don't intend to address heterogeneous-ISA systems or extensions from multiple vendors yet, and we may extend the syntax on future if needed.

And for IFUNC...I believe there are few security issue around that, but I don't see we have other choice for short-term, both LLVM and GCC are didn't provide such infrastructure without IFUNC, and I am not sure it worth to spend another half year to doing that is worth, also we don't document down we use IFUNC, so we can change the implementation to get rid of IFUNC stuffs in future if we think it's necessary, and the __init_riscv_features_bit and __riscv_feature_bits still can be used once we switch to different implementation.

BeMg commented 1 month ago

Remove the bitmask that can't be query by hwprobe directly. And update Bitpos base on current support extension alphabetical order.

asb commented 1 month ago

A few comments / questions that came to me after taking a closer look at the current implementation in compiler-rt:

I agree with earlier comments that it's a shame this doesn't really have a clean way of adding in performance feature or data that doesn't easily fit in a bitmask. But I appreciate we're at a point where we need to ship something. The extensibility story for RISC-V hwprobe seems better in that respect with the key/value system. I guess the idea is that we'd add another __init_riscv_foo function along with a new struct if necessary, rather than having __init_riscv_feature_bits learn to fill in newly defined structs in the future?

topperc commented 1 month ago

A few comments / questions that came to me after taking a closer look at the current implementation in compiler-rt:

  • What is the intended process for adding to the __riscv_feature_bits bitmask definitions? Will we proactively add extensions to the list as they're ratified? Or wait until someone has a need to probe for them? I note that already there are more extensions in the Linux hwprobe interface than are supported here.

I thought the intent was to match hwprobe. What's missing?

  • How will bitmasks in __riscv_vendor_feature_bits be assigned and managed? Is the idea that every vendor effectively has its own namespace? Also, what is vendorID set to? Is the intent that it matches mvendorid (which is the JEDEC manufacturer ID)

I expect it should be the vendorid returned from hwprobe which I assume matches mvendorid?

topperc commented 1 month ago

A few comments / questions that came to me after taking a closer look at the current implementation in compiler-rt:

  • What is the intended process for adding to the __riscv_feature_bits bitmask definitions? Will we proactively add extensions to the list as they're ratified? Or wait until someone has a need to probe for them? I note that already there are more extensions in the Linux hwprobe interface than are supported here.

I thought the intent was to match hwprobe. What's missing?

I guess these

#define     RISCV_HWPROBE_EXT_ZVE32X    (1ULL << 37)
#define     RISCV_HWPROBE_EXT_ZVE32F    (1ULL << 38)
#define     RISCV_HWPROBE_EXT_ZVE64X    (1ULL << 39)
#define     RISCV_HWPROBE_EXT_ZVE64F    (1ULL << 40)
#define     RISCV_HWPROBE_EXT_ZVE64D    (1ULL << 41)
#define     RISCV_HWPROBE_EXT_ZIMOP     (1ULL << 42)
#define     RISCV_HWPROBE_EXT_ZCA       (1ULL << 43)
#define     RISCV_HWPROBE_EXT_ZCB       (1ULL << 44)
#define     RISCV_HWPROBE_EXT_ZCD       (1ULL << 45)
#define     RISCV_HWPROBE_EXT_ZCF       (1ULL << 46)
#define     RISCV_HWPROBE_EXT_ZCMOP     (1ULL << 47)
#define     RISCV_HWPROBE_EXT_ZAWRS     (1ULL << 48)

Which are not listed here https://docs.kernel.org/arch/riscv/hwprobe.html because that's for kernel 6.10 and the new bits are in 6.11.

lenary commented 1 month ago
  • How will bitmasks in __riscv_vendor_feature_bits be assigned and managed? Is the idea that every vendor effectively has its own namespace? Also, what is vendorID set to? Is the intent that it matches mvendorid (which is the JEDEC manufacturer ID)

Using mvendorid makes most sense to me, especially given we've aligned with misa.

  • The current interface has no way to determine success or failure of __init_riscv_feature_bits. I'd suggest it would be useful to have a way to determine this - even for the function multi-versioning use case it's conceivable a platform might try __init_riscv_feature_bits first and fall back to something else if it wasn't able to retrieve information. But it's also possible we'd want to wrap this function with something more user-facing. I would suggest setting __riscv_feature_bits.length and __riscv_vendor_feature_bits.length to 0 if __init_riscv_feature_bits was unable to extract any information for the current platform.

    • Given at least one extension ('i' if nothing else) should always be set, I suppose checking __riscv_feature_bits.features[0] !=0 would be workable.

This solution seems to me to be a reasonable way forwards - much better than allocating a bit to mean "we initialized the bitmap". This also avoids any issues with the mvendorid actually being zero.

I agree with earlier comments that it's a shame this doesn't really have a clean way of adding in performance feature or data that doesn't easily fit in a bitmask. But I appreciate we're at a point where we need to ship something. The extensibility story for RISC-V hwprobe seems better in that respect with the key/value system. I guess the idea is that we'd add another __init_riscv_foo function along with a new struct if necessary, rather than having __init_riscv_feature_bits learn to fill in newly defined structs in the future?

Performance/metric features (rather than just presence features) are indeed harder, but yes, we do have a route to adding them at some point in the future. This PR seems a good first step, with the additional clarifications you're suggesting.

I think we should probably allocate bits for the hwprobe values that @topperc has listed so we're fully up to date.

jrtc27 commented 1 month ago

How does someone add a vendor extension without having a JEDEC ID then? This is important in the research space, for example.

kito-cheng commented 1 month ago

How will bitmasks in __riscv_vendor_feature_bits be assigned and managed? Is the idea that every vendor effectively has its own namespace? Also, what is vendorID set to? Is the intent that it matches mvendorid (which is the JEDEC manufacturer ID)

I thought the intent was to match hwprobe. What's missing?

I would say that in more neutral way :P the bit will add if any RISC-V extension exploration scheme has supported, give an example: FreeBSD able to detect A, B, C extensions and Linux hwprobe able to detect B, C, D extensions, then we will allocate bits for A, B, C, D

kito-cheng commented 1 month ago

How will bitmasks in __riscv_vendor_feature_bits be assigned and managed? Is the idea that every vendor effectively has its own namespace? Also, what is vendorID set to? Is the intent that it matches mvendorid (which is the JEDEC manufacturer ID)

Yes, vendorID is mvendorid, and then each vendor will has it own namespace, we intend to let each vendor to add that.

kito-cheng commented 1 month ago

How does someone add a vendor extension without having a JEDEC ID then? This is important in the research space, for example.

Define zero or 0x7f7f7f7f as reserved value? each byte is 7 bit data plus one bit odd parity bit, so maybe we can use invalid encoding space in JEDEC ID?

BeMg commented 1 month ago

There are two updates

  1. The function __init_riscv_feature_bits has been updated with an extra parameter. This new argument allows the platform to pass pre-computed results for platform feature information.
  2. A new structure has been defined for CSR-related values (mVendorID, mArchID, mImplID).
BeMg commented 1 month ago

TODO:

  1. Allocate bit for latest hwprobe supported extension
  2. More description/example for vendor feature bit
  3. Mechanism to determine whether __init_riscv_feature_bits executed successfully
BeMg commented 1 month ago

TODO:

  1. Allocate bit for latest hwprobe supported extension

Added and relate LLVM PR https://github.com/llvm/llvm-project/pull/101632