How to discover the information of cache block size by software?

Jamesykm-andes commented 2 years ago

In the 2.7. Software Discovery section, some information needs to be discovered by software. How should software discover it? Should it be recorded in specified registers? Or should software know this information in advance based on the cache configuration?

gfavor commented 2 years ago

The new (in development) "RISC-V unified low-level discovery method" will support discovery of lots of information - such as what arch extensions are supported, what optional extension features are supported, what parameters are supported, etc. In Linux-class systems, for example, this would be employed during boot by M-mode software, and some of that information may then be populated into Device Tree or ACPI tables that are passed to the OS.

Cache config info readily falls into this category.

kito-cheng commented 2 years ago

I ran into a problem when I tried to implement __builtin___clear_cache with cbo.flush: that requires knowing the block size, so that we know how many times cbo.flush should execute. I realized there is no way to implement that until we have the "RISC-V unified low-level discovery method" and OS support...
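To make the dependency on the block size concrete, here is a minimal sketch of the count involved. The function name `cbo_op_count` is hypothetical, and the block size is passed in as a parameter precisely because nothing in this thread yet says where it would come from; the sketch also assumes the block size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache-block operations needed to cover [start, end).
 * block_size is an input (assumed power of two) because discovering it
 * is the open question; this helper is illustrative only. */
uint64_t cbo_op_count(uintptr_t start, uintptr_t end, uintptr_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    if (end <= start)
        return 0;
    uintptr_t first = start & ~(block_size - 1);      /* block containing start   */
    uintptr_t last  = (end - 1) & ~(block_size - 1);  /* block containing end - 1 */
    return (uint64_t)((last - first) / block_size) + 1;
}
```

With a 64-byte block, a two-byte range that straddles a block boundary still needs two operations, which is why the size cannot simply be guessed.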

ubc-guy commented 2 years ago

in the vector spec, software needs to know the maximum length of a vector that can be used when stripmining. this varies greatly (more than cache line size) among implementations. hence, the 'vsetvl' instruction accepts an 'application vector length' as a request, and returns a value in 'rd' which saturates to the largest vector length the underlying implementation can support.
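The stripmining pattern described here can be modelled in plain C. This is a simplified sketch, not vector code: `request_vl` is a hypothetical stand-in for the length request and simply saturates to a `vlmax` parameter, and the inner loop stands in for one vector operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the vector-length request: grants
 * min(requested, vlmax), the saturating behaviour described above. */
size_t request_vl(size_t requested, size_t vlmax)
{
    return requested < vlmax ? requested : vlmax;
}

/* Strip-mined element-wise add. Each pass handles as many elements as
 * the request grants. Returns the number of passes taken. */
size_t stripmine_add(const int *a, const int *b, int *out, size_t n, size_t vlmax)
{
    assert(vlmax > 0);
    size_t passes = 0;
    for (size_t i = 0; i < n; ) {
        size_t vl = request_vl(n - i, vlmax);
        for (size_t j = 0; j < vl; j++)   /* stands in for one vector op */
            out[i + j] = a[i + j] + b[i + j];
        i += vl;
        passes++;
    }
    return passes;
}
```

The caller never needs to know `vlmax` in advance; it only reacts to what each request returns, which is the property the comment contrasts with the CBO instructions.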

the cbo. and prefetch. instructions can potentially be modified to do this, since they presently force the 'rd' field as all zeros, but it would be a huge waste of opcode encoding space.

the other option is to implement a new instruction that returns the cache line size. i'm a bit surprised this was left out of the final spec, since it was talked about at one point. any such instruction would have to return the smallest line size used when there are multi-level caches.
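as a rough illustration of the "smallest line size" rule (the function name and the per-level array are hypothetical, since no such instruction or data source exists in the spec), the value to report would be the minimum over all cache levels:

```c
#include <assert.h>
#include <stddef.h>

/* Smallest line size, in bytes, among the given cache levels.
 * Stepping through memory by this value touches every line at every level. */
unsigned min_line_size(const unsigned *level_sizes, size_t n_levels)
{
    assert(level_sizes != NULL && n_levels > 0);
    unsigned smallest = level_sizes[0];
    for (size_t i = 1; i < n_levels; i++) {
        if (level_sizes[i] < smallest)
            smallest = level_sizes[i];
    }
    return smallest;
}
```

stepping by a larger value could skip over lines in whichever level uses the smaller size.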

adding a cache line size instruction might be possible via the fast track process. https://riscv.org/announcements/2021/02/risc-v-international-unveils-fast-track-architecture-extension-process-and-ratifies-zihintpause-extension/

cmuellner commented 2 years ago

We were discussing that as part of the kernel support for CBO. It was agreed to provide this information via DTB as specified here: http://lists.infradead.org/pipermail/linux-riscv/2022-May/014886.html
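For illustration, here is a minimal sketch of decoding such a value once the raw property bytes are in hand. It assumes the block size is stored as a single 32-bit devicetree cell (cells are big-endian); as far as I recall the property discussed in the linked thread is named `riscv,cbom-block-size`, but please check the binding rather than rely on this. The function name `decode_block_size` is hypothetical, and locating the property in the DTB is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a block-size property value: one 32-bit big-endian cell.
 * Returns 0 when the value is malformed (wrong length, zero, or not a
 * power of two). */
uint32_t decode_block_size(const uint8_t *prop, size_t len)
{
    assert(prop != NULL);
    if (len != 4)
        return 0;
    uint32_t v = ((uint32_t)prop[0] << 24) | ((uint32_t)prop[1] << 16) |
                 ((uint32_t)prop[2] << 8)  |  (uint32_t)prop[3];
    if (v == 0 || (v & (v - 1)) != 0)
        return 0;
    return v;
}
```

Rejecting values that are not a power of two follows from the CMO spec's definition of a cache block as a naturally aligned power-of-two range.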

The interface for userspace is still not defined.

Also, note that the CBO granularity does not necessarily need to be equal to the cache line size.

kito-cheng commented 2 years ago

@cmuellner thanks for the info, I didn't realize that the CBO operation size might differ from the actual cache line size. It seems I should stop implementing __builtin___clear_cache for now, until that is settled and there is an interface for userspace.

cmuellner commented 2 years ago

The CBO spec says:

"""
2.7. Software Discovery

The initial set of CMO extensions requires the following information to be discovered by software:

• The size of the cache block for management and prefetch instructions
• The size of the cache block for zero instructions
"""

So, we have to further differentiate between the cbom/cbop size and the cboz size.
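In code, that distinction could look like the following sketch. All type and function names here are hypothetical; the only thing taken from the spec text is that management and prefetch instructions share one block size while zero instructions have their own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of the cache-block instruction families. */
enum cbo_kind { CBO_MANAGEMENT, CBO_PREFETCH, CBO_ZERO };

/* The two sizes that software has to discover. */
struct cbo_block_sizes {
    uint32_t cbom_cbop;  /* management and prefetch instructions */
    uint32_t cboz;       /* zero instructions */
};

/* Pick the block size that applies to a given instruction family. */
uint32_t cbo_block_size_for(const struct cbo_block_sizes *sizes, enum cbo_kind kind)
{
    assert(sizes != NULL);
    return kind == CBO_ZERO ? sizes->cboz : sizes->cbom_cbop;
}
```

Any loop over cache blocks would then take its stride from this lookup rather than from a single global "cache line size".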

dkruckemyer-ventana commented 2 years ago

FWIW, this profile option, Zic64b, also indicates the cache block size: https://github.com/riscv/riscv-profiles/issues/37

gfavor commented 2 years ago

My understanding of the intent of the RVA ISA Profiles is that Linux distros will target an RVA profile. At some point going forward that will be RVA22. Once distros target RVA22, then the Zic64b mandate from the profile will mean that a distro can assume 64B for the two CBO block sizes. Would that still mean that standard RISC-V Linux has to support dynamic discovery of the two block sizes (given that they can only be 64B in RVA-compliant distros and hardware), i.e. that these extra builtins need to be implemented?

cmuellner commented 2 years ago

A plausible and sensible scenario would be that upstream will support all the RVA22 extensions, plus the supported/existing HW before RVA22, plus any non-RVA22 HW with a reasonably large user base (assuming that somebody will care enough to work on the required patches).

The __builtin___clear_cache builtin is part of GCC (it flushes the processor's instruction cache) and needs to be implemented, as JIT compilers might depend on it.

One way to solve this for all CBO sizes would be:

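    ld a1, 0(riscv_cbom_block_size)   # a1 = block size (pseudocode: load from a global)
[...]
    mv a0, $start                     # a0 = current address ($start assumed block-aligned)
    j 2f                              # check the bound before the first operation
3:
    $cmo_operation (a0)               # e.g. cbo.flush; acts on the block containing a0
    add a0, a0, a1                    # advance by one block
2:
    bltu a0, $end, 3b                 # loop while a0 < $end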

That's also how Heiko's kernel patch implements the CBO support.
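For readers who prefer C, the same loop shape can be sketched as below. This is an illustration under stated assumptions, not the kernel's code: the names `cbo_fn`, `cbo_for_each_block` and `record_block` are hypothetical, the actual instruction is replaced by a callback, the block size is assumed to be a power of two, and the range is assumed not to reach the top of the address space. One deliberate difference from the assembly: the start address is rounded down to a block boundary, because with an unaligned start the stride could step past the last block before it is operated on.

```c
#include <assert.h>
#include <stdint.h>

/* Callback standing in for the real cache-block instruction. */
typedef void (*cbo_fn)(uintptr_t addr, void *ctx);

/* Example callback: just records the last block address it was given. */
void record_block(uintptr_t addr, void *ctx)
{
    *(uintptr_t *)ctx = addr;
}

/* Apply op once to every block overlapping [start, end).
 * Returns the number of operations issued. */
unsigned long cbo_for_each_block(uintptr_t start, uintptr_t end,
                                 uintptr_t block_size, cbo_fn op, void *ctx)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    unsigned long calls = 0;
    if (end <= start)
        return 0;
    /* Round down so the first operation targets a block boundary. */
    for (uintptr_t addr = start & ~(block_size - 1); addr < end; addr += block_size) {
        op(addr, ctx);
        calls++;
    }
    return calls;
}
```

With a 64-byte block, the range 0x3f to 0x41 results in two operations, at 0x0 and 0x40.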

brucehoult commented 2 years ago

> My understanding of the intent of the RVA ISA Profiles is that Linux distros will target an RVA profile. At some point going forward that will be RVA22.

As far as I know, x86_64 Linux distributions still support the original 2003 Athlon 64s. They don't assume anything like AVX or even the BMI1/BMI2 (Intel) or ABM (AMD) bit manipulation extensions.

RVA20 aka RV64GC should be a supported baseline forever.
