riscv / riscv-zalasr

The ISA specification for the Zalasr extension.
Creative Commons Attribution 4.0 International

2*XLEN loads and stores #6

Closed by sorear 7 months ago

sorear commented 7 months ago

There is a clear unmet software need for atomic 2*XLEN loads anywhere that the AMOCASQ PMA exists. The solution for the RVA profile is likely to involve the V extension in some capacity, but that does not help non-RVA profiles where the V extension is not implemented. Zilsd could also be used, but Zilsd is a specialized extension that uses a considerable amount of opcode space, a cost that is only acceptable in some profiles.

The natural way to encode a 2*XLEN load or store without an immediate is by extending this proposal to allow (width, XLEN) combinations of (DOUBLE, 32) and (QUAD, 64) if a suitable PMA exists. Much like Zacas, the double-length operations are effectively optional if the PMA is not supported for any address.
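To make the proposed rule concrete, here is a minimal Python sketch of which (width, XLEN) combinations would be accepted. The function name `width_allowed` and the single `double_length_pma` flag are hypothetical; a real PMA is a per-address property, so collapsing it to one boolean is a simplifying assumption, not something taken from the spec.

```python
# Access widths and their sizes in bytes.
WIDTH_BYTES = {"BYTE": 1, "HALF": 2, "WORD": 4, "DOUBLE": 8, "QUAD": 16}


def width_allowed(width, xlen, double_length_pma=False):
    """Return True if a load/store of `width` is accepted for `xlen`.

    `double_length_pma` is a hypothetical flag standing in for
    "a suitable PMA exists for the target address".
    """
    if xlen not in (32, 64):
        raise ValueError("this sketch only models XLEN of 32 or 64")
    size_bits = WIDTH_BYTES[width] * 8
    if size_bits <= xlen:
        # Accesses up to XLEN bits: allowed regardless of the flag.
        return True
    # Proposed addition: exactly 2*XLEN, i.e. (DOUBLE, 32) or (QUAD, 64),
    # and only when the PMA is available.
    return size_bits == 2 * xlen and double_length_pma
```

With the flag left at its default, the sketch accepts only accesses up to XLEN bits, which reflects the point that the double-length forms are effectively optional.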

Is there any interest in this?

mehnadnerd commented 7 months ago

I don't think this extension is the right vehicle for this. It is focused on quickly fixing the issues we have in lowering atomic loads/stores and implementing the A.7 mappings from the ISA manual, and I don't want to add things that could delay it. Especially given that int128 support in general is shaky (I don't believe the ABI is worked out), the timescales don't match up in my opinion.

I'm also not sure if load-acquire/store-release is the right vehicle for building large atomic accesses in general. I don't like register pairs, since they complicate decode and rename. Particularly annoying is that the decode changes significantly based on the size: the largest size no longer writes just one register. I don't think there will be a large number of systems that support large CAS but not vector. The natural implementation of large CAS would seem to me to use the vector load/store operations.
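The decode concern can be illustrated with a small Python model of how many destination registers a load would write. This is only a sketch: `dest_registers` is a hypothetical helper, and the even-aligned `rd`/`rd+1` pairing (with `x0` discarding the result) is assumed here by analogy with Zacas, since the thread does not specify a pairing rule.

```python
def dest_registers(rd, size_bits, xlen):
    """Return the list of integer registers written by a load.

    Assumption: a 2*XLEN load writes an even/odd pair (rd, rd+1),
    and a destination of x0 discards the result entirely.
    """
    if not 0 <= rd < 32:
        raise ValueError("rd must be in the range 0..31")
    if size_bits <= xlen:
        # Ordinary case: one destination register (or none for x0).
        return [] if rd == 0 else [rd]
    if size_bits != 2 * xlen:
        raise ValueError("only accesses up to 2*XLEN are modeled")
    if rd % 2 != 0:
        raise ValueError("a register pair needs an even base register")
    # Pair case: decode and rename must now track two destinations.
    return [] if rd == 0 else [rd, rd + 1]
```

The number of written registers depends on the size field rather than on the opcode alone, which is the irregularity described above.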

If there is a strong desire for these large atomic loads/stores, a follow-up extension should still be compatible. Such a follow-up extension may want to also relax the requirement that one of the ordering bits is set, so you can do relaxed large atomic loads/stores rather than having to make them acquire/release simply for encoding reasons. Such an extension would be fully backwards compatible because those encodings are not used with the current proposal.
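The backwards-compatibility argument can be checked with a toy Python model. It is deliberately simplified: `encoding_valid` and `allow_relaxed` are hypothetical names, and the only rule modeled is the one stated above, that at least one ordering bit must be set. Any further constraints the actual proposal places on loads versus stores are not represented.

```python
def encoding_valid(aq, rl, allow_relaxed=False):
    """Return True if the (aq, rl) bit combination is a valid encoding.

    Current rule (as described in this thread): at least one of the
    ordering bits is set. `allow_relaxed` models a hypothetical
    follow-up extension that also accepts aq=0, rl=0.
    """
    return bool(aq or rl or allow_relaxed)


def follow_up_is_backwards_compatible():
    """Check that the follow-up accepts everything the current rule accepts."""
    combos = [(aq, rl) for aq in (0, 1) for rl in (0, 1)]
    return all(
        encoding_valid(aq, rl, allow_relaxed=True)
        for aq, rl in combos
        if encoding_valid(aq, rl)
    )
```

In this model the follow-up only gives meaning to the one combination that is currently unused, so existing encodings keep their behavior.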