riscv / riscv-bitmanip

Working draft of the proposed RISC-V Bitmanipulation extension
https://jira.riscv.org/browse/RVG-122
Creative Commons Attribution 4.0 International

Provide overflow-detecting arithmetic instructions? #187

Open lyphyser opened 1 year ago

lyphyser commented 1 year ago

One of the biggest shortcomings of RISC-V is the lack of overflow-checked arithmetic, which is essential for reliable software and is frequently used in Rust, Python and other languages. Obviously it can be emulated with sequences of multiple instructions, but that's quite inefficient, since overflow detection is desirable on pretty much all arithmetic instructions.
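To make the cost of emulation concrete, here is a minimal Rust sketch of the kind of multi-instruction sequence the base ISA needs for a checked 64-bit add. The function names are made up for illustration, and the assembly in the comments is only one plausible lowering (register names are arbitrary), not the output of any particular compiler.

```rust
/// Signed 64-bit add with overflow detection, done the way the base ISA
/// has to do it: one add, two compares and a branch.
/// Returns (wrapped_sum, overflowed).
fn add_checked_signed(a: i64, b: i64) -> (i64, bool) {
    let sum = a.wrapping_add(b);   // add  t0, a0, a1
    let b_negative = b < 0;        // slti t1, a1, 0
    let sum_less = sum < a;        // slt  t2, t0, a0
    // Without overflow, the sum is below `a` exactly when `b` is negative.
    (sum, b_negative != sum_less)  // bne  t1, t2, overflow
}

/// Unsigned 64-bit add: overflow happened iff the wrapped sum is
/// smaller than one of the operands.
fn add_checked_unsigned(a: u64, b: u64) -> (u64, bool) {
    let sum = a.wrapping_add(b);   // add  t0, a0, a1
    (sum, sum < a)                 // bltu t0, a0, overflow
}
```

So a checked signed add costs roughly four instructions instead of one, and the unsigned case costs two.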

The bitmanip extension could be a vehicle to add these instructions, since it already seems to carry several "exotic" arithmetic variants.

A potential design would involve adding variants of arithmetic instructions, with the following added fields:

  1. 1 bit: detect signed overflow, or a quotient that overflows INT_MAX (i.e. INT_MIN / -1) in the case of division
  2. 1 bit: detect unsigned overflow, or division by zero in the case of division
  3. 1-3 bits: action on overflow/non-overflow
    • Call custom saved handler set in overflow3 CSR (modules would be expected to save and restore the handler when entering their code and when calling other modules)
    • Set pc += 2 on non-overflow (leaves room for one arbitrary compressed instruction that runs only on overflow)
    • Call OS-provided handler set in overflow1 CSR
    • Set pc += 4 on non-overflow (leaves room for one arbitrary 32-bit instruction that runs only on overflow)
    • Call application-provided handler set in overflow2 CSR (only meant to be used by the main executable)
    • Set pc += 8 on non-overflow (leaves room for two arbitrary instructions that run only on overflow)
    • Call temporary handler set in overflow4 CSR (this would be freely overwritable by any non-OS code)
    • Increment fixed overflow_count CSR on overflow (this would be freely overwritable by anyone)
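
A rough behavioural model of the action field may help when discussing it. The sketch below is hypothetical: the `Action` enum, the `Hart` struct and the `checked_add` function are invented for illustration, no encoding is implied, and it assumes that "pc += N" means skipping N bytes past the checked instruction itself. How a handler call would save its return address is not specified above, so it is not modelled.

```rust
/// Hypothetical model of the proposed action field (no encoding implied).
#[derive(Clone, Copy)]
enum Action {
    /// Skip this many bytes (2, 4 or 8) of following code when there is NO overflow.
    SkipOnNoOverflow(u64),
    /// On overflow, jump to the handler held in one of the four handler CSRs.
    CallHandler(usize),
    /// On overflow, increment the overflow_count CSR and continue.
    Count,
}

struct Hart {
    pc: u64,
    overflow_handlers: [u64; 4], // stand-ins for the overflow1..overflow4 CSRs
    overflow_count: u64,
}

/// Execute one checked signed add whose own encoding is `len` bytes long.
/// Returns the wrapped sum and updates pc and the CSR stand-ins.
fn checked_add(h: &mut Hart, a: i64, b: i64, action: Action, len: u64) -> i64 {
    let (sum, overflowed) = a.overflowing_add(b);
    let next = h.pc + len; // ordinary fall-through address
    h.pc = match (action, overflowed) {
        // No overflow: jump over the recovery instruction(s).
        (Action::SkipOnNoOverflow(bytes), false) => next + bytes,
        // Overflow: fall through into the recovery instruction(s).
        (Action::SkipOnNoOverflow(_), true) => next,
        // Overflow: transfer control to the selected handler.
        (Action::CallHandler(i), true) => h.overflow_handlers[i],
        // Overflow: just count it.
        (Action::Count, true) => {
            h.overflow_count += 1;
            next
        }
        // Handler and counter variants do nothing special without overflow.
        (_, false) => next,
    };
    sum
}
```

With the skip variants, the common non-overflow path pays for a taken skip, while the overflow path runs whatever instruction follows (typically a jump to recovery code).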

The CSRs would have one set for each operating mode (user, supervisor, etc.), and would be settable in that mode. The temporary/application/library/OS distinction would be merely an ABI convention, and the "OS" would usually be the libc or dynamic loader on POSIX systems.

If enough coding space is available (e.g. if 48-bit instructions are used), then the design can be amended like this:

A similar feature could be provided for floating-point instructions too, with a bit for each of the conditions of the IEEE exceptions:

Separate CSRs could be provided for floating point operations. If there is enough coding space, underflow and/or inexact result could be given an additional set of action fields.

The OS-provided overflow handler should raise the overflow as an exception using the platform's exception-raising mechanism (e.g. DWARF EH or SEH).

A more complicated design is possible, but this might be enough.

Findecanor commented 1 year ago

Most programming languages that need overflow-checking raise an exception when it has happened.

One method for checking overflow that could work for those languages would be to use saturating-arithmetic instructions that set a cumulative status flag in a CSR whenever the value saturates (i.e. it would have overflowed). You would clear the flag before the instructions that could set it (write x0 to the CSR), and test the flag (read the CSR into a GPR, then branch conditionally) before any result could cause a side effect (conditional branch, store, etc.). This is similar to how QNaNs are propagated by arithmetic instructions but raise exceptions in instructions that have side effects. A proper hardware implementation of this for integers is patented, however, and couldn't be added to RISC-V until that patent expires in 2035.
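
In software terms, the clear / compute / test pattern would look roughly like the Rust sketch below. `SatFlag` is a made-up stand-in for the status CSR and `mul_add_checked` is an invented example function; neither is the interface of any existing extension.

```rust
/// Stand-in for a CSR holding a cumulative ("sticky") saturation flag.
struct SatFlag {
    saturated: bool,
}

impl SatFlag {
    /// Like writing x0 to the CSR.
    fn clear(&mut self) {
        self.saturated = false;
    }

    /// Saturating signed add that ORs any overflow into the sticky flag.
    fn add(&mut self, a: i32, b: i32) -> i32 {
        let (_, overflowed) = a.overflowing_add(b);
        self.saturated |= overflowed;
        a.saturating_add(b)
    }

    /// Saturating signed multiply with the same flag behaviour.
    fn mul(&mut self, a: i32, b: i32) -> i32 {
        let (_, overflowed) = a.overflowing_mul(b);
        self.saturated |= overflowed;
        a.saturating_mul(b)
    }
}

/// Computes a*x + y with a single overflow test at the end, just before
/// the result becomes visible to the caller.
fn mul_add_checked(flag: &mut SatFlag, a: i32, x: i32, y: i32) -> Option<i32> {
    flag.clear();                 // clear the flag first
    let t = flag.mul(a, x);       // may saturate and set the flag
    let r = flag.add(t, y);       // flag stays set even if r is back in range
    if flag.saturated { None } else { Some(r) } // read the flag, then branch
}
```

The flag has to be cumulative because a later operation can bring a saturated intermediate back into range: `i32::MAX * 2 + (-1)` yields an in-range value, yet the computation did overflow.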

There are many signed and unsigned scalar add, subtract, absolute value, multiply and shift-instructions in the P-proposal. It also includes a CSR with the cumulative overflow-flag. However, there have been discussions to reduce the extension to its SIMD core, omitting these.

Instructions for saturating signed and unsigned addition and subtraction are already in the Vector extension, but that support is limited: