Behavior of masked fault-only-first loads

riscvarchive / riscv-v-spec

Working draft of the proposed RISC-V V vector extension

https://jira.riscv.org/browse/RVG-122

Creative Commons Attribution 4.0 International


Behavior of masked fault-only-first loads #898

Closed. dzaima closed this issue 1 year ago.

dzaima commented 1 year ago

What is the expected behavior of a fault-only-first load, where the first element is not active?

By my reading of the spec, a masked-off element 0 isn't accessed, so it can't fault. Since a fault-only-first load only traps on element 0, the whole instruction would then never fault. That is also the behavior required to use it to vectorize a conditional load in a loop with an early exit.

But if this is the case, it might allow probing the accessibility of any memory location via a mask of [0,1,...]. That would make unit-stride fault-only-first loads as much of a security concern as the hypothetical indexed fault-only-first loads, which the spec says further down would "represent a larger security hole".
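A minimal Python model of this reading, assuming byte elements (as in `vle8ff.v`), a dictionary standing in for memory where a missing key means an inaccessible byte, and no other reason for the implementation to shorten vl. `vle8ff_model`, `LoadFault` and `probe` are hypothetical names for illustration, not part of the spec or any toolchain.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class LoadFault(Exception):
    """Models the trap taken when an active element 0 hits inaccessible memory."""


def vle8ff_model(base: int, mask: Sequence[bool], vl: int,
                 memory: Dict[int, int]) -> Tuple[List[Optional[int]], int]:
    """Hypothetical model of a masked unit-stride fault-only-first byte load.

    Returns (destination elements, new vl). Inactive elements are never
    accessed, so they can neither trap nor shorten vl.
    """
    dest: List[Optional[int]] = [None] * vl  # None marks an element the model does not write
    for i in range(vl):
        if not mask[i]:
            continue  # masked off: no memory access, so no fault
        addr = base + i
        if addr not in memory:
            if i == 0:
                raise LoadFault(hex(addr))  # only element 0 takes a trap
            return dest, i  # a fault on a later element just truncates vl
        dest[i] = memory[addr]
    return dest, vl


def probe(addr: int, memory: Dict[int, int]) -> bool:
    """Probe one byte with mask [0, 1]: never traps, since element 0 is inactive."""
    _, new_vl = vle8ff_model(addr - 1, [False, True], 2, memory)
    return new_vl == 2  # vl unchanged means addr was readable
```

With `mem = {0x1000 + i: i for i in range(16)}`, `probe(0x9000, mem)` returns `False` without raising `LoadFault`: the reduced vl alone reveals that the address is inaccessible, which is the probing concern raised above.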

(One way an implementation could prevent this probing: if the first mask bit is zero, set the new vl to the number of leading zero mask bits, so that no memory is accessed at all. But this can harm throughput in legitimate cases.)
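A sketch of that workaround as a vl-trimming rule, assuming the implementation checks the mask before issuing any memory access; `conservative_new_vl` is a hypothetical name, not something the spec defines.

```python
from typing import Optional, Sequence


def conservative_new_vl(mask: Sequence[bool], vl: int) -> Optional[int]:
    """Hypothetical policy: if element 0 is inactive, return a new vl equal to
    the number of leading inactive elements, so no memory is accessed.
    Returns None when element 0 is active and the load proceeds normally."""
    if vl == 0 or mask[0]:
        return None
    n = 0
    while n < vl and not mask[n]:
        n += 1  # count the leading run of inactive elements
    return n
```

In a strip-mined loop that advances by the returned vl, the next iteration starts at the first active element, so element 0 is then active and the load proceeds normally. The cost is an extra loop iteration for each leading run of inactive elements, which is where the throughput loss comes from.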

nick-knight commented 1 year ago

> it might allow probing accessibility of any memory location via a mask of [0,1,...], making unit-stride fault-only-first loads as much of a security concern as the hypothetical indexed fault-only-first loads which are said to "represent a larger security hole" further down the spec.

My understanding is that the concern with the strided or indexed cases is that an attacker could probe a much larger portion of the address space (one element per VM page, say), whereas a single unit-stride access can only probe up to VLEN consecutive bytes of memory. (I'm not a security expert; this is just my high-level recollection as a participant in the task group.)
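A rough worked version of that comparison, assuming LMUL=8, byte elements, 4 KiB pages, an example VLEN of 512 bits, and, for the hypothetical strided fault-only-first load, a stride of at least one page. With LMUL=8 a register group holds 8 × VLEN bits, i.e. VLEN bytes. The function names are illustrative only.

```python
def max_pages_unit_stride(vlen_bits: int, page: int = 4096, lmul: int = 8) -> int:
    """Worst-case number of pages one unit-stride byte load can touch."""
    nbytes = lmul * vlen_bits // 8       # LMUL=8 register group holds VLEN bytes
    return -(-(nbytes - 1) // page) + 1  # ceil((nbytes - 1) / page) + 1, unaligned start


def max_pages_strided(vlen_bits: int, sew_bits: int = 8, lmul: int = 8) -> int:
    """Pages one strided load can touch with stride >= page size: one per element."""
    return lmul * vlen_bits // sew_bits  # VLMAX elements


print(max_pages_unit_stride(512), max_pages_strided(512))  # 2 512
```

Per instruction, the unit-stride form reaches at most two pages while the strided form reaches VLMAX pages. If a masked-off element 0 lets the unit-stride base be arbitrary, the gap is in pages probed per instruction rather than in which pages can be probed.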

dzaima commented 1 year ago

That'd be only a difference in throughput then, which I wouldn't expect to matter for security. (Not a security expert either, though.)

The spec's statement

> The unit-stride versions only allow probing a region immediately contiguous to a known region, and so reduce the security impact when used in unprivileged code

still reads to me as not considering the possibility of a masked-off start, since with one the probeable regions are not constrained (provided the implementation doesn't use the workaround I described above, or something similar).

aswaterman commented 1 year ago

Ultimately, it is about throughput, since you can accomplish this without the V extension by installing a SIGSEGV handler and using regular loads. Or, if you have access to a high-precision time source, you may be able to use a speculative timing attack by putting the potentially faulting scalar load in the shadow of a mispredicted branch (provided the TLB doesn't cache inaccessible pages, at least).

(Also note that setting vstart to 1 can accomplish this for non-masked versions.)

dzaima commented 1 year ago

Alright; since my understanding of the expected behavior is apparently correct, I'm closing the issue.