iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Audit iree_host_size_t/iree_device_size_t and use saturating math #3844

Open benvanik opened 4 years ago

benvanik commented 4 years ago

Something like https://github.com/StefanHamminga/saturating is the right shape (we'd refactor for C with C++ wrappers). Chrome has some stuff too though it's a bit larger than what we need: https://github.com/chromium/chromium/tree/master/base/numerics#calculating-a-buffer-size-checked-arithmetic

Basically, almost every operation on one of these size types should be saturating (vs. the current wrapping overflow behavior, the weirdness around mixed signed/unsigned operations, and the undefined behavior of signed overflow). This isn't the datacenter - we can't #yolo it with int everywhere like in google and hope we can hotpatch prod servers when a vuln is found (or when we want larger buffers).
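A minimal sketch of what a saturating add for an unsigned size type could look like in C - names here are illustrative, not IREE's actual API, and this assumes the GCC/Clang `__builtin_add_overflow` builtin:

```c
#include <stdint.h>

// Hypothetical size type standing in for iree_host_size_t/iree_device_size_t.
typedef uint64_t my_size_t;

// Saturating add: clamps to the maximum value instead of wrapping on overflow.
// __builtin_add_overflow returns true if the addition wrapped.
static inline my_size_t my_size_sat_add(my_size_t a, my_size_t b) {
  my_size_t result;
  if (__builtin_add_overflow(a, b, &result)) {
    return UINT64_MAX;  // saturate rather than silently wrap
  }
  return result;
}
```

The same pattern extends to subtraction (saturating at 0) and multiplication; the point is that no arithmetic on a size type ever produces a wrapped value that later looks like a small, plausible buffer size.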

https://github.com/dcleblanc/SafeInt is what I used back in the day. I survived the msft SAL/safe-int code red, and once you get used to them it eliminates all kinds of annoying bugs - let alone security issues.

Note that we only really care about this in the runtime; the compiler can be #yolod because we will never ship it to places where we can't easily update it.

bjacob commented 3 years ago

The above description talks about both saturation and checking for overflow. The need for checking for overflow is clear if security is at stake or more generally if for some reason we are going to have overflows. But I don't understand the need for saturation? What would be a scenario where saturation allows IREE to proceed in a useful way?

Should this issue be blocked by first defining a security model -- which inputs are trusted, which are untrusted? Otherwise we risk complicating the code in some direction before knowing exactly which direction we want.

benvanik commented 3 years ago

Ah, in the before-times things like the old C SafeInt used "saturating" as synonymous with "error" - so you'd perform a "saturating add of pointer + size" and if it saturated that was a failure. Back at microsoft there were teams that used "saturation" => "throw exception" and others (ones I was on) that used "check for saturation and if the boolean result was set then bail" and avoided exceptions. The latter teams were where all the gamedev/realtime interactive stuff was happening, obviously :)
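A sketch of the "check for saturation and bail" style described above - a checked add that reports overflow through its return value so the caller can propagate an error instead of throwing. All names are hypothetical, and this assumes the GCC/Clang `__builtin_add_overflow` builtin:

```c
#include <stdbool.h>
#include <stddef.h>

// Checked add: writes a + b to |out| and returns true on success,
// false if the addition would wrap.
static inline bool checked_add_size(size_t a, size_t b, size_t* out) {
  return !__builtin_add_overflow(a, b, out);
}

// Example caller in the "check and bail" style: compute the end of a
// byte range, returning an error code instead of proceeding with a
// wrapped (and therefore bogus) value.
static int compute_range_end(size_t offset, size_t length, size_t* out_end) {
  if (!checked_add_size(offset, length, out_end)) {
    return -1;  // overflow: bail and let the caller report the error
  }
  return 0;  // success: *out_end is valid
}
```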

To be clearer: underflows/overflows on the host side are an error, and in most places in IREE outside of the actual generated code we can propagate those errors back to users in meaningful ways. Inside of the generated code we can either use robustness semantics ("accessing out of bounds always returns 0/ignores writes") or device loss semantics ("you tried to do something bad so I just killed all your work and now you have to start over") depending on the backend.

bjacob commented 3 years ago

Thanks for the explanation! Can you elaborate a bit on the motivation - are we considering doing this for some kind of security purpose, or just to be more helpful to the programmer? The special-casing above of host-side code vs. "the actual generated" code sounds like we'd be doing this just to be helpful, not for security - because if we were doing this for a security purpose we would have to enforce it all the way down to the device/generated code, right?

benvanik commented 3 years ago

It's both :) This particular issue is about always being sure that the byte ranges we pass along with buffers are valid. If they can be invalid at a particular layer of the stack then they are implicitly impossible to validate lower in the stack - it's all about a chain of trust.

If, for example, the bytecode VM can introduce invalid byte ranges into the HAL host-side API, then the HAL cannot perform any kind of validation and the whole thing becomes untrusted. If we can ensure that doesn't happen but then the HAL takes the validated ranges and makes them invalid as it performs various range manipulation (by overflow/etc), then the HAL device-side API cannot perform any validation. etc etc.

The goal here is to have defense-in-depth where if we ever have a handle to a buffer (a HAL handle, a Vulkan handle, a void*, etc) we always have a valid range that is within the bounds of the buffer. Once on the device and executing the generated (or user-provided) executables, the contract is that the buffer ranges provided are valid, and depending on the implementation those ranges could be used or not. For example, if running in wasm where the entire memory space is sandboxed and it's impossible to access anything outside of the wasm heap, we may do nothing at all. On an accelerator with robustness features baked in we can rely on the hardware to do the defined load/store checks using its MMU/etc. On accelerators/systems where we cannot rely on an MMU we can insert guards that verify the loads/stores in software (ala pnacl/wasm/emulated robustness on GPUs).
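The chain-of-trust check described above boils down to one invariant at every trust boundary: an (offset, length) pair is valid only if the addition does not wrap and the resulting end stays within the buffer. A hedged sketch (illustrative names, GCC/Clang `__builtin_add_overflow` assumed):

```c
#include <stdbool.h>
#include <stddef.h>

// Validate a byte range against a buffer at a trust boundary.
// Both conditions matter: offset + length must not wrap, and the
// computed end must not exceed the buffer size. A wrapped sum would
// otherwise masquerade as a small, in-bounds end offset.
static bool buffer_range_is_valid(size_t buffer_size, size_t offset,
                                  size_t length) {
  size_t end;
  if (__builtin_add_overflow(offset, length, &end)) {
    return false;  // offset + length wrapped
  }
  return end <= buffer_size;
}
```

Once every layer enforces this before passing a range downward, lower layers can trust the ranges they receive, which is exactly the chain-of-trust property the comment describes.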
There's also shades here of safety from mistakes vs safety from malicious intent: if an app is shipping a statically-compiled and linked model in its own binaries there's no risk of malicious intent (by the point someone could modify the .so in an apk, all bets are off anyway), but there's always the case of the model code being bad all the way up in python (accesses driven by gathers that index out of bounds, sparse tensors that indirect out of bounds, etc), app code misusing the APIs to interact with IREE, etc etc.

So: it should not be possible to introduce buffer ranges that are out of bounds of the buffer at any layer of the stack. There are fallbacks in case we miss something, like the device-side robustness features - and just as WebGL performs buffer validation but also requires robustness be enabled to guard against drive-by arbitrary code execution, so may we if an app allows loading untrusted inputs. Otherwise it's just safety against bugs and better error reporting.