Fix known leaks in both swift and android SDKs

thomcc commented 6 years ago

This was easier for the Swift one than the Android one, although I'm more confident that the android one is completely good now -- In the swift, I'm not 100% certain that TypedValues are being cleaned up, since we should have seen some of the same crashes that I did in android when things were freeing values returned by value_at_index.

Anyway, the Android one also has several changes that were needed in order to make it less likely to cause issues in the future. I'd like to go further here and make the rawPointer on RustObject private, but I stopped before doing so.

thomcc commented 6 years ago

In general, I have a couple fears about the FFI code that I'm going to take this comment to express:

Manual memory management in languages that don't help you out is fairly error prone, and getting it wrong can lead to double frees or use-after-frees, both of which lead to security vulnerabilities (and are the types of things that Rust was designed to avoid).

The changes I've made don't help here⁰ -- leaking memory isn't unsafe (look at std::mem::forget) but once you stop leaking the memory you open yourself up to all sorts of nastiness. Unfortunately, I don't think it's realistic to think we could ship while leaking memory very frequently.

My biggest fear is that one of our consumers (Fenix? Lockbox?) ends up with a security vulnerability due to some memory safety error in the FFI boundary. I think this would be extremely embarrassing for all involved (especially Mentat), and is completely possible.

The only solution I can think of is for us to run with either something like msan/asan/valgrind (which IDK if you can use for e.g. Java on Android), or possibly a debug heap that tracks that you're calling all the correct destroy_* functions exactly once, ensures nothing that we didn't allocate ever is freed with them, and that we never access memory that has been freed (e.g. it unmaps the pages where items were located after they're freed).

Even this would only be a half solution though, since it wouldn't catch calling a function other than a destructor with the wrong type. This would still be an exploitable memory safety error. ExternResult¹ makes this sort of thing sound very plausible to me, and the Android code's heavy use of Pointer even more so².

0: Well, some of them do. Removing heap allocated ExternResults, should make things a bit better. Same goes for some of the changes made to the android FFI, which make an effort to prevent double frees or use after frees by making sure you discard RustObject's rawPointer after it's freed.

1: Presumably, doing something like #[repr(C)] struct ExternResult<T> { ok: *mut T, err: *mut c_char } is our best option, especially since we hand write our .h/JNA.java files. We'd then write FooResult, and BarResult for each ExternResult<Foo>/ExternResult<Bar> in the store.h.

2: I gather the way to get more type safety for some of this is to define PointerTypes? JNI also is a bit better here since we'd be implementing a lot more of the bindings in Rust, where we could be confident about not making mistakes.

thomcc commented 6 years ago

The latest two commits are a very different approach with the following benefits:

The type returned by FFI functions are now explicit. This mitigates issues where the documentation claims something returns an ExternResult that holds one type, but it actually holds another type. This was especially error prone given that we would convert from Result<T, E> to ExternResult with into(), which means we did so without ever typing out the actual return type!
Java is now much more type safe, while still using JNA. This was actually not as hard as I had feared. If we use this technique it should go a long way towards preventing passing pointers to functions that take different types.
The Swift SDK is still not type safe, but it's no worse than before. I will admit the ergonomics did suffer here, but I think benefits 1 and 2 make up for it. I had hoped that I'd be able to have an opaque T in a Unsafe{,Mutable}Pointer<T>, but no such luck.

I still haven't responded to all review comments

fluffyemily commented 6 years ago

Now, you can cast an OpaquePointer to an typed UnsafePointer in Swift. I don't know what the result of this would be here, but I can experiment:

func convertToType<T>(pointer: OpaquePointer, type: T) -> UnsafePointer<T> {
    return UnsafePointer<T>(pointer)
}

thomcc commented 6 years ago

I think we can do something with typing in Swift. Possibly involving better annotations in the C header to give the opaque pointers a named type value and then casting to an UnsafePointer of that type. I can take a look at that if you want to move on though. As you say, it's no worse than it was before.

...

Now, you can cast an OpaquePointer to an typed UnsafePointer in Swift. I don't know what the result of this would be here, but I can experiment...

The issue is there's no T here, since it was an incomplete type in C. I thought about doing struct Foo {}; in the header, which would let us use UnsafePointer<T>/UnsafeMutablePointer<T>, and we could have all the types return the real pointer types and get type safety. Then swift would be back to having more type safety than the java code, since it understands const/non-const.

The reason I didn't do this is because it's very much not the layout of these structs, and it would allow swift code to create values of these types without getting them through from rust. Doing so would always be a bug.

There might be a way to signal this appropriately like struct Foo { void *_internal_fake_data_; } or something though, where if you do Foo(_internal_fake_data_: UnsafeRawMutablePointer(bitPattern: 0)) (or something similar) in Swift you know you're doing something very dodgy. I'd be fine with this, but it would be a strange pattern.

Edit: I've tried this out here. I'm willing to go to this approach if you think the type safety is worth it -- I think it is if we can be sure consumers aren't creating RawBlah instances, but IDK if we can (aren't they available to users who import MentatStore? My testing says yes. -- We always could s/MentatStore/MentatPrivate/ to discourage people from doing this though). Even if we can, I'm not sure that this would be the right recommendation to give to developers, especially ones without much experience writing the sort of low level code that the Rust FFI requires.

On the other hand, doing that conversion did point out an additional memory leak (we were destroy()ing something that had a more complex type) and a place where the header had the wrong type in a function declaration.

thomcc commented 6 years ago

So, some further thinking is that while the "type safe swift" approach I linked in the previous comment (e.g. https://github.com/thomcc/mentat/tree/type-safe-swift, but the only difference is the single patch https://github.com/thomcc/mentat/commit/85322bebd2256c905e795c075cca167140f8f5a0) has some drawbacks (If you create a RawBlah(_internal_fake_data: ...) from swift it will be a memory safety issue basically 100% of the time), this actually seems much easier to audit and prevent than ensuring everybody passes their OpaquePointers to the correct functions.

If we do this we'll have type safety from Swift and Java, with the caveat that it depends on getting the declarations in the store.h/JNA.java equivalents correct. This is a big caveat, since we had mistakes in ours, but it also seems a lot easier to address with tooling.

mozilla / mentat

Fix known leaks in both swift and android SDKs #745