stephencelis / SQLite.swift

A type-safe, Swift-language layer over SQLite3.
MIT License
9.57k stars 1.54k forks source link

Reactivate "Performance improvements for Blob" #1167

Closed jberkel closed 1 year ago

jberkel commented 1 year ago

416

stephencelis commented 1 year ago

@jberkel Sorry for the fly-by! And much appreciation for your continual maintenance of this :smile: Just don't want folks to get bit by this potential issue in production.

stephencelis commented 1 year ago

@jberkel I just pulled the 0.14.0 tag to prevent folks from taking this change for now, but please cut 0.14.1 when you have a chance to revert.

jberkel commented 1 year ago

@stephencelis Oh, hello. The PR this is based on sat around unchallenged for six years and now you're appearing here deus ex machina :)

Could you explain a bit more, I don't see the security issue, NSData.init(bytes:length:) always copies the data. The previous code also had

let i8bufptr = UnsafeBufferPointer(start: bytes.assumingMemoryBound(to: UInt8.self), count: length)

so I don't see why this would be less safe.

jberkel commented 1 year ago

Apple's documentation mentions that

Accessing memory through the returned pointer is undefined if the memory has not been bound to T. To bind memory to T, use bindMemory(to:capacity:) instead of this method.

However, bindMemory(to:capacity:) doesn't really do anything to the underlying memory itself. The specification stresses that bound types need to be "layout compatible" for the behavior to be defined.

And as far as I can tell, the elementary type UInt8 should always be layout compatible: there are no alignment problems with single byte access.

stephencelis commented 1 year ago

@jberkel Again, sorry for the fly-by out of nowhere! Please let me take some time to try to flesh out my concerns. I'll start with the one that I think is the most pressing:

  1. Adding an API that returns an unsafe, dangling pointer:

    public var bytes: UnsafePointer<UInt8> {
      data.bytes.assumingMemoryBound(to: UInt8.self)
    }

    Dangling pointers can easily outlive the lifetime of the underlying memory and be deallocated before use. Swift provides scoped methods like withUnsafePointer that guarantee the lifetime of the pointer for the duration of the scope for a reason, but we're not doing the same here. And while NSData may copy the bytes from the original SQL result, I'm more worried that this NSData is the owner of the underlying memory and could deallocate the pointer before a SQLite.swift user interacts with it. This could lead to hard-to-diagnose crashes and other memory issues, and easily become an attack vector in Swift apps using SQLite.swift.

    Also note that Data, the Swift-friendly wrapper around NSData, does not provide an interface to this dangling pointer because it is not safe.

    I think Swift libraries should avoid exposing Unsafe* APIs like this in general, and when they do, the reason should be clearly documented and motivated. This is even a prerequisite of libraries Apple accepts into SSWG incubation. I hope you agree that we should probably follow suit.

  2. Updating the API to use a new data structure for performance improvements without validating the improvements with benchmarks. While it may be true that NSData and Data were anecdotally more performant when the original PR was opened six years ago, I'm pretty sure this is no longer the case today. Swift's performance has improved a lot since then, and while there may be opportunities to improve the library performance, I do think that it should be measured in actual benchmarks. And in the case of this change, I worry that it may have a measurably negative impact on library users.

  3. I don't think we should assume the underlying memory of NSData.bytes without very strong guarantees, and in general I think assumingMemoryBound(to:) should be avoided if there are safer alternatives. While the library may have other uses of assumingMemoryBound(to:), those should probably be audited to something safer, or the use should be clearly motivated/documented.

Hope these concerns are clear and make sense! I appreciate wanting to continually improve the library for its users, but I also want to make sure we don't ship something that could cause unintentional problems for them.

jberkel commented 1 year ago

@stephencelis

Adding an API that returns an unsafe, dangling pointer:

It's only a dangling pointer once the Blob / NSData is deallocated. Like in this case:

func test_dangling_pointer() {
    var unsafePointer: UnsafePointer<UInt8>? = nil
    do {
        var blob: Blob? = Blob(bytes: [42, 41, 40])
        unsafePointer = blob!.bytes
        blob = nil
    }
    XCTAssertEqual(unsafePointer?.pointee, 42)
}

Interestingly, even compiled with "guard malloc" and run with "malloc scribble" this test passes. It looks like the memory isn't freed immediately.

But yes, clearly some "shoot-in-the-foot" potential here.

... without validating the improvements with benchmarks.

That's a fair point, it should have been benchmarked. I'll try to do some benchmarking, more out of curiosity.

I don't think we should assume the underlying memory of NSData.bytes without very strong guarantees

This is probably the weakest argument, as stated above I don't see how this behavior can be undefined according to the Swift memory model.

This PR was rushed into the last release, which wasn't a good idea. On the other hand, it's been around for a few years and nobody challenged it, so I assumed it was ok to include.

I'll do a new release soon, don't have much time now because I'm travelling.

stephencelis commented 1 year ago

It's only a dangling pointer once the Blob / NSData is deallocated

Sure, but APIs that return dangling pointers make it very easy for folks to make this mistake by holding onto the pointer and letting the data be deallocated. I think the job of a high-level library is to prevent folks from making these kinds of mistakes by not shipping interfaces that make it easy.

This is probably the weakest argument, as stated above I don't see how this behavior can be undefined according to the Swift memory model.

I'd like to be a little more cautious here as I don't see how we can guarantee it. We know that using unsafe interfaces can cause problems at runtime when used incorrectly. Might be worth asking the compiler engineers on the Swift forums if it is safe to use this API here and, if so, document the findings?