matrix-org / matrix-rust-sdk

Matrix Client-Server SDK for Rust
Apache License 2.0

FFI bindings crash on iOS in RoomListItem.initTimeline when built in debug mode #4009

Closed Johennes closed 6 hours ago

Johennes commented 2 days ago

I'm hitting a weird EXC_BAD_ACCESS with the FFI bindings on iOS. I'm using the following example code:

import UIKit
import AuthenticationServices

var client: Client! = nil
var ssoHandler: SsoHandler! = nil
var syncService: SyncService! = nil
var listener: AllRoomsListener! = nil
var roomListService: RoomListService! = nil
var handle: TaskHandle! = nil

var asSession: ASWebAuthenticationSession! = nil

class ViewController: UIViewController {

    override func viewDidLoad() {
        super.viewDidLoad()

        Task {
            try Thread.sleep(forTimeInterval: 2)

            client = try! await ClientBuilder()
                .serverNameOrHomeserverUrl(serverNameOrUrl: "https://chat.dev.unomed.ch")
                .sessionPaths(dataPath: URL.applicationSupportDirectory.path(percentEncoded: false).appending("/").appending(UUID().uuidString),
                              cachePath: URL.cachesDirectory.path(percentEncoded: false).appending("/").appending(UUID().uuidString))
                .slidingSyncVersionBuilder(versionBuilder: .proxy(url: "https://ssync.dev.unomed.ch"))
                .build()

            ssoHandler = try! await client.startSsoLogin(redirectUrl: "test.matrix://Main", idpId: nil)

            let url = URL(string: ssoHandler.url())!
            asSession = ASWebAuthenticationSession(url: url, callbackURLScheme: "test.matrix") { callbackURL, error in
                Task {
                    try! await ssoHandler.finish(callbackUrl: callbackURL!.absoluteString)

                    let session = try client.session()
                    print(session)

                    // Create a sync service which controls the sync loop.
                    syncService = try await client.syncService().finish()

                    // Listen to room list updates.
                    listener = AllRoomsListener()
                    roomListService = syncService.roomListService()
                    handle = try await roomListService.allRooms().entries(listener: listener)

                    // Start the sync loop.
                    await syncService.start()
                }
            }
            asSession.presentationContextProvider = self
            asSession.start()
        }
    }
}

extension ViewController: ASWebAuthenticationPresentationContextProviding {
    func presentationAnchor(for session: ASWebAuthenticationSession) -> ASPresentationAnchor {
        return view.window!
    }
}

class AllRoomsListener: RoomListEntriesListener {

    /// The user's list of rooms.
    var rooms: [RoomListItem] = []

    func onUpdate(roomEntriesUpdate: [RoomListEntriesUpdate]) {
        // Update the user's room list on each update.
        print(roomEntriesUpdate)
        for update in roomEntriesUpdate {
            switch update {
            case .reset(values: let values):
                rooms = values
            case .append(values: let values):
                rooms.append(contentsOf: values)
            case .pushBack(value: let value):
                rooms.append(value)

                if !value.isTimelineInitialized() && (try? roomListService.room(roomId: value.id())) != nil {
                    Task {
                        try! await Task.sleep(nanoseconds: NSEC_PER_SEC * 30)
                        do {
                            try await value.initTimeline(eventTypeFilter: nil, internalIdPrefix: nil) // <-- crashes here with EXC_BAD_ACCESS
                            print(value.id())
                        } catch {
                            print(error)
                        }
                    }
                }
            default:
                break // Handle all the other cases accordingly.
            }
        }
    }
}

Initially I thought it was a race condition, which is why I added some sleep statements, but that didn't help. The app crashes when calling pollFunc inside initTimeline.

Interestingly it only crashes when I build the bindings in debug mode but not when using --release. I'm using the following command to build the SDK:

unset SDKROOT && cargo xtask swift build-framework --target aarch64-apple-ios-sim (--release)

zzorba commented 1 day ago

I'm seeing similar behavior with a recent version of this library, a bad access crash in initTimeline with debug builds.

@Johennes are you seeing this on simulator, or only with real devices? I've struggled to make this reproduce on iOS simulator.

Johennes commented 1 day ago

Oh, interesting. I have not tried it on a device yet but I can consistently reproduce the crash on a simulator. I'm on an Apple Silicon Mac in case that makes any difference.

zzorba commented 1 day ago

Sorry, I've been chasing two crash bugs, let me try to make this one reproduce on simulator and I will update what I find.

Edit: yeah, I seemingly can only make this happen on a real device (not on the simulator).

Johennes commented 1 day ago

Ok, I see. I originally reproduced the crash in https://github.com/unomed-dev/react-native-matrix-sdk and thought that it might be due to React Native or the glue code I had added to connect the bindings. But then it still showed when using the bindings in a minimal non-RN example. It's wild that it crashes on different platforms for the two of us. 🤔

zzorba commented 1 day ago

Interesting, we are also working to bring the matrix-rust-sdk to react-native, though we took the approach of sponsoring a uniffi library that is capable of generating react-native bindings directly. We plan to release the OSS tool very soon, just wrapping up the documentation and working out some final bugs.

Thank you for identifying that this bug was happening in the vanilla library, I was going slowly mad trying to figure out what about our environment was triggering it.

Johennes commented 1 day ago

> Interesting, we are also working to bring the matrix-rust-sdk to react-native, though we took the approach of sponsoring a uniffi library that is capable of generating react-native bindings directly. We plan to release the OSS tool very soon, just wrapping up the documentation and working out some final bugs.

Oh, amazing! Is there a place I could sign up to be notified about the release? Wrapping the FFI bindings manually has worked relatively well so far but generating the entire RN module would be a lot better. If this works, I won't have to continue on my current track anymore.

> I was going slowly mad trying to figure out what about our environment was triggering it.

Me too. 😅

stefanceriu commented 1 day ago

> Interestingly it only crashes when I build the bindings in debug mode but not when using --release

We first saw these crashes within the crypto crate and on the timeline years ago. The solution was to use a custom build profile. It's probably a bit slower to compile, but Element X is using it and we haven't had any problems since.
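
For context, a custom profile like this is declared in the workspace Cargo.toml. The snippet below is only a sketch of the general shape: it assumes the profile (called reldbg later in this thread) inherits from dev and simply raises the optimization level. The actual settings in the SDK may differ.

```toml
# Hypothetical sketch of a custom Cargo profile; not copied from the SDK.
[profile.reldbg]
inherits = "dev"   # custom profiles must inherit from an existing one; dev keeps debug info and assertions
opt-level = 3      # optimize like a release build
```

Cargo selects a custom profile with `cargo build --profile reldbg`.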

Johennes commented 1 day ago

Wow! 😮

Would it make sense to make reldbg the default profile in https://github.com/matrix-org/matrix-rust-sdk/blob/main/xtask/src/swift.rs? It currently seems to default to dev but I suppose there's not a lot you can do with the bindings if they crash on initTimeline.

stefanceriu commented 1 day ago

> Would it make sense to make reldbg the default profile

I think it would, yes! Can you please see if it fixes the problem on your side and then perhaps raise a PR for it?

Johennes commented 17 hours ago

> Can you please see if it fixes the problem on your side and then perhaps raise a PR for it?

Looks like using reldbg fixes the crashes I'm seeing. 🎉

Have opened https://github.com/matrix-org/matrix-rust-sdk/pull/4020.

stefanceriu commented 6 hours ago

Nice, it's merged now so let's close this ticket.