ml-explore / mlx-swift-examples

Examples using MLX Swift
MIT License

Crash on Xcode 16.0 beta 2 when calling `MLXLLM.generate` #86

Closed · DePasqualeOrg closed this issue 4 months ago

DePasqualeOrg commented 4 months ago

When I call `MLXLLM.generate` in my app built with Xcode 16.0 beta 2, the app crashes on macOS and iOS with the C++ error `vector[] index out of bounds`. This does not happen with Xcode 15, so it appears to be related to the C++ libraries shipped with the macOS 15 and iOS 18 SDKs. I'm running macOS 14 and iOS 17 on my devices.

Unfortunately I can't reproduce this in the example app in this repo, but this is how I'm calling `MLXLLM.generate`: I'm storing a reference to a Task that can be cancelled and whose result can be awaited. I'll try to put together a minimal reproduction based on this repo soon.

```swift
await MainActor.run {
  generateTask = Task {
    let submitTime = timestampNow
    let (model, tokenizer) = try await loadTask.value
    let promptTokens = tokenizer.encode(text: prompt)
    // Use random seed for non-deterministic generation
    MLXRandom.seed(UInt64(Date.timeIntervalSinceReferenceDate * 1000))
    timeToFirstTokenMs = nil
    tokensPerSec = nil
    var startTime: Int64?
    print("::: Calling MLXLLM.generate")
    // !! Fatal error occurs when calling MLXLLM.generate: "vector[] index out of bounds"
    // "An abort signal terminated the process. Such crashes often happen because of an uncaught exception or unrecoverable error or calling the abort() function."
    let result = await MLXLLM.generate(
      promptTokens: promptTokens,
      parameters: parameters,
      model: model,
      tokenizer: tokenizer,
      extraEOSTokens: modelOption.extraEOSTokens
    ) { tokens in
      if timeToFirstTokenMs == nil {
        timeToFirstTokenMs = timestampNow - submitTime
      }
      // Set start time on first token generation (don't count time to load model and prompt in stats)
      if startTime == nil {
        startTime = timestampNow
      }
      // Update the output. This will make the view show the text as it generates.
      if Task.isCancelled {
        return .stop
      }
      if tokens.count % self.displayEveryNTokens == 0 {
        let text = tokenizer.decode(tokens: tokens)
        if let startTime, tokens.count > 1 {
          let elapsedTimeSec = Double(timestampNow - startTime) / 1000
          if elapsedTimeSec > 0 {
            self.tokensPerSec = Double(tokens.count - 1) / elapsedTimeSec
          }
        }
        await onUpdate(text)
      }
      if tokens.count >= maxTokens {
        return .stop
      } else {
        return .more
      }
    }
    // Final update
    await onUpdate(result.output)
    self.tokensPerSec = result.tokensPerSecond
    return result.output
  }
}
```
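For context, the cancellable-task pattern used above (keep a reference to a `Task` so it can be cancelled later, and await its value elsewhere) can be sketched in isolation, without any MLX dependency. All names here (`Generator`, `run`, `cancel`) are illustrative, not part of MLXLLM:

```swift
import Foundation

// A minimal sketch of the pattern described in this issue: store a reference
// to a Task so it can be cancelled from elsewhere, while callers can still
// await its result via `task.value`. The "generation" here is a stand-in
// loop over words, not a real model call.
final class Generator {
    private var generateTask: Task<String, Error>?

    func run(prompt: String) -> Task<String, Error> {
        let task = Task<String, Error> { () -> String in
            var output = ""
            for word in prompt.split(separator: " ") {
                // Cooperative cancellation: stop producing output if
                // someone called cancel() on the stored task.
                if Task.isCancelled { break }
                output += word + " "
            }
            return output.trimmingCharacters(in: .whitespaces)
        }
        generateTask = task
        return task
    }

    func cancel() {
        generateTask?.cancel()
    }
}
```

Cancellation in Swift concurrency is cooperative, which is why the callback in the snippet above checks `Task.isCancelled` and returns `.stop` rather than relying on the runtime to interrupt generation.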


awni commented 4 months ago

I think this is related to https://github.com/ml-explore/mlx-swift/issues/104 and https://github.com/ml-explore/mlx-c/issues/30. Let's wait until after MLX Swift is updated to the latest MLX (should be in the next few days) and see if the problem persists.

DePasqualeOrg commented 4 months ago

https://github.com/ml-explore/mlx-swift/issues/104#issuecomment-2229226417