ml-explore / mlx-swift

Swift API for MLX
https://ml-explore.github.io/mlx-swift/
MIT License

LLDB hanging when doing po on mlx-swift objects #97

Open derekelewis opened 2 months ago

derekelewis commented 2 months ago

I am using LLDB to debug some model code that I have written in mlx-swift. Unless I do an explicit eval() or print() before doing a po on the object in question, LLDB will hang. Is this behavior expected with lazy evaluation?

davidkoski commented 2 months ago

It might be -- po may run the expression by resuming only a single thread (all threads are stopped in the debugger). Evaluating a lazy MLXArray requires submitting tasks and running the graph on another thread. Very likely that other thread stays paused, which would produce the hang.
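
The mechanism described above can be imitated with a toy model in plain Swift. This is only a sketch under stated assumptions: `LazyValue` and `ResultBox` are hypothetical types, not part of mlx-swift, and a suspended serial `DispatchQueue` stands in for a thread that the debugger keeps paused. A timeout is used so the example returns instead of hanging.

```swift
import Foundation

// Toy model only: these types are hypothetical and not part of mlx-swift.

/// Small thread-safe holder for the worker's result.
final class ResultBox {
    private let lock = NSLock()
    private var value: Int?
    func set(_ v: Int) { lock.lock(); value = v; lock.unlock() }
    func get() -> Int? { lock.lock(); defer { lock.unlock() }; return value }
}

/// Stand-in for a lazy array whose evaluation runs on another thread.
final class LazyValue {
    private let worker: DispatchQueue
    private let compute: () -> Int

    init(worker: DispatchQueue, compute: @escaping () -> Int) {
        self.worker = worker
        self.compute = compute
    }

    /// Submits the computation to the worker and waits for the result.
    /// Returns nil if the worker has not finished before the timeout.
    func eval(timeout: DispatchTimeInterval) -> Int? {
        let box = ResultBox()
        let done = DispatchSemaphore(value: 0)
        worker.async {
            box.set(self.compute())
            done.signal()
        }
        guard done.wait(timeout: .now() + timeout) == .success else { return nil }
        return box.get()
    }
}

let worker = DispatchQueue(label: "toy.worker")
let value = LazyValue(worker: worker) { 6 * 7 }

// All threads running: the worker picks up the task and eval returns a value.
let running = value.eval(timeout: .seconds(2))

// Worker held back (stand-in for a thread the debugger keeps paused):
// the caller waits, nothing runs, and only the timeout ends the wait.
worker.suspend()
let paused = value.eval(timeout: .milliseconds(200))
worker.resume()

print(running as Any, paused as Any)
```

Without the timeout, the second call would wait forever, which is the shape of the hang reported here.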

awni commented 2 months ago

@davidkoski is there a way to have the array actually get evaluated when you call po in the debugger? In Python, if you inspect an array in pdb the __repr__ method is called. We just eval and print the array in the method. I assume there is a similar idiom in Swift?

davidkoski commented 2 months ago

That is exactly what po does -- it calls the description method on MLXArray which calls mlx_tostring() in the C layer and that calls:

  std::ostringstream os;
  os << ctx; // ctx is mlx::core::array
  std::string str = os.str();
  return new mlx_string_(str);
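
For comparison, the general idiom (a description that forces evaluation before formatting) can be sketched in plain Swift. `LazyNumber` is a hypothetical type used only for illustration; it is not how MLXArray is implemented, which goes through the C layer as shown above.

```swift
// Hypothetical type for illustration; not part of mlx-swift.
final class LazyNumber: CustomStringConvertible {
    private let thunk: () -> Double
    private var cached: Double?

    init(_ thunk: @escaping () -> Double) { self.thunk = thunk }

    /// True once the deferred computation has run.
    var isEvaluated: Bool { cached != nil }

    /// Runs the deferred computation once and caches the result.
    @discardableResult
    func eval() -> Double {
        if let cached = cached { return cached }
        let v = thunk()
        cached = v
        return v
    }

    /// Formatting implicitly evaluates, similar in spirit to Python's __repr__.
    var description: String { "LazyNumber(\(eval()))" }
}

let n = LazyNumber { 1.0 + 2.0 }
let before = n.isEvaluated   // nothing has been computed yet
let text = "\(n)"            // interpolation calls description, which evaluates
let after = n.isEvaluated
```

In this toy the evaluation runs on the calling thread, so it cannot reproduce the hang; it only shows where the implicit eval enters when a description is requested.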

awni commented 2 months ago

Right.. so I guess I don't understand why it hangs with the implicit eval but not with the explicit eval. Shouldn't they both be running on a separate thread?

derekelewis commented 2 months ago

The odd thing is that I can pretty much po anything in a model running LLMEval, but with something very basic, like a simple Swift program that initializes an array with MLXRandom.normal, doing a po on that array hangs LLDB. Here's my Whisper implementation so far - please ignore the very basic WhisperEval app. Also, transcribe, decode, and tokenizer haven't been implemented yet. I am just making sure the logits match up in each module against the mlx-examples reference implementation.

https://github.com/derekelewis/mlx-swift-examples/tree/whisper-example/Libraries/Whisper
https://github.com/derekelewis/mlx-swift-examples/tree/whisper-example/Applications/WhisperEval

derekelewis commented 2 months ago

Here's a simple, reproducible example that I came up with by modifying Tutorial in mlx-swift-examples, just adding an array initialization with MLXRandom.normal. Setting a breakpoint after that line and doing a po on the array results in LLDB hanging.

// Copyright © 2024 Apple Inc.

import Foundation
import MLX
import MLXRandom

/// mlx-swift tutorial based on:
/// https://github.com/ml-explore/mlx/blob/main/examples/cpp/tutorial.cpp
@main
struct Tutorial {

    static func scalarBasics() {
        // create a scalar array
        let x = MLXArray(1.0)

        // the datatype is .float32
        let dtype = x.dtype
        assert(dtype == .float32)

        // get the value
        let s = x.item(Float.self)
        assert(s == 1.0)

        // reading the value with a different type is a fatal error
        // let i = x.item(Int.self)

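        // added for the repro: `a` is lazy (not yet evaluated) here;
        // break after this line and `po a` to see the hang
        let a = MLXRandom.normal([1, 5, 10])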

        // scalars have a size of 1
        let size = x.size
        assert(size == 1)

        // scalars have 0 dimensions
        let ndim = x.ndim
        assert(ndim == 0)

        // scalar shapes are empty arrays
        let shape = x.shape
        assert(shape == [])
    }

    static func arrayBasics() {
        // make a multidimensional array.
        //
        // Note: the argument is a [Double] array literal, which is not
        // a supported type, but we can explicitly convert it to [Float]
        // when we create the MLXArray.
        let x = MLXArray(converting: [1.0, 2.0, 3.0, 4.0], [2, 2])

        // mlx is row-major by default so the first row of this array
        // is [1.0, 2.0] and the second row is [3.0, 4.0]
        print(x[0])
        print(x[1])

        // make an array of shape [2, 2] filled with ones
        let y = MLXArray.ones([2, 2])

        // pointwise add x and y
        let z = x + y

        // mlx is lazy by default. At this point `z` only
        // has a shape and a type but no actual data
        assert(z.dtype == .float32)
        assert(z.shape == [2, 2])

        // To actually run the computation you must evaluate `z`.
        // Under the hood, mlx records operations in a graph.
        // The variable `z` is a node in the graph which points to its operation
        // and inputs. When `eval` is called on an array (or arrays), the array and
        // all of its dependencies are recursively evaluated to produce the result.
        // Once an array is evaluated, it has data and is detached from its inputs.

        // Note: this is being called for demonstration purposes -- all reads
        // ensure the array is evaluated.
        z.eval()

        // this implicitly evaluates z before converting to a description
        print(z)
    }

    static func automaticDifferentiation() {
        func fn(_ x: MLXArray) -> MLXArray {
            x.square()
        }

        let gradFn = grad(fn)

        let x = MLXArray(1.5)
        let dfdx = gradFn(x)
        print(dfdx)

        assert(dfdx.item() == Float(2 * 1.5))

        let df2dx2 = grad(grad(fn))(x)
        print(df2dx2)

        assert(df2dx2.item() == Float(2))
    }

    static func main() {
        scalarBasics()
        arrayBasics()
        automaticDifferentiation()
    }
}

davidkoski commented 2 months ago

> Right.. so I guess I don't understand why it hangs with the implicit eval but not with the explicit eval. Shouldn't they both be running on a separate thread?

I think the explicit eval is done via the program (all threads running) while the implicit is with all threads stopped in the debugger. When you call po it will only start the target thread.
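
Based on that explanation and the original report, one workaround is to evaluate the array in program code before the breakpoint, while all threads are still running. A minimal sketch, assuming the MLX and MLXRandom packages are available; `makeEvaluatedArray` is a hypothetical helper name, and the calls are the ones already used in the tutorial code above:

```swift
import MLX
import MLXRandom

/// Hypothetical helper: builds the array from the repro and evaluates it
/// while the program is running, so a breakpoint set after the call
/// sees an array that already has data.
func makeEvaluatedArray() -> MLXArray {
    let a = MLXRandom.normal([1, 5, 10])
    a.eval()  // explicit eval, same call as z.eval() in the tutorial
    return a
}
```

This matches the behavior reported at the top of the thread (po works after an explicit eval() or print()), but it has only been described here for that case.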

davidkoski commented 2 months ago

> Here's a simple, reproducible example that I came up with by modifying Tutorial in mlx-swift-examples, just adding an array initialization with MLXRandom.normal. Setting a breakpoint after that line and doing a po on the array results in LLDB hanging.

Yeah, I can repro just by putting a breakpoint after:

        // pointwise add x and y
        let z = x + y

and po z or po y. I am able to po x -- it has been evaluated already.

Looking at the help for expr and dwim-print (the command behind po), I think we want this option:

       -a <boolean> ( --all-threads <boolean> )
            Should we run all threads if the execution doesn't complete on one
            thread.

So:

dwim-print -O -a true -- x

works as expected, but:

dwim-print -O -a true -- y

sadly still hangs. This seems like the right direction (and the documentation on -a seems to describe the issue).

derekelewis commented 2 months ago

Interesting that doing an expr eval(y) also results in a hang. Not sure what to make of that.

davidkoski commented 2 months ago

They are doing (roughly) the same thing under the hood. po x is expr x.description with some added formatting. So expr eval(y) explicitly calls eval, but it hangs in the same way that po y does.