Open derekelewis opened 5 months ago
It might be -- po might run the code by activating a single thread (all threads are stopped in the debugger). Evaluating a lazy MLXArray would require submitting tasks and running the graph on another thread. Very likely that other thread is kept paused, and the hang would be the result.
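To make the suspected mechanism concrete, here is a minimal sketch that uses only Dispatch and no MLX code. The queue `worker` and the semaphore `done` are hypothetical stand-ins for "the thread that would run the graph" and "the caller waiting for the result"; suspending the queue is an assumption used to imitate the debugger keeping other threads stopped, and the timeout stands in for the hang. It is an analogy, not a description of how MLX schedules work.

```swift
import Dispatch

// Hypothetical stand-in for the thread that would run the graph.
let worker = DispatchQueue(label: "example.worker")
let done = DispatchSemaphore(value: 0)

// Imitate the debugger keeping every other thread stopped.
worker.suspend()

// The "lazy evaluation": work is handed to the worker and the caller waits.
worker.async { done.signal() }

// While the worker is paused the wait cannot complete.
// A timeout is used here so the sketch terminates instead of hanging.
let whilePaused = done.wait(timeout: .now() + .milliseconds(200))

// Imitate "all threads running" again: the queued work now executes.
worker.resume()
let whileRunning = done.wait(timeout: .now() + .seconds(2))

print(whilePaused == .timedOut ? "paused: wait did not complete" : "paused: completed")
print(whileRunning == .success ? "running: completed" : "running: wait did not complete")
```

Without the timeout, the first wait would block indefinitely, which is the same shape as the hang reported here.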
@davidkoski is there a way to have the array actually get evaluated when you call po in the debugger? In Python, if you inspect an array in pdb the __repr__ method is called. We just eval and print the array in the method. I assume there is a similar idiom in Swift?
That is exactly what po does -- it calls the description method on MLXArray which calls mlx_tostring() in the C layer and that calls:
std::ostringstream os;
os << ctx; // ctx is mlx::core::array
std::string str = os.str();
return new mlx_string_(str);
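As an illustration of the idiom being discussed (printing forces evaluation), here is a small self-contained sketch. `LazyValue` is a hypothetical type invented for this example and is not part of MLX; it only shows how a `description` implementation can trigger an implicit evaluation, which is the step that `po` would have to complete.

```swift
// Hypothetical lazy value; not an MLX type.
final class LazyValue: CustomStringConvertible {
    private let compute: () -> [Float]
    private var storage: [Float]?

    init(_ compute: @escaping () -> [Float]) {
        self.compute = compute
    }

    // True once the deferred computation has produced data.
    var isEvaluated: Bool { storage != nil }

    // Run the deferred computation once and cache the result.
    @discardableResult
    func eval() -> [Float] {
        if let cached = storage { return cached }
        let value = compute()
        storage = value
        return value
    }

    // Asking for a description implicitly evaluates the value first.
    var description: String {
        "LazyValue(\(eval()))"
    }
}

let inputs: [Float] = [1, 2]
let v = LazyValue { inputs.map { $0 + 1 } }

let before = v.isEvaluated          // nothing has been computed yet
let text = String(describing: v)    // calls `description`, which evaluates
let after = v.isEvaluated

print(before, text, after)
```

In this toy version the evaluation runs on the calling thread, so it cannot stall; the concern raised in this thread is that the real evaluation depends on another thread.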
Right.. so I guess I don't understand why it hangs with the implicit eval but not with the explicit eval. Shouldn't they both be running on a separate thread?
The odd thing is that I can pretty much po anything in a model running LLMEval, but with something very basic, like a simple Swift program that initializes an array with MLXRandom.normal, doing a po on that array hangs LLDB. Here's my Whisper implementation so far - please ignore the very basic WhisperEval app. Also, transcribe, decode, and tokenizer haven't been implemented yet. I am just making sure the logits match up in each module against the mlx-examples reference implementation.
https://github.com/derekelewis/mlx-swift-examples/tree/whisper-example/Libraries/Whisper
https://github.com/derekelewis/mlx-swift-examples/tree/whisper-example/Applications/WhisperEval
Here's a simple, reproducible example that I came up with by modifying Tutorial in mlx-swift-examples, just adding an array initialization with MLXRandom.normal. Setting a breakpoint after that line and doing a po on the array results in LLDB hanging.
// Copyright © 2024 Apple Inc.

import Foundation
import MLX
import MLXRandom

/// mlx-swift tutorial based on:
/// https://github.com/ml-explore/mlx/blob/main/examples/cpp/tutorial.cpp
@main
struct Tutorial {

    static func scalarBasics() {
        // create a scalar array
        let x = MLXArray(1.0)
        // the datatype is .float32
        let dtype = x.dtype
        assert(dtype == .float32)
        // get the value
        let s = x.item(Float.self)
        assert(s == 1.0)
        // reading the value with a different type is a fatal error
        // let i = x.item(Int.self)
        let a = MLXRandom.normal([1, 5, 10])
        // scalars have a size of 1
        let size = x.size
        assert(size == 1)
        // scalars have 0 dimensions
        let ndim = x.ndim
        assert(ndim == 0)
        // scalar shapes are empty arrays
        let shape = x.shape
        assert(shape == [])
    }

    static func arrayBasics() {
        // make a multidimensional array.
        //
        // Note: the argument is a [Double] array literal, which is not
        // a supported type, but we can explicitly convert it to [Float]
        // when we create the MLXArray.
        let x = MLXArray(converting: [1.0, 2.0, 3.0, 4.0], [2, 2])
        // mlx is row-major by default so the first row of this array
        // is [1.0, 2.0] and the second row is [3.0, 4.0]
        print(x[0])
        print(x[1])
        // make an array of shape [2, 2] filled with ones
        let y = MLXArray.ones([2, 2])
        // pointwise add x and y
        let z = x + y
        // mlx is lazy by default. At this point `z` only
        // has a shape and a type but no actual data
        assert(z.dtype == .float32)
        assert(z.shape == [2, 2])
        // To actually run the computation you must evaluate `z`.
        // Under the hood, mlx records operations in a graph.
        // The variable `z` is a node in the graph which points to its operation
        // and inputs. When `eval` is called on an array (or arrays), the array and
        // all of its dependencies are recursively evaluated to produce the result.
        // Once an array is evaluated, it has data and is detached from its inputs.
        // Note: this is being called for demonstration purposes -- all reads
        // ensure the array is evaluated.
        z.eval()
        // this implicitly evaluates z before converting to a description
        print(z)
    }

    static func automaticDifferentiation() {
        func fn(_ x: MLXArray) -> MLXArray {
            x.square()
        }
        let gradFn = grad(fn)
        let x = MLXArray(1.5)
        let dfdx = gradFn(x)
        print(dfdx)
        assert(dfdx.item() == Float(2 * 1.5))
        let df2dx2 = grad(grad(fn))(x)
        print(df2dx2)
        assert(df2dx2.item() == Float(2))
    }

    static func main() {
        scalarBasics()
        arrayBasics()
        automaticDifferentiation()
    }
}
Right.. so I guess I don't understand why it hangs with the implicit eval but not with the explicit eval. Shouldn't they both be running on a separate thread?
I think the explicit eval is done by the program itself (all threads running), while the implicit eval happens with all threads stopped in the debugger. When you call po it will only start the target thread.
Here's a simple, reproducible example that I came up with by modifying Tutorial in mlx-swift-examples by just adding an array initialization with MLXRandom.normal. Setting breakpoint after and doing a po on it will result in LLDB hanging.
Yeah, I can repro just by putting a breakpoint after:
// pointwise add x and y
let z = x + y
and po z or po y. I am able to po x -- it has been evaluated already.
Looking at the help for expr and dwim-print (the command behind po), I think we want this option:
-a <boolean> ( --all-threads <boolean> )
Should we run all threads if the execution doesn't complete on one
thread.
So:
dwim-print -O -a true -- x
works as expected, but:
dwim-print -O -a true -- y
sadly still hangs. This seems like the right direction (and the documentation on -a seems to describe the issue).
Interesting that doing an expr eval(y) also results in a hang. Not sure what to make of that.
They are doing the same thing (roughly) under the hood. po x is expr x.description with some added formatting. So expr eval(y) explicitly calls eval, but it still runs inside the debugger's expression evaluation and hangs in the same way that po y hangs.
I am using LLDB to debug some model code that I have written in mlx-swift. Unless I do an explicit eval() or print() before doing a po on the object in question, LLDB will hang. Is this behavior expected with lazy evaluation?
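The workaround described here (an explicit eval before inspecting) can be sketched as follows. It reuses the arrays from the Tutorial repro and only the calls that appear in this thread; the placement of the breakpoint is illustrative. It rests on the observation reported in the thread that po works on an array that has already been evaluated, and it is a debugging aid rather than a fix for the hang itself.

```swift
import MLX

let x = MLXArray(converting: [1.0, 2.0, 3.0, 4.0], [2, 2])
let y = MLXArray.ones([2, 2])
let z = x + y

// Force evaluation while the program is running normally (all threads live),
// so a later `po y` or `po z` only has to format data that already exists.
y.eval()
z.eval()

// Set the breakpoint on the next line, then try `po y` and `po z`.
print(z)
```

Note that adding eval calls changes when the graph is evaluated, so they are probably best kept to debugging sessions.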