nihalpasham / optimus

A plain vanilla transformer implementation in Rust using the Candle ML framework
MIT License
20 stars, 4 forks

Metal Kernel Error on MacBook Pro #1

Open · ChrisWhealy opened this issue 6 months ago

ChrisWhealy commented 6 months ago

I'm following your YT videos on building a transformer (very good, BTW 😃).

I notice that you're running this code on a Mac and it works fine for you. I'm also running it on a Mac, a 16" 2019 MacBook Pro with an AMD Radeon Pro 5500M graphics card and 32 GB of RAM. However, while `cargo r` works fine up to the stage where the positional embeddings are created, at that point the code crashes with:

```
╰→  cargo r
    Finished dev [unoptimized + debuginfo] target(s) in 0.56s
     Running `target/debug/optimus`
tok:  ["welcome", "to", "the", "library", ".", "test", "this", "out"]
ids:  [5807, 11, 5, 1509, 7, 681, 48, 92]
vector embeddings: [[ 34.8852,  26.0188,   9.5195, ...,  33.7005, -17.4679,  60.7982],
 [ 18.6003, -25.1665,   7.2976, ...,  18.0612,  35.4795,   6.3207],
 [  2.9109,  -7.1283,  -4.8047, ...,  -1.5274,  -2.4472,  16.8868],
 ...
 [  1.7506, -33.8794,  31.1387, ...,  -9.2005,  -7.0853,  34.5116],
 [ 19.0213,  -1.9197,  -2.3470, ...,   7.2740,  15.5996, -26.9573],
 [  8.4892, -14.3628, -43.8464, ..., -17.3021,   9.7173,  42.1886]]
Tensor[[8, 512], f32, metal:4294969249]
Error: Metal(KernelError(FailedToCreatePipeline("AIR builtin function was called but no definition was found.")))
```

I'm using Rust 1.77.2.

If I change the device declaration to `Device::Cpu`, the code runs, albeit very slowly.
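Rather than hard-coding `Device::Cpu`, one option is to probe the GPU once at start-up and fall back to the CPU only if the probe fails. The log above suggests why a probe has to exercise real ops: the tensor was successfully placed on `metal:…` and printed, so opening the device worked, and the error only appeared later when a particular kernel's pipeline was created. The sketch below is a std-only model of that pattern, not candle code: `Backend`, `BackendError` and `select_backend` are hypothetical names. With candle itself, the probe would presumably call `Device::new_metal(0)` and then run the same ops the model uses (for example those in the positional-embedding step), returning `Device::Cpu` on any error.

```rust
use std::fmt;

/// Hypothetical stand-in for candle's `Device` (only the two variants relevant here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Metal,
    Cpu,
}

/// Hypothetical stand-in for the error a failed GPU op would return.
#[derive(Debug)]
struct BackendError(String);

impl fmt::Display for BackendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Try Metal first; if the probe (device creation plus a representative op)
/// fails, report the error and fall back to the CPU back-end.
fn select_backend<F>(probe: F) -> Backend
where
    F: Fn(Backend) -> Result<(), BackendError>,
{
    match probe(Backend::Metal) {
        Ok(()) => Backend::Metal,
        Err(e) => {
            eprintln!("Metal back-end unusable ({e}); falling back to CPU");
            Backend::Cpu
        }
    }
}

fn main() {
    // Simulates the failure reported in this issue: the kernel pipeline cannot be created.
    let amd_mac = |b: Backend| match b {
        Backend::Metal => Err(BackendError(
            "AIR builtin function was called but no definition was found.".to_string(),
        )),
        Backend::Cpu => Ok(()),
    };
    // Simulates a machine where the Metal kernels work.
    let working_gpu = |_b: Backend| -> Result<(), BackendError> { Ok(()) };

    println!("{:?}", select_backend(amd_mac)); // prints "Cpu"
    println!("{:?}", select_backend(working_gpu)); // prints "Metal"
}
```

Probing once and passing the chosen device around keeps the rest of the model code device-agnostic, so the same binary can use the GPU where the kernels work and the CPU elsewhere.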

I've seen various comments saying this error is caused by a bug affecting the AMD Radeon GPUs installed in some Macs. But since the code runs on your Mac, I was wondering whether you had to apply a workaround, or whether you simply have a different graphics card installed.

nihalpasham commented 6 months ago

Hi @ChrisWhealy,

Thanks for checking out the repo!

I'm using a Mac with an Apple Silicon SoC (Apple's own chip). The error you're seeing is a Metal kernel error. Apple provides a low-level, explicit GPU API called Metal, together with a language for writing GPU programs (shaders), the Metal Shading Language, which is essentially C++ with proprietary extensions.

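These shaders are compiled by the Metal compiler, which involves an intermediate step: programs are first converted to AIR (Apple Intermediate Representation). Your error is raised while creating the compute pipeline, the step where that AIR is turned into code for the specific GPU, and it says the kernel calls an AIR builtin function for which no definition was found.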

So this seems like a bug specific to AMD graphics cards. I believe the only solution is to avoid the Metal back-end on this AMD card, although I have not verified this.

If that's true, then you're limited to the CPU back-end: candle currently supports three back-ends (CPU, CUDA and Metal), and CUDA requires an NVIDIA GPU, which your MacBook Pro doesn't have.
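The elimination argument above can be written down as a small function. This is a std-only sketch, not candle's API: `Backend` mirrors candle's three device kinds, and the capability flags `has_nvidia_gpu` and `metal_kernels_work` are hypothetical inputs standing in for whatever detection or probing a real program would do.

```rust
/// The three back-ends candle offers (hypothetical mirror of its device kinds).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cuda,
    Metal,
    Cpu,
}

/// Back-ends usable on a machine, in order of preference.
fn usable_backends(has_nvidia_gpu: bool, metal_kernels_work: bool) -> Vec<Backend> {
    let mut out = Vec::new();
    if has_nvidia_gpu {
        out.push(Backend::Cuda); // CUDA needs an NVIDIA GPU
    }
    if metal_kernels_work {
        out.push(Backend::Metal); // Metal only helps if its kernels compile on this GPU
    }
    out.push(Backend::Cpu); // the CPU back-end is always available
    out
}

fn main() {
    // 2019 MacBook Pro with an AMD Radeon Pro 5500M: no NVIDIA GPU, Metal kernels fail.
    println!("{:?}", usable_backends(false, false)); // prints "[Cpu]"
    // Apple Silicon Mac: no NVIDIA GPU, Metal kernels work.
    println!("{:?}", usable_backends(false, true)); // prints "[Metal, Cpu]"
}
```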

ChrisWhealy commented 6 months ago

Ok, thanks for the update.

I'll have to stick with the much slower `Device::Cpu` option then.

:-(