Closed bpkeene closed 10 months ago
SIMD async copies are a hidden feature that provides large performance benefits on Apple GPUs. They were the advantage MPS had over open-source GEMM libraries, until MFA closed the gap.
The library is designed so that I pre-compile it with Xcode 14.2 and you don't have to go to the trouble of getting Xcode 14 yourself. Binaries are hosted on GitHub releases, but you can also follow the instructions in the README to compile it yourself.
Closing this as a duplicate issue.
metal_config in the toolchain doesn't mention HAVE_SIMDGROUP_FUTURE. It seems the headers referred to in https://github.com/dougallj/applegpu/issues/28 were removed altogether from newer versions of Xcode (I'm using 15.0.1 / 15A507).
I was able to find strings matching the ones in the above GitHub issue inside libapplegpu-nt.dylib from the 15.0.1 toolchain:
objdump --disassemble --demangle /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/metal/macos/lib/libapplegpu-nt.dylib > /tmp/symbols.txt
127b704: 08 91 16 91 add x8, x8, #1444 ; literal pool for: "air.simdgroup_async_copy_1d"
127b708: 60 f6 04 f9 str x0, [x19, #2536]
127b70c: 68 e2 04 f9 str x8, [x19, #2496]
127b710: 28 08 00 b0 adrp x8, 261 ; 0x1380000
127b714: 08 01 17 91 add x8, x8, #1472 ; literal pool for: "air.simdgroup_async_copy_2d"
I didn't pursue this any further, though, so I don't know whether things can still be patched up or whether the functions are still usable and correct.
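For anyone who wants to repeat the check on another toolchain without a full disassembly, here is a minimal sketch that scans a binary for the intrinsic names directly. The function name `find_intrinsic_strings` and the shared prefix are assumptions made for illustration (the prefix is taken from the two strings in the output above). A hit only shows that the name is still present in the library, not that the intrinsic is usable.

```python
import re
import sys
from pathlib import Path

# Assumption: the intrinsic names share this prefix, as the two strings
# in the disassembly above do.
PREFIX = b"air.simdgroup_async_copy"


def find_intrinsic_strings(path, prefix=PREFIX):
    """Return the sorted, de-duplicated ASCII strings in `path` that start with `prefix`."""
    data = Path(path).read_bytes()
    # Match the prefix followed by any identifier-like suffix (e.g. "_1d", "_2d").
    pattern = re.compile(re.escape(prefix) + rb"[A-Za-z0-9_.]*")
    return sorted({m.group(0).decode("ascii") for m in pattern.finditer(data)})


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_strings.py /path/to/libapplegpu-nt.dylib
    for name in find_intrinsic_strings(sys.argv[1]):
        print(name)
```

Pointing it at the libapplegpu-nt.dylib path from the objdump command above should list whichever `air.simdgroup_async_copy*` names that toolchain still contains.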
Is there interest / would there be positive reception to a PR using alternative read & write mechanisms in lieu of simdgroup_async?
Thanks!