philipturner opened this issue 2 years ago

On an OpenMM thread, I talked with some people about how AMD GPUs support `cl_khr_fp64` with Apple's driver. I did not know whether the driver passes OpenCL C -> AIR -> AMDGPU; if so, that's good news for MoltenCL. I don't have an AMD-powered Mac, but someone with such a machine could test the theory. Use the following code for this exercise:

In a new directory, create a file called `vecAdd.cl` and paste the source code into it. Run the following commands. If the last two proceed without error, zip both `vecAdd.air` and `vecAdd.metallib`, and attach them to a GitHub comment. Then I can investigate it further.

cc: @theschles
Hi @philipturner
I wasn't sure whether it was OK to run the above on macOS 12.6, so I ran it. If it needs macOS 13, I'll probably give it a week after launch... and then, as long as Macs aren't still melting down around the world at that moment, I'll upgrade.
So here's what my macOS 12.6 machine did with the code above:
MoltenCL/Foo on main [?]
➜ xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=0
MoltenCL/Foo on main [?] took 10s
➜ xcrun metallib vecAdd.air -o vecAdd.metallib
MoltenCL/Foo on main [?] took 2s
➜ xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=1
vecAdd.cl:2:9: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
typedef double FLOAT_TYPE;
^
vecAdd.cl:7:33: error: use of type 'FLOAT_TYPE' (aka 'double') requires cl_khr_fp64 extension to be enabled
__kernel void vecAdd( __global FLOAT_TYPE *a,
^
vecAdd.cl:8:33: error: use of type 'FLOAT_TYPE' (aka 'double') requires cl_khr_fp64 extension to be enabled
__global FLOAT_TYPE *b,
^
vecAdd.cl:9:33: error: use of type 'FLOAT_TYPE' (aka 'double') requires cl_khr_fp64 extension to be enabled
__global FLOAT_TYPE *c,
^
4 errors generated.
MoltenCL/Foo on main [?]
➜ xcrun metallib vecAdd.air -o vecAdd.metallib
LLVM ERROR: Error opening 'vecAdd.air': No such file or directory!
MoltenCL/Foo on main [?]
➜
Being on macOS 12.6 isn't an issue here; macOS 13 mostly matters later, for work with GPU virtual addresses. No pressure to upgrade the OS.
It looks like we can't compile FP64 directly to AIR with the Metal command-line tools. However, there's still a chance we could modify an AIR file to manually utilize FP64. I'll get back to you once I've written the OpenCL SPIR-V -> AIR transpiler; then we can test whether the AIR -> AMDGPU backend supports FP64.
Complete n00b here writing...
A bit of googling on the error message came up with the following:
To enable the extension in the kernel code, one would normally add the line
"#pragma OPENCL EXTENSION cl_khr_fp64 : enable"
If I put that at the top of the code, it throws no error when I enable double precision on the command line.
Again, complete n00b here. Why no explicit call to enable the cl_khr_fp64 extension?
UPDATE
I also found this snippet, which explicitly looks for the Khronos cl_khr_fp64 extension and falls back to an AMD FP64 extension:
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#else
#error "Double precision floating point not supported by OpenCL implementation."
#endif
Again, no error thrown if I enable double precision on the command line like you had me do.
No way! I'll have a mechanism soon to test that `.metallib` in the Metal runtime. If this works, we can access native AMD double precision from MoltenCL.
The tests are ready. You need to overwrite the `vecAdd.cl` file with the code below, and create two new files. Then run the following commands and report back the results.
xcrun metal -c vecAdd2.metal
xcrun metallib vecAdd2.air -o vecAdd2.metallib
swift Test.swift vecAdd2.metallib
# should show 3 and 0
xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=0
xcrun metallib vecAdd.air -o vecAdd.metallib
swift Test.swift vecAdd.metallib
# should show 3 and 0
xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=1
xcrun metallib vecAdd.air -o vecAdd.metallib
swift Test.swift vecAdd.metallib
# fails on an Apple M1 Max, meaning FP64 not supported
# hopefully it works on AMD; if not, double-check that
# the shown device is an AMD GPU
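Neither `vecAdd2.metal` nor `Test.swift` is reproduced in this thread, but for readers following along, a harness in the spirit of `Test.swift` might look like the sketch below. This is a hypothetical reconstruction, not the actual file; the buffer contents and the Float-only path are illustrative assumptions.

```swift
// Hypothetical Test.swift-style harness: loads the .metallib passed as the
// first argument, runs a "vecAdd" kernel, and prints the first result.
import Foundation
import Metal

// Enumerate devices and prefer a discrete (non-low-power) GPU.
let device = MTLCopyAllDevices().first(where: { !$0.isLowPower })!
print("Device: \(device.name)")

let url = URL(fileURLWithPath: CommandLine.arguments[1])
let library = try! device.makeLibrary(URL: url)
let pipeline = try! device.makeComputePipelineState(
  function: library.makeFunction(name: "vecAdd")!)

let n = 5
let a: [Float] = [1, 2, 3, 4, 5]
let b: [Float] = [2, 3, 4, 5, 6]
let bufferA = device.makeBuffer(bytes: a, length: n * 4)!
let bufferB = device.makeBuffer(bytes: b, length: n * 4)!
let bufferC = device.makeBuffer(length: n * 4)!

let commandBuffer = device.makeCommandQueue()!.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(bufferA, offset: 0, index: 0)
encoder.setBuffer(bufferB, offset: 0, index: 1)
encoder.setBuffer(bufferC, offset: 0, index: 2)
encoder.dispatchThreads(
  MTLSize(width: n, height: 1, depth: 1),
  threadsPerThreadgroup: MTLSize(width: n, height: 1, depth: 1))
encoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

let results = bufferC.contents().bindMemory(to: Float.self, capacity: n)
print("Result: \(results[0])") // 1 + 2 = 3.0 if the kernel ran correctly
```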
After running some calculations on Apple silicon FP64 emulation, it looks like emulation is 32-48x slower than native FP32 for multiplication (27x when only counting the mantissa). If AMD has similar throughput for integer multiply instructions, that won't be bad compared to native 1:16 FP64. Considering that 1/50 of ops are FP64 in OpenMM mixed precision, and FP64 emulation is 40x slower than FP32, the overall slowdown comes out to (49 + 40) / 50 ≈ 1.8x.
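A back-of-the-envelope check of that blended cost in code, assuming the 1/50 FP64 ratio and the 40x per-op penalty above:

```swift
// Blended slowdown of mixed-precision work relative to all-FP32, assuming
// 1 in 50 ops is FP64 and each emulated FP64 op costs 40x a native FP32 op.
let fp64Fraction = 1.0 / 50
let fp64Penalty = 40.0
let blended = (1 - fp64Fraction) * 1 + fp64Fraction * fp64Penalty
print(blended) // 1.78 -- roughly a 2x slowdown overall
```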
MoltenCL/Foo on main [?]
➜ xcrun metal -c vecAdd2.metal
➜ xcrun metallib vecAdd2.air -o vecAdd2.metallib
➜ swift Test.swift vecAdd2.metallib
Device: AMD Radeon Pro 560
Testing Float
Result: 3.0
Testing Double
Result: 0.0
# should show 3 and 0
➜ xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=0
➜ xcrun metallib vecAdd.air -o vecAdd.metallib
➜ swift Test.swift vecAdd.metallib
Device: AMD Radeon Pro 560
Testing Float
Result: 3.0
Testing Double
Result: 0.0
# should show 3 and 0
➜ xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=1
➜ xcrun metallib vecAdd.air -o vecAdd.metallib
➜ swift Test.swift vecAdd.metallib
Device: AMD Radeon Pro 560
Testing Float
Result: 0.0
Testing Double
Result: 2.0000000000654836
# fails on an Apple M1 Max, meaning FP64 not supported
# hopefully it works on AMD; if not, double-check that
# the shown device is an AMD GPU
That's awesome! We can access FP64 through the AIR -> AMDGPU backend. However, we seem to have incorrect results: the result should be `3`, not `2.000000...65`. Here's some preliminary investigation:
> swift repl
1> let mydouble: Double = 2.0000000000654836
mydouble: Double = 2.0000000000654836
2> print(UnsafeRawPointer(bitPattern: Int(mydouble.bitPattern)))
Optional(0x4000000000024000)
3> print(UnsafeRawPointer(bitPattern: Int(Double(2).bitPattern)))
Optional(0x4000000000000000)
4> print(UnsafeRawPointer(bitPattern: Int(Double(1).bitPattern)))
Optional(0x3ff0000000000000)
5> print(UnsafeRawPointer(bitPattern: Int(Double(3).bitPattern)))
Optional(0x4008000000000000)
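Note that `0x4000000000024000` is the bit pattern of `2.0` with a couple of low mantissa bits set, which is why the bad result sits so close to 2. If you'd rather skip the `UnsafeRawPointer` trick, a minimal equivalent that formats the bits as hex:

```swift
import Foundation

// Print the IEEE 754 bit pattern of a Double as a 16-digit hex string.
func hexBits(_ x: Double) -> String {
  String(format: "0x%016llx", x.bitPattern)
}
print(hexBits(2.0000000000654836)) // 0x4000000000024000
print(hexBits(2)) // 0x4000000000000000
print(hexBits(3)) // 0x4008000000000000
```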
Try this:
xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=0
xcrun metallib vecAdd.air -o vecAdd.metallib
swift Test.swift vecAdd.metallib
xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=1
xcrun metallib vecAdd.air -o vecAdd.metallib
swift Test.swift vecAdd.metallib
xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=1
xcrun metallib vecAdd.air -o vecAdd.metallib
swift Test.swift vecAdd.metallib
xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=1
xcrun metallib vecAdd.air -o vecAdd.metallib
swift Test.swift vecAdd.metallib
P.S. Stage Manager in macOS Ventura is extremely nice! Also, Ventura changed how you acquire the default Metal device in Swift scripts. You can't use `MTLCreateSystemDefaultDevice()` anymore; instead you must use `MTLCopyAllDevices()`.
The script should still work on Monterey.
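For reference, a minimal sketch of device selection that runs on both OS versions; the non-low-power filter matches the `Test.swift` line shown later in this thread:

```swift
import Metal

// On Ventura, MTLCreateSystemDefaultDevice() returns nil in command-line
// scripts, so enumerate all devices and prefer a discrete (non-low-power) GPU.
let devices = MTLCopyAllDevices()
let device = devices.first(where: { !$0.isLowPower }) ?? devices.first!
print("Device: \(device.name)")
```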
Sorry, had work stuff that took priority. I've also updated to macOS Ventura. Will tackle the above shortly.
> `print(UnsafeRawPointer(bitPattern: Int(Double(3).bitPattern)))`
On macOS 13 Ventura, on the MacBook Pro with AMD:
MoltenCL/Foo on main [⇣?]
➜ swift repl
Welcome to Apple Swift version 5.7 (swiftlang-5.7.0.127.4 clang-1400.0.29.50).
Type :help for assistance.
1> let mydouble: Double = 2.0000000000654836
mydouble: Double = 2.0000000000654836
2> print(UnsafeRawPointer(bitPattern: Int(mydouble.bitPattern)))
Optional(0x4000000000024000)
3> print(UnsafeRawPointer(bitPattern: Int(Double(2).bitPattern)))
Optional(0x4000000000000000)
4> print(UnsafeRawPointer(bitPattern: Int(Double(1).bitPattern)))
Optional(0x3ff0000000000000)
5> print(UnsafeRawPointer(bitPattern: Int(Double(3).bitPattern)))
Optional(0x4008000000000000)
6>
Hi @philipturner, something's not working with the code you gave me in https://github.com/philipturner/MoltenCL/issues/1#issuecomment-1291902685
MoltenCL/Foo on main [⇣?]
➜ xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=0
MoltenCL/Foo on main [⇣?]
➜ xcrun metallib vecAdd.air -o vecAdd.metallib
MoltenCL/Foo on main [⇣?]
➜ swift Test.swift vecAdd.metallib
Test.swift:8:67: error: consecutive statements on a line must be separated by ';'
let device = MTLCopyAllDevices().first(where: { !$0.isLowPower })!e
^
;
Test.swift:8:67: error: cannot find 'e' in scope
let device = MTLCopyAllDevices().first(where: { !$0.isLowPower })!e
^
Replace the line with:
let device = MTLCopyAllDevices().first(where: { !$0.isLowPower })!
macOS Ventura disabled fetching the GPU through `MTLCreateSystemDefaultDevice()` in command-line scripts. You have an extraneous `e` in the line, so remove that.
My bad - I had a typo with that `e`.
MoltenCL/Foo on main [⇣?]
➜ xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=0
MoltenCL/Foo on main [⇣?]
➜ xcrun metallib vecAdd.air -o vecAdd.metallib
MoltenCL/Foo on main [⇣?]
➜ swift Test.swift vecAdd.metallib
Device: AMD Radeon Pro 560
Testing Float
Result: [3.0, 5.0, 7.0, 9.0, 11.0]
Testing Double
Result: [384.0, 640.0, 0.0, 0.0, 0.0]
MoltenCL/Foo on main [⇣?]
➜ xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=1
MoltenCL/Foo on main [⇣?]
➜ xcrun metallib vecAdd.air -o vecAdd.metallib
MoltenCL/Foo on main [⇣?]
➜ swift Test.swift vecAdd.metallib
Device: AMD Radeon Pro 560
Testing Float
Result: [2.0663e-40, 3.0, 3.0, 4.0, 5.0]
Testing Double
Result: [2.0000000000654836, 2.0, 3.0, 5.0, 5.0]
MoltenCL/Foo on main [⇣?]
➜ xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=1
MoltenCL/Foo on main [⇣?]
➜ xcrun metallib vecAdd.air -o vecAdd.metallib
MoltenCL/Foo on main [⇣?]
➜ swift Test.swift vecAdd.metallib
Device: AMD Radeon Pro 560
Testing Float
Result: [2.0663e-40, 3.0, 3.0, 4.0, 5.0]
Testing Double
Result: [2.0000000000654836, 2.0, 3.0, 5.0, 5.0]
MoltenCL/Foo on main [⇣?]
➜ xcrun metal -x cl -c vecAdd.cl -DUSE_DOUBLE_PRECISION=1
MoltenCL/Foo on main [⇣?]
➜ xcrun metallib vecAdd.air -o vecAdd.metallib
MoltenCL/Foo on main [⇣?]
➜ swift Test.swift vecAdd.metallib
Device: AMD Radeon Pro 560
Testing Float
Result: [2.0663e-40, 3.0, 3.0, 4.0, 5.0]
Testing Double
Result: [2.0000000000654836, 2.0, 3.0, 5.0, 5.0]
The results are deterministic, but will be difficult to investigate. It seems to just copy one operand instead of adding anything, sometimes mutating its value. I think Apple's AIR -> AMDGPU compiler was never programmed to harness FP64 on AMD GPUs. These are backend bugs with assembly language, which I don't think I have the resources to fix.
MoltenCL's emulation might not be all that bad (2x slower), but you might be better off using Apple's OpenCL 1.2 driver. It has decent performance on AMD, permitting native FP64 and properly enqueueing commands in a `cl_queue`. Although it doesn't permit subgroup reductions, AMD's threadgroup memory is very fast. The bigger problem is Apple GPUs, where none of the above statements apply.
Perhaps you could install Bootcamp with a non-licensed copy of Windows 10, then run `clinfo` and investigate the Windows OpenCL driver. It might be version 2.0 or above, and support modern features like subgroup reductions. However, testing this is time-consuming.
I did that at the height of the pandemic; however, I'd rather not dual-boot. I paid all this money for a MacBook Pro... I'd like to use macOS -- and when it's idle, have it try to find the next COVID-19 cure...
So where does that leave us? Are we not going to be able to get OpenMM / Folding@Home to work on macOS with an AMD GPU?
It leaves us with three options.
(1) Use Apple's current OpenCL driver for AMD, and MoltenCL for M1.
(2) Use MoltenCL for AMD, and suffer the performance drop for FP64.
(3) A hybrid approach that switches OpenCL drivers based on what's fastest.
Either way, we should be able to run Folding@Home on AMD. In fact, method (1) should already be possible for your computer. Even then, there's still good reason to make MoltenCL compatible with AMD. MoltenCL will back the hipSYCL Metal backend, making it possible to optimize other code bases like GROMACS for Intel Macs.
Which way do the OpenMM people want to go?
I have no idea at the moment, but the easiest way might be supporting AMD GPUs with the current OpenCL driver. MoltenCL interacts with a lot of low-level assembly compilation, creating several opportunities for troublesome bugs. It might save time to minimize how many platforms they deploy MoltenCL on. The only big reason to use MoltenCL on AMD would be subgroup shuffles/reductions, which OpenMM might not use extensively.
Would you mind running this inside the Swift REPL? I'm trying to support OpenCL profiling through Metal, a mandatory feature with OpenCL 2.0 and 3.0. x86 and Apple silicon devices will require two slightly different profiling methods.
import Metal
MTLCopyAllDevices().forEach { device in
print(device.supportsCounterSampling(.atDispatchBoundary))
print(device.supportsCounterSampling(.atBlitBoundary))
print(device.supportsCounterSampling(.atDrawBoundary))
print(device.supportsCounterSampling(.atStageBoundary))
print(device.supportsCounterSampling(.atTileDispatchBoundary))
}
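As a sketch of what the two methods might look like, here's a per-device boundary picker; the fallback policy is an illustrative assumption, not MoltenCL's actual logic:

```swift
import Metal

// Intel/AMD GPUs typically report support at dispatch/blit/draw boundaries,
// while Apple silicon GPUs report stage-boundary sampling instead.
func preferredSamplingPoint(for device: MTLDevice) -> MTLCounterSamplingPoint? {
  if device.supportsCounterSampling(.atDispatchBoundary) {
    return .atDispatchBoundary
  }
  if device.supportsCounterSampling(.atStageBoundary) {
    return .atStageBoundary
  }
  return nil // no suitable boundary; profiling unavailable on this device
}
```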
> Which way do the OpenMM people want to go?
I recently had a talk with Mr. Chodera and Mr. Eastman. Although I do not speak for their interests or opinions, we might put most effort toward improving performance on Apple GPUs. That means Mac AMD GPUs could remain on the Apple OpenCL driver, while Apple GPUs use something new; creating a custom Metal backend is now a possibility again. As for FP64 emulation, we either create a better summation algorithm using FP32, or use metal-float64 as a Metal library.
If OpenMM does use Metal directly, there's little motivation for me to finish MoltenCL. The other work I'm planning should go straight into hipSYCL. In that case, your investigation still helped me learn a lot of new things. Thanks for helping me out!
@philipturner
MoltenCL/Foo on main [⇣?]
➜ swift repl
Welcome to Apple Swift version 5.7.1 (swiftlang-5.7.1.135.3 clang-1400.0.29.51).
Type :help for assistance.
1> import Metal
2. MTLCopyAllDevices().forEach { device in
3. print(device.supportsCounterSampling(.atDispatchBoundary))
4. print(device.supportsCounterSampling(.atBlitBoundary))
5. print(device.supportsCounterSampling(.atDrawBoundary))
6. print(device.supportsCounterSampling(.atStageBoundary))
7. print(device.supportsCounterSampling(.atTileDispatchBoundary))
8. }
true
true
true
false
false
true
true
true
false
false
9>
I just got some exciting news regarding FP64 emulation performance. The overhead of function calls will not be the bottleneck, at least with 4-wide vectorization. I tested it on the Apple architecture, but would you be open to helping me test on the AMD architecture? This might decide whether a new Metal-based backend can be used for AMD, instead of OpenCL.
Random benchmark with two int32 adds per operation (not FP64 emulation).
// - Theoretical maximum speed: 10.4 TFLOPS
// - Fastest speed without a function call: 3.53 tera-ops x 2 adds (1:1.47)
// - Fastest speed with function call, 1-wide scalar: 183 giga-ops (1:56.8)
// - Fastest speed with function call, 2-wide vector: 360 giga-ops (1:28.9)
// - Fastest speed with function call, 4-wide vector: 701 giga-ops (1:14.8)
// - Given proper vectorization, function call overhead will not be the primary
// bottleneck.
This would also help Intel Mac users running other computational chemistry libraries (like INQ), which use double precision for all calculations and don't use OpenCL. Metal would avoid needing to boot Linux for ROCm.
Here to help, @philipturner! Please make sure to @-mention theschles so my GitHub notification icon lights up!
I'll remember to do that. I'll be off for winter break very soon, and I hope to finally complete metal-float64. I'll let you know when it's time to test it out. Once the project's complete, I can get to work on the OpenMM Metal backend.
@theschles this isn't entirely related to OpenMM, but I've been trying to figure out something strange about Unreal Engine 5 and Apple. Apple supposedly made a certain hardware instruction on the M2 GPU, just to run Nanite. I'm wondering whether Apple also exposed this instruction on AMD GPUs, through the Metal `atomic_max_explicit`. `clinfo` says your GPU's hardware supports them (`cl_khr_int64_extended_atomics`).
Are you open to checking out this directory and testing the script there? The README should give instructions. The boolean you have to flip is here, and I'd like to know the behavior with both `true` and `false`. On an unrelated note, I've recently emulated in-place 64-bit atomics on the M1 GPU, so the Nanite workaround might be unnecessary. I'll have you test it on AMD once `metal-float64` is complete and its benchmarks are fully automated.
Hi @philipturner, just saw your message (crazy at work). I'll try that out sometime in the next few days…
Also, would you mind running the following in the Swift REPL? Copy what it prints into a comment. Repeat that ~5 times, and say whether you get devices in a different order.
import Metal
print(MTLCopyAllDevices().map { $0.name })
Hi @philipturner
Same every time:
```swift
MoltenCL on main [⇣?]
➜ swift repl
Welcome to Apple Swift version 5.7.2 (swiftlang-5.7.2.135.5 clang-1400.0.29.51).
Type :help for assistance.
  1> import Metal
  2> print(MTLCopyAllDevices().map { $0.name })
["AMD Radeon Pro 560", "Intel(R) HD Graphics 630"]
  3> print(MTLCopyAllDevices().map { $0.name })
["AMD Radeon Pro 560", "Intel(R) HD Graphics 630"]
  4> print(MTLCopyAllDevices().map { $0.name })
["AMD Radeon Pro 560", "Intel(R) HD Graphics 630"]
  5> print(MTLCopyAllDevices().map { $0.name })
["AMD Radeon Pro 560", "Intel(R) HD Graphics 630"]
  6> print(MTLCopyAllDevices().map { $0.name })
["AMD Radeon Pro 560", "Intel(R) HD Graphics 630"]
  7> print(MTLCopyAllDevices().map { $0.name })
["AMD Radeon Pro 560", "Intel(R) HD Graphics 630"]
  8> print(MTLCopyAllDevices().map { $0.name })
["AMD Radeon Pro 560", "Intel(R) HD Graphics 630"]
  9>
```
I only ask because I'm designing metal-float64 and the OpenMM Metal backend to support AMD GPUs. Even if they end up using OpenCL, it's not too hard to add a little extra logic just in case.
> @theschles this isn't entirely related to OpenMM, but I've been trying to figure out something strange about Unreal Engine 5 and Apple. Apple supposedly made a certain hardware instruction on the M2 GPU, just to run Nanite. I'm wondering whether Apple also exposed this instruction on AMD GPUs, through the Metal `atomic_max_explicit`. `clinfo` says your GPU's hardware supports them (`cl_khr_int64_extended_atomics`).
>
> Are you open to checking out this directory and testing the script there? The README should give instructions. The boolean you have to flip is here, and I'd like to know the behavior with both `true` and `false`. On an unrelated note, I've recently emulated in-place 64-bit atomics on the M1 GPU, so the Nanite workaround might be unnecessary. I'll have you test it on AMD once `metal-float64` is complete and its benchmarks are fully automated.
Hi @philipturner
With `emulating64BitAtomics = false`:
2023-01-11 08:08:41.854304-0800 foo[79021:1885378] Metal GPU Frame Capture Enabled
2023-01-11 08:08:41.856261-0800 foo[79021:1885378] Metal API Validation Enabled
validateNewTexture:79: failed assertion `BytesPerRow of a buffer-backed texture with pixelFormat(MTLPixelFormatRG32Uint) must be aligned to 512 bytes, found bytesPerRow(16)'
(lldb)
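For context, that assertion is Metal's buffer-backed texture rule: `bytesPerRow` must be a multiple of the device's linear-texture alignment (512 bytes here), not just the row's natural 16 bytes. A minimal sketch of the required rounding, with a hypothetical 2-texel width:

```swift
import Metal

// Round a buffer-backed texture's bytesPerRow up to the device's alignment.
let device = MTLCopyAllDevices().first(where: { !$0.isLowPower })!
let align = device.minimumLinearTextureAlignment(for: .rg32Uint) // 512 above
let naturalRow = 2 * MemoryLayout<SIMD2<UInt32>>.stride // 16 bytes per row
let bytesPerRow = (naturalRow + align - 1) / align * align // rounds up to 512
print(align, bytesPerRow)
```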
===
With `emulating64BitAtomics = true`:
2023-01-11 08:11:10.875466-0800 foo[79181:1890911] Metal GPU Frame Capture Enabled
2023-01-11 08:11:10.876907-0800 foo[79181:1890911] Metal API Validation Enabled
2023-01-11 08:11:11.773086-0800 foo[79181:1890911] Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
2023-01-11 08:11:11.773787-0800 foo[79181:1890911] MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 1 try
2023-01-11 08:11:21.864531-0800 foo[79181:1890911] Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
2023-01-11 08:11:21.864654-0800 foo[79181:1890911] MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 2 try
2023-01-11 08:11:31.930745-0800 foo[79181:1890911] Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
2023-01-11 08:11:31.930867-0800 foo[79181:1890911] MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 3 try
foo/main.swift:40: Fatal error: 'try!' expression unexpectedly raised an error: Error Domain=CompilerError Code=2 "Compiler encountered an internal error" UserInfo={NSLocalizedDescription=Compiler encountered an internal error}
2023-01-11 08:11:31.940098-0800 foo[79181:1890911] foo/main.swift:40: Fatal error: 'try!' expression unexpectedly raised an error: Error Domain=CompilerError Code=2 "Compiler encountered an internal error" UserInfo={NSLocalizedDescription=Compiler encountered an internal error}
(lldb)
That's all I needed. Thanks!
@theschles would you mind running the Nanite atomics test again? I added new patches.
Roger that. Apologies, I have been out sick for the last week.
Also, I had another bit of fun just before the illness struck: the screen backlight of my Intel-based 2017 MBP with the AMD Radeon died. I'm still running it, though now it's connected to an external monitor; it's basically acting as a Mac mini, since the cost to repair doesn't make sense. I can thus still run AMD tests on macOS.
Meanwhile, so I can have portability, I've purchased a refurb 2021 14" MBP with an M1 Max and 32GB RAM. I'm ready to try out GPU processing on it :)
Now that I've found the issues, I don't think we need to test Bootcamp anymore. Since you got the 2021 MBP, ProMotion will be a game changer. The old M1 Max is fine; you don't need an M2 Max to experience it. Did you get the 24-core or 32-core version?
@theschles I'd like to archive this repository. Would you mind moving relevant discussion to LinkedIn DMs or OpenMM threads? I got someone else with an AMD GPU to perform the relevant testing for Nanite.