BenWibking opened this issue 1 year ago
:+1: This would be a useful feature to have.
I can probably create a PR for this tomorrow.
My prototype code crashes on my Apple Silicon machine because an FPE trap fires inside an Apple-provided library (the AGXMetalG13X Metal driver, reached from MPI_Init):
* thread #1, queue = 'com.Metal.DeviceDispatch', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e220800)
frame #0: 0x00000001ef7a7f78 AGXMetalG13X`AGX::SamplerStateEncoderGen4<AGX::G13::TextureFormatTable>::SamplerStateFields::SamplerStateFields(AGX::SamplerDescriptor const&) + 128
AGXMetalG13X`AGX::SamplerStateEncoderGen4<AGX::G13::TextureFormatTable>::SamplerStateFields::SamplerStateFields:
-> 0x1ef7a7f78 <+128>: fmul s0, s0, s2
0x1ef7a7f7c <+132>: mov w14, #0x44600000 ; =1147142144
0x1ef7a7f80 <+136>: fmov s1, w14
0x1ef7a7f84 <+140>: fmin s1, s0, s1
(lldb) bt
* thread #1, queue = 'com.Metal.DeviceDispatch', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e220800)
* frame #0: 0x00000001ef7a7f78 AGXMetalG13X`AGX::SamplerStateEncoderGen4<AGX::G13::TextureFormatTable>::SamplerStateFields::SamplerStateFields(AGX::SamplerDescriptor const&) + 128
frame #1: 0x00000001ef797cc4 AGXMetalG13X`-[AGXG13XFamilyDevice initWithAcceleratorPort:simultaneousInstances:] + 2516
frame #2: 0x00000001ef79bff8 AGXMetalG13X`-[AGXG13XDevice initWithAcceleratorPort:] + 52
frame #3: 0x000000019358b358 Metal`-[MTLIOAccelService initWithAcceleratorPort:] + 368
frame #4: 0x000000019358b1b8 Metal`+[MTLIOAccelService registerService:] + 128
frame #5: 0x00000001892cd910 libdispatch.dylib`_dispatch_client_callout + 20
frame #6: 0x00000001892dccc4 libdispatch.dylib`_dispatch_lane_barrier_sync_invoke_and_complete + 56
frame #7: 0x00000001936d5dd4 Metal`MTLRegisterDevices + 284
frame #8: 0x00000001935b4290 Metal`invocation function for block in MTLDeviceArrayInitialize() + 1300
frame #9: 0x00000001892cd910 libdispatch.dylib`_dispatch_client_callout + 20
frame #10: 0x00000001892cf14c libdispatch.dylib`_dispatch_once_callout + 32
frame #11: 0x000000019358af2c Metal`MTLCopyAllDevices + 244
frame #12: 0x0000000101e321c4 AppleMetalOpenGLRenderer`GLDDeviceRec::initWithDisplayMask(unsigned int) + 140
frame #13: 0x0000000101e37a50 AppleMetalOpenGLRenderer`gldCreateDevice + 72
frame #14: 0x00000001f11023b0 libGFXShared.dylib`gfxInitializeLibrary + 1900
frame #15: 0x00000001f14a1ff8 OpenCL`___lldb_unnamed_symbol1212 + 440
frame #16: 0x0000000189476dfc libsystem_pthread.dylib`__pthread_once_handler + 76
frame #17: 0x00000001894a6ea0 libsystem_platform.dylib`_os_once_callout + 32
frame #18: 0x0000000189476d94 libsystem_pthread.dylib`pthread_once + 100
frame #19: 0x00000001f14a1dbc OpenCL`___lldb_unnamed_symbol1209 + 116
frame #20: 0x00000001f146bdc4 OpenCL`clGetDeviceIDs + 216
frame #21: 0x0000000100d4aff0 libhwloc.15.dylib`hwloc_opencl_discover + 220
frame #22: 0x0000000100d2bc9c libhwloc.15.dylib`hwloc_discover_by_phase + 68
frame #23: 0x0000000100d2b728 libhwloc.15.dylib`hwloc_topology_load + 1592
frame #24: 0x000000010100f224 libopen-pal.40.dylib`opal_hwloc_base_get_topology + 4220
frame #25: 0x0000000100f58d38 libopen-rte.40.dylib`orte_ess_base_proc_binding + 3468
frame #26: 0x000000010093735c mca_ess_singleton.so`rte_init + 5036
frame #27: 0x0000000100f8a9d0 libopen-rte.40.dylib`orte_init + 676
frame #28: 0x0000000100ea0670 libmpi.40.dylib`ompi_mpi_init + 912
frame #29: 0x0000000100e1d720 libmpi.40.dylib`MPI_Init + 120
frame #30: 0x000000010031e290 athenaPK`parthenon::ParthenonManager::ParthenonInitEnv(this=0x000000016fdfe510, argc=3, argv=0x000000016fdfe808) at parthenon_manager.cpp:51:22 [opt]
frame #31: 0x00000001000040b8 athenaPK`main(argc=<unavailable>, argv=<unavailable>) at main.cpp:111:30 [opt]
frame #32: 0x0000000189101058 dyld`start + 2224
I assume this cannot be my fault...?
I think this is a hwloc bug, but there is a workaround: https://kirija.github.io/blog-post-1/.
This gets past the hwloc/OpenMPI bug, but the run then dies with signal 4 (illegal instruction) in a way that does not let me examine the program state:
cycle=0 time=0.0000000000000000e+00 dt=4.7253248290644695e-01 zone-cycles/wsec_step=0.00e+00 wsec_total=2.63e+00 wsec_step=2.74e+02
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node Bens-MacBook-Pro exited on signal 4 (Illegal instruction: 4).
--------------------------------------------------------------------------
Process 44268 exited with status = 132 (0x00000084)
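One way to get at least some state out of a run that dies under `mpirun` is to install a handler that prints a raw backtrace before the process exits. The exit status 132 above is 128 + 4, i.e. SIGILL: on Apple Silicon, trapped floating-point exceptions show up as an illegal instruction (compare the `EXC_BAD_INSTRUCTION` stop at an `fmul` in the first log) rather than as SIGFPE, so this minimal sketch catches both signals. The function names are hypothetical, not part of Parthenon; `backtrace` and `backtrace_symbols_fd` come from `<execinfo.h>` (available on glibc and macOS), and `backtrace` is not formally async-signal-safe, so treat this as a debugging aid only.

```cpp
#include <csignal>
#include <execinfo.h>
#include <unistd.h>

// Hypothetical helper: report a (probable) floating-point trap, then die from the same signal.
void FPETrapHandler(int sig) {
  static const char msg[] = "Caught SIGFPE/SIGILL (floating-point trap?); backtrace:\n";
  // write() is async-signal-safe; its result is ignored because the process is about to die.
  ssize_t written = write(STDERR_FILENO, msg, sizeof(msg) - 1);
  (void)written;
  void *frames[64];
  const int depth = backtrace(frames, 64);
  // Writes symbolized frames straight to the file descriptor without calling malloc.
  backtrace_symbols_fd(frames, depth, STDERR_FILENO);
  // Restore the default action and re-raise so the exit status still reports the signal.
  std::signal(sig, SIG_DFL);
  std::raise(sig);
}

// Install the handler for both signals; returns false if either call fails.
bool InstallFPETrapHandlers() {
  const bool fpe_ok = std::signal(SIGFPE, FPETrapHandler) != SIG_ERR;
  const bool ill_ok = std::signal(SIGILL, FPETrapHandler) != SIG_ERR;
  return fpe_ok && ill_ok;
}
```

Re-raising with the default action (instead of calling `exit`) keeps the "exited on signal 4" report from `mpirun`, so the job still fails loudly.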
It would be nice to have a runtime option to enable FPE traps.
There is no portable way to do this, but all of the common cases should be covered by something like what AMReX does: https://github.com/AMReX-Codes/amrex/blob/77d4d1fe5ce68a1e71095093ce856e061f24fc07/Src/Base/AMReX.cpp#L543
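A minimal sketch of such a runtime option, loosely modeled on the per-platform branches in the AMReX code linked above: glibc's `feenableexcept` on Linux, the SSE exception mask on Intel macOS, and the FPCR trap bits that Apple's `fenv_t` exposes on Apple Silicon. The environment variable `PARTHENON_FPE_TRAPS` and both function names are hypothetical (an input-file parameter would serve equally well). Only invalid, divide-by-zero, and overflow are trapped, since underflow and inexact fire routinely in ordinary floating-point code.

```cpp
#include <cfenv>
#include <cstdlib>
#include <cstring>
#if defined(__APPLE__) && defined(__x86_64__)
#include <xmmintrin.h>
#endif

// Hypothetical opt-in switch; not an existing Parthenon option.
constexpr const char *kFPETrapEnvVar = "PARTHENON_FPE_TRAPS";

// Try to trap invalid, divide-by-zero and overflow exceptions.
// Returns true only if the platform-specific call reported success.
bool EnableFPETraps() {
#if defined(__linux__) && defined(__GLIBC__)
  // glibc extension (needs _GNU_SOURCE, which g++ defines by default);
  // returns -1 if the hardware cannot trap, e.g. on some AArch64 cores.
  return feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW) != -1;
#elif defined(__APPLE__) && defined(__x86_64__)
  // Clearing a bit in the MXCSR exception mask unmasks (traps) that SSE exception.
  _MM_SET_EXCEPTION_MASK(_MM_GET_EXCEPTION_MASK() &
                         ~(_MM_MASK_INVALID | _MM_MASK_DIV_ZERO | _MM_MASK_OVERFLOW));
  return true;
#elif defined(__APPLE__) && defined(__aarch64__)
  // Apple's arm64 fenv_t exposes the FPCR; trapped exceptions arrive as SIGILL.
  std::fenv_t env;
  if (std::fegetenv(&env) != 0) return false;
  env.__fpcr |= __fpcr_trap_invalid | __fpcr_trap_divbyzero | __fpcr_trap_overflow;
  return std::fesetenv(&env) == 0;
#else
  return false;  // no known way to enable traps on this platform
#endif
}

// Enable traps only when the user opts in with PARTHENON_FPE_TRAPS=1.
bool MaybeEnableFPETraps() {
  const char *value = std::getenv(kFPETrapEnvVar);
  if (value == nullptr || std::strcmp(value, "1") != 0) return false;
  return EnableFPETraps();
}
```

Where the call happens probably matters: the backtrace earlier in this thread shows the trap firing inside `MPI_Init` (hwloc's OpenCL discovery calling into Apple's Metal driver), so calling `MaybeEnableFPETraps()` only after `MPI_Init` returns would presumably avoid tripping over that library, though this ordering is an untested assumption.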