Closed (delijati closed this issue 2 years ago)
The ROCm team has said that they cannot support APUs right now. Some people have reported issues when an APU and a discrete GPU are used at the same time. You can give it a try.
I tried; I even built tensorflow-upstream,
but I still get:
❯ ../../env/bin/python 02-Clustering.py
GENERATING EMBEDDING FOR: ATL_X
/home/foo/.cache/yay/hip-rocclr/src/HIP-rocm-4.3.1/rocclr/hip_code_object.cpp:486: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
[1] 24231 abort (core dumped) ../../env/bin/python 02-Clustering.py
../../env/bin/python 02-Clustering.py 3,06s user 4,08s system 141% cpu 5,056 total
You can set AMD_LOG_LEVEL=6 to print more detailed logs.
$ AMD_LOG_LEVEL=6 ../../env/bin/python 02-Clustering.py
GENERATING EMBEDDING FOR: ATL_X
:3:rocdevice.cpp :430 : 1885913346 us: Initializing HSA stack.
:3:comgrctx.cpp :33 : 1885933593 us: Loading COMGR library.
:3:rocdevice.cpp :196 : 1885936584 us: Numa selects cpu agent[0]=0x5568b74df830(fine=0x5568bb072be0,coarse=0x5568bad2bcf0, kern_arg=0x5568bb6f3f90) for gpu agent=0x7fa4db72ab34
:3:rocdevice.cpp :1562: 1885937163 us: HMM support: 0, xnack: 0
:4:rocdevice.cpp :1858: 1885937272 us: Allocate hsa host memory 0x7fa4e0002000, size 0x28
:4:rocdevice.cpp :1858: 1885937696 us: Allocate hsa host memory 0x7fa460600000, size 0x101000
:4:rocdevice.cpp :1858: 1885937997 us: Allocate hsa host memory 0x7fa460400000, size 0x101000
:4:runtime.cpp :82 : 1885938102 us: init
:1:hip_code_object.cpp :456 : 1885938529 us: hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
:1:hip_code_object.cpp :458 : 1885938540 us: Devices:
:1:hip_code_object.cpp :460 : 1885938542 us: amdgcn-amd-amdhsa--gfx902:xnack- - [Not Found]
:1:hip_code_object.cpp :465 : 1885938543 us: Bundled Code Objects:
:1:hip_code_object.cpp :482 : 1885938544 us: host-x86_64-unknown-linux - [Unsupported]
:1:hip_code_object.cpp :479 : 1885938546 us: hipv4-amdgcn-amd-amdhsa--gfx1030 - [code object v4 is amdgcn-amd-amdhsa--gfx1030]
:1:hip_code_object.cpp :479 : 1885938547 us: hipv4-amdgcn-amd-amdhsa--gfx803 - [code object v4 is amdgcn-amd-amdhsa--gfx803]
:1:hip_code_object.cpp :479 : 1885938549 us: hipv4-amdgcn-amd-amdhsa--gfx900:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx900:xnack-]
:1:hip_code_object.cpp :479 : 1885938550 us: hipv4-amdgcn-amd-amdhsa--gfx906:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx906:xnack-]
:1:hip_code_object.cpp :479 : 1885938552 us: hipv4-amdgcn-amd-amdhsa--gfx908:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx908:xnack-]
:1:hip_code_object.cpp :479 : 1885938553 us: hipv4-amdgcn-amd-amdhsa--gfx90a:xnack+ - [code object v4 is amdgcn-amd-amdhsa--gfx90a:xnack+]
:1:hip_code_object.cpp :479 : 1885938555 us: hipv4-amdgcn-amd-amdhsa--gfx90a:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx90a:xnack-]
/home/foo/.cache/yay/hip-rocclr/src/HIP-rocm-4.3.1/rocclr/hip_code_object.cpp:486: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
[1] 17615 abort (core dumped) AMD_LOG_LEVEL=6 ../../env/bin/python 02-Clustering.py
AMD_LOG_LEVEL=6 ../../env/bin/python 02-Clustering.py 2,52s user 3,90s system 141% cpu 4,544 total
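The key information in the log above is the mismatch between the device target (gfx902) and the bundled code objects. As a rough sketch, the comparison can be done mechanically by scanning the log text. The regular expressions and the function name `summarize_code_objects` are my own assumptions based on the line format shown in this particular log; other ROCm versions may print these lines differently.

```python
import re

# Patterns assume the AMD_LOG_LEVEL=6 line format seen in the log above
# (ROCm 4.3.1); they are illustrative, not an official log grammar.
DEVICE_RE = re.compile(r"us:\s+(amdgcn-\S+)\s+-\s+\[Not Found\]")
BUNDLE_RE = re.compile(r"us:\s+hipv4-(amdgcn-\S+)\s+-\s+\[code object")


def summarize_code_objects(log_text):
    """Return (device targets reported as not found, bundled GPU targets)."""
    missing = DEVICE_RE.findall(log_text)
    bundled = BUNDLE_RE.findall(log_text)
    return missing, bundled


# Three lines copied from the log above, used as sample input.
sample = """\
:1:hip_code_object.cpp :460 : 1885938542 us: amdgcn-amd-amdhsa--gfx902:xnack- - [Not Found]
:1:hip_code_object.cpp :479 : 1885938546 us: hipv4-amdgcn-amd-amdhsa--gfx1030 - [code object v4 is amdgcn-amd-amdhsa--gfx1030]
:1:hip_code_object.cpp :479 : 1885938549 us: hipv4-amdgcn-amd-amdhsa--gfx900:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx900:xnack-]
"""

missing, bundled = summarize_code_objects(sample)
for target in missing:
    status = "bundled" if target in bundled else "NOT bundled"
    print(f"{target}: {status}")
```

To use it on a real run, capture the program's stderr to a file and pass the file contents to `summarize_code_objects`.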
At least I got HIP running:
❯ AMD_LOG_LEVEL=6 ./square.out
:3:rocdevice.cpp :430 : 10625438911 us: Initializing HSA stack.
:3:comgrctx.cpp :33 : 10625460831 us: Loading COMGR library.
:3:rocdevice.cpp :196 : 10625465529 us: Numa selects cpu agent[0]=0x205e2a0(fine=0x20f1f80,coarse=0x20f7560, kern_arg=0x210f8d0) for gpu agent=0x7fcbce7c3b34
:3:rocdevice.cpp :1562: 10625466470 us: HMM support: 0, xnack: 0
:4:rocdevice.cpp :1858: 10625466635 us: Allocate hsa host memory 0x7fcbcea34000, size 0x28
:4:rocdevice.cpp :1858: 10625467138 us: Allocate hsa host memory 0x7fcbcd400000, size 0x101000
:4:rocdevice.cpp :1858: 10625467587 us: Allocate hsa host memory 0x7fcbcd200000, size 0x101000
:4:runtime.cpp :82 : 10625467659 us: init
:3:hip_device.cpp :239 : 10625467704 us: 30526: [7fcbcddfb540] hipGetDeviceProperties: Returned hipSuccess :
info: running on device Cezanne
info: allocate host mem ( 7.63 MB)
info: allocate device mem ( 7.63 MB)
:3:hip_memory.cpp :384 : 10625470790 us: 30526: [7fcbcddfb540] hipMalloc ( 0x7ffca2147320, 4000000 )
:4:rocdevice.cpp :1993: 10625470946 us: Allocate hsa device memory 0x7fcbcc400000, size 0x3d0900
:3:rocdevice.cpp :2032: 10625470952 us: device=0x211d4b0, freeMem_ = 0xffc2f700
:3:hip_memory.cpp :386 : 10625470960 us: 30526: [7fcbcddfb540] hipMalloc: Returned hipSuccess : 0x7fcbcc400000: duration: 170 us
:3:hip_memory.cpp :384 : 10625470964 us: 30526: [7fcbcddfb540] hipMalloc ( 0x7ffca2147318, 4000000 )
:4:rocdevice.cpp :1993: 10625471018 us: Allocate hsa device memory 0x7fcbc0800000, size 0x3d0900
:3:rocdevice.cpp :2032: 10625471026 us: device=0x211d4b0, freeMem_ = 0xff85ee00
:3:hip_memory.cpp :386 : 10625471032 us: 30526: [7fcbcddfb540] hipMalloc: Returned hipSuccess : 0x7fcbc0800000: duration: 68 us
info: copy Host2Device
:3:hip_memory.cpp :429 : 10625471065 us: 30526: [7fcbcddfb540] hipMemcpy ( 0x7fcbcc400000, 0x7fcbcce2f010, 4000000, hipMemcpyHostToDevice )
:3:rocdevice.cpp :2543: 10625471616 us: number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:3:rocdevice.cpp :2618: 10625478601 us: created hardware queue 0x7fcbcd5f5000 with size 1024 with priority 1, cooperative: 0
:4:rocdevice.cpp :1858: 10625478822 us: Allocate hsa host memory 0x7fcbcc980000, size 0x80000
:3:devprogram.cpp :2466: 10625710710 us: Using Code Object V4.
:4:command.cpp :303 : 10625712610 us: command is enqueued: 0x214adc0
:4:command.cpp :262 : 10625712653 us: queue marker to command queue: 0x20f1b20
:4:command.cpp :303 : 10625712656 us: command is enqueued: 0x205e500
:4:command.cpp :222 : 10625712657 us: waiting for event 0x214adc0 to complete, current status 3
:4:commandqueue.cpp :176 : 10625713048 us: command (CopyHostToDevice) is submitted: 0x214adc0
:4:rocvirtual.hpp :200 : 10625713254 us: [7fcbcd562640]! WaitCurret completion_signal=0x7fcbcea46b00
:4:rocvirtual.hpp :228 : 10625713263 us: [7fcbcd562640]! WaitNext completion_signal=0x7fcbcea46a80
:4:rocblit.cpp :670 : 10625713266 us: [7fcbcd562640]! HSA Asycn Copy wait_event=0x0, completion_signal=0x7fcbcea46b00
:4:rocvirtual.hpp :200 : 10625713701 us: [7fcbcd562640]! WaitCurret completion_signal=0x7fcbcea46b00
:4:rocvirtual.cpp :449 : 10625713708 us: [7fcbcd562640]! Host wait on completion_signal=0x7fcbcea46b00
:4:commandqueue.cpp :176 : 10625714354 us: command (InternalMarker) is submitted: 0x205e500
:4:rocvirtual.hpp :200 : 10625714368 us: [7fcbcd562640]! WaitCurret completion_signal=0x7fcbcea46b00
:4:command.cpp :236 : 10625714371 us: event 0x214adc0 wait completed
:4:command.cpp :152 : 10625714372 us: Command 0x214adc0 complete
:4:command.cpp :152 : 10625714374 us: Command 0x205e500 complete
:3:hip_memory.cpp :432 : 10625714379 us: 30526: [7fcbcddfb540] hipMemcpy: Returned hipSuccess : : duration: 243314 us
info: launch 'vector_square' kernel
:3:hip_platform.cpp :202 : 10625714411 us: 30526: [7fcbcddfb540] __hipPushCallConfiguration ( {512,1,1}, {256,1,1}, 0, stream:<null> )
:3:hip_platform.cpp :206 : 10625714419 us: 30526: [7fcbcddfb540] __hipPushCallConfiguration: Returned hipSuccess :
:3:hip_platform.cpp :213 : 10625714430 us: 30526: [7fcbcddfb540] __hipPopCallConfiguration ( {34542240,0,34538320}, {3458397079,32715,18}, 0x7ffca2147330, 0x7ffca2147328 )
:3:hip_platform.cpp :222 : 10625714433 us: 30526: [7fcbcddfb540] __hipPopCallConfiguration: Returned hipSuccess :
:3:hip_module.cpp :489 : 10625714444 us: 30526: [7fcbcddfb540] hipLaunchKernel ( 0x401c10, {512,1,1}, {256,1,1}, 0x7ffca2147370, 0, stream:<null> )
:3:devprogram.cpp :2466: 10625714623 us: Using Code Object V4.
:3:hip_module.cpp :358 : 10625715521 us: 30526: [7fcbcddfb540] ihipModuleLaunchKernel ( 0x0x21577a0, 131072, 1, 1, 256, 1, 1, 0, stream:<null>, 0x7ffca2147370, char array:<null>, event:0, event:0, 0, 0 )
:4:command.cpp :303 : 10625715595 us: command is enqueued: 0x215f780
:3:hip_platform.cpp :638 : 10625715619 us: 30526: [7fcbcddfb540] ihipLaunchKernel: Returned hipSuccess :
:3:hip_module.cpp :491 : 10625715635 us: 30526: [7fcbcddfb540] hipLaunchKernel: Returned hipSuccess :
info: copy Device2Host
:3:hip_memory.cpp :429 : 10625715651 us: 30526: [7fcbcddfb540] hipMemcpy ( 0x7fcbcca5e010, 0x7fcbc0800000, 4000000, hipMemcpyDeviceToHost )
:4:command.cpp :303 : 10625715657 us: command is enqueued: 0x214adc0
:4:command.cpp :262 : 10625715660 us: queue marker to command queue: 0x20f1b20
:4:command.cpp :303 : 10625715661 us: command is enqueued: 0x2157db0
:4:command.cpp :222 : 10625715662 us: waiting for event 0x214adc0 to complete, current status 3
:4:commandqueue.cpp :176 : 10625715663 us: command (KernelExecution) is submitted: 0x215f780
:3:rocvirtual.cpp :603 : 10625715679 us: ! arg0: = ptr:0x7fcbc0800000 obj:[0x7fcbc0800000-0x7fcbc0bd0900] threadId : 7fcbcd562640
:3:rocvirtual.cpp :603 : 10625715685 us: ! arg1: = ptr:0x7fcbcc400000 obj:[0x7fcbcc400000-0x7fcbcc7d0900] threadId : 7fcbcd562640
:3:rocvirtual.cpp :2560: 10625715689 us: [7fcbcd562640]! ShaderName : _Z13vector_squareIfEvPT_S1_m
:4:rocvirtual.cpp :753 : 10625715723 us: [7fcbcd562640] HWq=0x7fcbcd5f5000, Dispatch Header = 0x502 (type=2, barrier=1, acquire=2, release=0), setup=3, grid=[131072, 1, 1], workgroup=[256, 1, 1], private_seg_size=0, group_seg_size=0, kernel_obj=0x7fcbc0408840, kernarg_address=0x7fcbcc980000, completion_signal=0x0
:4:commandqueue.cpp :176 : 10625715741 us: command (CopyDeviceToHost) is submitted: 0x214adc0
:4:rocvirtual.hpp :200 : 10625717359 us: [7fcbcd562640]! WaitCurret completion_signal=0x7fcbcea46a80
:4:rocvirtual.hpp :228 : 10625717373 us: [7fcbcd562640]! WaitNext completion_signal=0x7fcbcea46a00
:4:rocvirtual.cpp :871 : 10625717384 us: [7fcbcd562640] HWq=0x7fcbcd5f5000, BarrierAND Header = 0x1503 (type=3, barrier=1, acquire=2, release=2), dep_signal=[0x0, 0x0, 0x0, 0x0, 0x0], completion_signal=0x7fcbcea46a80
:4:rocvirtual.hpp :200 : 10625717406 us: [7fcbcd562640]! WaitCurret completion_signal=0x7fcbcea46a00
:4:rocvirtual.hpp :228 : 10625717414 us: [7fcbcd562640]! WaitNext completion_signal=0x7fcbcea46980
:4:rocblit.cpp :670 : 10625717420 us: [7fcbcd562640]! HSA Asycn Copy wait_event=0x0, completion_signal=0x7fcbcea46a00
:4:rocvirtual.hpp :200 : 10625718851 us: [7fcbcd562640]! WaitCurret completion_signal=0x7fcbcea46a00
:4:rocvirtual.cpp :449 : 10625718897 us: [7fcbcd562640]! Host wait on completion_signal=0x7fcbcea46a00
:4:commandqueue.cpp :176 : 10625719926 us: command (InternalMarker) is submitted: 0x2157db0
:4:rocvirtual.hpp :200 : 10625719960 us: [7fcbcd562640]! WaitCurret completion_signal=0x7fcbcea46a00
:4:command.cpp :152 : 10625719969 us: Command 0x215f780 complete
:4:command.cpp :152 : 10625719978 us: Command 0x214adc0 complete
:4:command.cpp :152 : 10625719981 us: Command 0x2157db0 complete
:4:command.cpp :236 : 10625719984 us: event 0x214adc0 wait completed
:3:hip_memory.cpp :432 : 10625720011 us: 30526: [7fcbcddfb540] hipMemcpy: Returned hipSuccess : : duration: 4360 us
info: check result
PASSED!
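As a quick sanity check, the numbers in the square log above are consistent with each other. The short sketch below redoes the arithmetic; the 4-byte element size is an assumption, inferred from the mangled kernel name `_Z13vector_squareIfEvPT_S1_m`, which demangles to `vector_square<float>`.

```python
# Values copied from the log above.
blocks, threads_per_block = 512, 256   # from __hipPushCallConfiguration
buffer_bytes = 4000000                 # size passed to each hipMalloc call
bytes_per_element = 4                  # assumption: float elements

total_threads = blocks * threads_per_block
elements = buffer_bytes // bytes_per_element
two_buffers_mib = round(2 * buffer_bytes / 2**20, 2)

print(total_threads)       # matches grid=[131072, 1, 1]
print(hex(buffer_bytes))   # matches "size 0x3d0900"
print(elements)            # elements per buffer
print(two_buffers_mib)     # matches "allocate host mem ( 7.63 MB)"
```

Since there are fewer threads than elements, each thread presumably handles several elements, but the log alone does not show how the kernel does that.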
So the log says that one of the rocm-libs components isn't built for gfx902. The square sample runs properly, which I guess means you have only the APU and no discrete GPU. The next step is to test the rocm-libs components one by one to find which ones need to be rebuilt for gfx902.
It has been 2 months since the last post, so I will close this issue. Please reopen it if there are any updates.
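One way to do the one-by-one testing is a small driver script that runs a separate smoke test per library and looks for the error string in the output. This is only a sketch: the entries in `smoke_tests` are hypothetical placeholders (here they just simulate a passing and a failing program), and you would need to replace each with a real command that exercises exactly one ROCm library on your system.

```python
import os
import subprocess
import sys

ERROR_MARKER = "hipErrorNoBinaryForGpu"


def fails_with_missing_binary(command):
    """Run command with AMD_LOG_LEVEL=6; True if the error marker is printed."""
    env = dict(os.environ, AMD_LOG_LEVEL="6")
    result = subprocess.run(command, env=env, capture_output=True, text=True)
    return ERROR_MARKER in (result.stdout + result.stderr)


# Hypothetical placeholders: swap in one small test program per library.
smoke_tests = {
    "placeholder-ok": [sys.executable, "-c", "print('ok')"],
    "placeholder-fail": [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('hipErrorNoBinaryForGpu: simulated\\n'); sys.exit(1)",
    ],
}

results = {name: fails_with_missing_binary(cmd) for name, cmd in smoke_tests.items()}
for name, failed in results.items():
    print(f"{name}: {'needs a gfx902 build' if failed else 'ok'}")
```

A library whose smoke test prints the marker is a candidate for rebuilding with gfx902 in its target list.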
@delijati I've been trying to get HIP running on a gfx902, but have had no luck. What version of rocm did you use? Did you have to build from source?
@delijati Over the last 2 weeks I have spent significant time trying to get PyTorch on ROCm running on gfx902. I experimented with different Linux versions (mostly Ubuntu) and different ROCm / AMDGPU versions, with no luck. The few combinations I got working generally ended with this error: "HIP error: shared object initialization failed". I am now declaring it a failure and an impossibility, and for ML/DL testing I am getting a video card that does not use ROCm, as official ROCm support covers only about 6 cards today and no APUs.
Hi,
I have a gfx902 APU (Ryzen 5850U). I'm just asking around to see whether anyone has had any success getting this card to run with ROCm.
Thanks