lamikr / rocm_sdk_builder

hipBLASLt Build Error - TypeError: '<' not supported between instances of 'str' and 'bool' #67

Closed: jrl290 closed this 1 week ago

jrl290 commented 2 weeks ago

Ubuntu 22.04 - Building for gfx1101 and gfx1102

Steps to reproduce

# git clone https://github.com/lamikr/rocm_sdk_builder.git
# cd rocm_sdk_builder
# git checkout releases/rocm_sdk_builder_611
# ./babs.sh -i
# ./install_deps.sh
# ./babs.sh -b
Traceback (most recent call last):
  File "/home/minipc/rocm_sdk_builder/builddir/025_02_hipBLASLt/library/../virtualenv/lib/python3.9/site-packages/Tensile/bin/TensileCreateLibrary", line 43, in <module>
    TensileCreateLibrary()
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/Tensile/TensileCreateLibrary.py", line 1218, in TensileCreateLibrary
    kernelMinNaming, _ = getKernelWriters(solutions, kernels)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/Tensile/TensileCreateLibrary.py", line 630, in getKernelWriters
    kernelSerialNaming   = Solution.getSerialNaming(kernels)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/Tensile/SolutionStructs.py", line 4823, in getSerialNaming
    data[paramName] = sorted(data[paramName])
TypeError: '<' not supported between instances of 'str' and 'bool'
make[2]: *** [library/CMakeFiles/TENSILE_LIBRARY_TARGET.dir/build.make:74: Tensile/library/TensileManifest.txt] Error 1
make[2]: Leaving directory '/home/minipc/rocm_sdk_builder/builddir/025_02_hipBLASLt'
make[1]: *** [CMakeFiles/Makefile2:249: library/CMakeFiles/TENSILE_LIBRARY_TARGET.dir/all] Error 2
make[1]: Leaving directory '/home/minipc/rocm_sdk_builder/builddir/025_02_hipBLASLt'
make: *** [Makefile:166: all] Error 2
build failed: hipBLASLt
Build failed
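The last frame of the traceback is the important one: `sorted(data[paramName])` received a list that mixes `str` and `bool` values, and Python 3 refuses to order unrelated types with `<`. The sketch below only reproduces that failure mode and shows one generic way to sort such a list with a key function. `sort_mixed` is a hypothetical helper, not the upstream Tensile fix.

```python
def sort_mixed(values):
    """Sort a list that may mix types such as bool and str.

    Hypothetical workaround: order by type name first, then by the
    value's string form, so '<' is never applied across types.
    """
    return sorted(values, key=lambda v: (type(v).__name__, str(v)))


if __name__ == "__main__":
    params = [True, "Auto", False]
    try:
        sorted(params)  # same comparison that fails in getSerialNaming
    except TypeError as exc:
        print("plain sorted() fails:", exc)
    print("key-based sort:", sort_mixed(params))
```

Note that a key like this changes the resulting order, so it is only a diagnostic aid; the real cause is whatever put a `str` into a parameter list that otherwise holds `bool` values.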
lamikr commented 2 weeks ago

I do not have those cards myself, but I will try to reproduce the issue today. Do you have all updates for Ubuntu 22.04 installed?

jrl290 commented 2 weeks ago

I did, yes. But I ended up reformatting and starting over with a fresh Ubuntu install (without updating this time), and the error seems to have gone away. Now I'm stuck at

downloading and extracting https://tritonlang.blob.core.windows.net/llvm-builds/llvm-49af6502-ubuntu-x64.tar.gz ...

It has been stuck there for hours without any CPU, IO, or network activity. Thank you for responding, though. This issue can probably be closed.

lamikr commented 2 weeks ago

Thanks for confirming. The tritonlang download error is probably just a temporary network error that got the download stuck. What happens if you stop the build (Ctrl-C) and re-run the babs.sh -b command? Let me know if you manage to get everything built.

Unfortunately these Python apps, especially onnxruntime, do their own downloads and install files into their build directories during the build, and I do not have a clear solution for forcing them to use locally downloaded versions of those files.
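Because triton fetches the LLVM tarball itself during its build, nothing outside the build can change how that download behaves. What can be done by hand is to check whether the URL downloads at all, with a timeout so a stalled transfer fails instead of hanging for hours. A minimal sketch; `download_with_retries` and its parameters are hypothetical and not part of rocm_sdk_builder.

```python
import shutil
import time
import urllib.request


def download_with_retries(url, dest, attempts=3, timeout=60, backoff=5.0):
    """Download url to dest, retrying when the transfer errors out or stalls.

    'timeout' bounds each blocking socket operation, so a connection that
    stops delivering data raises an error instead of waiting forever.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
                shutil.copyfileobj(resp, out)
            return dest
        except OSError as exc:  # URLError and socket timeouts are both OSError subclasses
            last_error = exc
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(backoff * attempt)  # wait a bit longer after each failure
    raise last_error
```

For example, `download_with_retries("https://tritonlang.blob.core.windows.net/llvm-builds/llvm-49af6502-ubuntu-x64.tar.gz", "llvm.tar.gz")` would show whether the tarball from the log above is reachable from that machine.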

I also did a fresh install of Ubuntu 22.04.4 today and ran a test build. For me the build finished OK.

However, yesterday and today I made some cleanups and updates to the build order of the apps, and also to the versions built this week. Newer versions of Python, PyTorch, PyTorch Audio, PyTorch Vision, onnxruntime and DeepSpeed are now included.
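After such an update it can be useful to confirm which versions actually ended up in the SDK's Python environment. A minimal sketch using the standard `importlib.metadata` module; the distribution names listed are assumptions (a ROCm build of onnxruntime, for instance, may be registered under a different name), and `installed_versions` is a hypothetical helper.

```python
import platform
from importlib import metadata


def installed_versions(dists):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dists:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", platform.python_version())
    # Names are assumptions; adjust them to what the SDK actually installs.
    for name, ver in installed_versions(
        ["torch", "torchaudio", "torchvision", "onnxruntime", "deepspeed"]
    ).items():
        print(f"{name:12} {ver or 'not installed'}")
```

Run it with the SDK's own interpreter, after sourcing the SDK environment script, so it inspects the right site-packages.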

If you want to try it out, you should run the following set of commands to make sure everything is patched to the latest version:

git pull
./babs.sh -i
./babs.sh -f
./babs.sh -co
./babs.sh -ap
./babs.sh -b

Some kind of reset command doing all of these steps would be handy... :-)
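Until such a reset command exists, the same sequence can be wrapped in a small script that runs the steps in order and stops at the first failure. A minimal sketch; `run_steps` and `UPDATE_STEPS` are hypothetical names, and the script assumes it is started from the rocm_sdk_builder checkout (or given that directory as `cwd`).

```python
import subprocess
import sys

# The update sequence from the comment above, one command per entry.
UPDATE_STEPS = [
    ["git", "pull"],
    ["./babs.sh", "-i"],
    ["./babs.sh", "-f"],
    ["./babs.sh", "-co"],
    ["./babs.sh", "-ap"],
    ["./babs.sh", "-b"],
]


def run_steps(steps, cwd=None):
    """Run each command in order, stopping at the first one that fails.

    Returns the number of steps that completed successfully.
    """
    for done, cmd in enumerate(steps):
        print("==>", " ".join(cmd), flush=True)
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            print(f"step failed with exit code {result.returncode}", file=sys.stderr)
            return done
    return len(steps)
```

Calling `run_steps(UPDATE_STEPS, cwd="/path/to/rocm_sdk_builder")` then reports how far the sequence got, which makes it easy to resume from the failing step.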

jrl290 commented 2 weeks ago

I very much appreciate this and all of your work.

I ended up being able to get through the triton build by setting the environment variable MAX_JOBS=1 after referring to this issue: https://github.com/lamikr/rocm_sdk_builder/issues/28
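MAX_JOBS is an environment variable read by setup.py-driven builds such as triton and PyTorch to cap the number of parallel compile jobs; fewer jobs typically means lower peak memory use at the cost of a longer build. A minimal sketch of passing it to the build from Python instead of the shell; `build_env` is a hypothetical helper.

```python
import os


def build_env(max_jobs=1, base=None):
    """Return a copy of the environment with MAX_JOBS set for a child build.

    The caller's environment (or 'base') is copied, never modified.
    """
    env = dict(os.environ if base is None else base)
    env["MAX_JOBS"] = str(max_jobs)
    return env
```

Used as `subprocess.run(["./babs.sh", "-b"], env=build_env(1), check=True)`, this is equivalent to running `MAX_JOBS=1 ./babs.sh -b` in the shell.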

And I got through the full build process successfully! I tested with a few of the provided examples and, aside from "COMGR API could not find the CO for this GPU device/ISA" warnings, they completed successfully!

Next up is a "Caught signal 11 (Segmentation fault: address not mapped to object at address 0x41b40)" crash when calling import torchaudio.

But then I realized I hadn't installed the latest Linux kernel (v6.10.0-rc2), which added support for APU shared memory (I'm on a Ryzen 7840U / gfx1103 with 32GB). So right now I'm starting from scratch again. I was going to try a different branch of rocm_sdk_builder (older or newer), but since there are updates to the main branch, I'll try it again.
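The kernel requirement can be checked from Python before starting another long rebuild. A minimal sketch; the 6.10 threshold is taken from the comment above, the helper names are hypothetical, and release-candidate kernels such as 6.10.0-rc2 are treated as 6.10.0.

```python
import platform
import re


def kernel_version(release=None):
    """Parse the leading 'major.minor[.patch]' of a kernel release string."""
    release = release or platform.release()
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch number (e.g. '6.9') counts as 0.
    return tuple(int(part or 0) for part in m.groups())


def at_least(release=None, minimum=(6, 10, 0)):
    """True if the given (or running) kernel is at least 'minimum'."""
    return kernel_version(release) >= minimum
```

With no argument, `at_least()` checks the running kernel via `platform.release()`.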

Again, your work on this is great. I have another machine with a 6800U APU set to gfx1030, and it has been amazing for running PyTorch. So I bought this machine, but it hasn't gone as smoothly with AMD's builds. Setting it to gfx1100 actually works but is prone to crashing, so I've been wanting to try gfx1101 or gfx1102 to see if either of them works, but AMD doesn't provide builds for them.
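The comment does not say how the GPU was "set to gfx1030". A common way with ROCm is the HSA_OVERRIDE_GFX_VERSION environment variable, which makes the runtime report the GPU as a different target. Assuming that is the mechanism used, a minimal sketch of deriving the override value from a target name; `gfx_to_override` and `apply_override` are hypothetical helpers, and the parsing assumes the usual naming where the last two characters are the minor and stepping digits.

```python
import os


def gfx_to_override(gfx):
    """Turn a target name such as 'gfx1030' into the 'major.minor.stepping'
    string used by HSA_OVERRIDE_GFX_VERSION (here '10.3.0')."""
    name = gfx.strip().lower()
    if not name.startswith("gfx") or len(name) < 6:
        raise ValueError(f"not a gfx target name: {gfx!r}")
    digits = name[3:]
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    # Minor and stepping are single hex digits in target names.
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


def apply_override(gfx, env=None):
    """Set HSA_OVERRIDE_GFX_VERSION in env (default: this process's environment).

    The runtime reads the variable when it initialises, so call this before
    importing torch or anything else that uses ROCm.
    """
    env = os.environ if env is None else env
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_to_override(gfx)
    return env["HSA_OVERRIDE_GFX_VERSION"]
```

For example, `apply_override("gfx1030")` followed by `import torch` matches exporting `HSA_OVERRIDE_GFX_VERSION=10.3.0` in the shell before starting Python.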

Thanks again, and let me know if there's any more info I can provide.

lamikr commented 2 weeks ago

What gfx number does rocminfo report for your 6800U device? Is it gfx1035?

I may get access to a Framework 16 laptop, which has an AMD 7000-series GPU, and after that I could test with the newer AMD GPUs as well. I may also get some help from AMD; let's see how things go.

Newer GPUs already work out of the box in many places, as ROCm is generally better tested on them, but I would not be surprised if some patches still need to be added when building and testing the whole chain.

Stefan-Olt reported on https://github.com/lamikr/rocm_sdk_builder/issues/30 that he was able to use whisper with an RX 5500 basically by installing it with the following commands:

source /opt/rocm_sdk_611/bin/env_rocm.sh
pip3 install openai-whisper
whisper --model medium some_speech_audio_file.mp3

I was planning to test that this weekend by feeding it some music MP3s :-) Could you open a new issue for the other problem and close this one?

jrl290 commented 1 week ago

> What does the rocminfo report for your 6800u devices gfx-number, is it gfx1035?

That is correct.

minipc@minipc:~/aipython$ rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 7 6800U with Radeon Graphics
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 7 6800U with Radeon Graphics
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   4768
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            16
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    16098760(0xf5a5c8) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16098760(0xf5a5c8) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16098760(0xf5a5c8) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1035
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      2048(0x800) KB
  Chip ID:                 5761(0x1681)
  ASIC Revision:           2(0x2)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2200
  BDFID:                   25344
  Internal Node ID:        1
  Compute Unit:            12
  SIMDs per CU:            2
  Shader Engines:          1
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16777216(0x1000000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1035
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
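The gfx target can be pulled out of rocminfo output like the one above programmatically. A minimal sketch; the regular expression assumes agent names appear on lines of the form `Name: gfx1035`, which also means the ISA entries (`amdgcn-amd-amdhsa--gfx1035`) and the CPU agent name are not matched. `gpu_targets` and `query_rocminfo` are hypothetical helpers.

```python
import re
import subprocess

# Matches agent lines such as "  Name:                    gfx1035".
_AGENT_GFX = re.compile(r"^\s*Name:\s+(gfx[0-9a-f]+)\s*$", re.MULTILINE)


def gpu_targets(rocminfo_text):
    """Return the gfx target of each GPU agent, in order, without duplicates."""
    targets = []
    for name in _AGENT_GFX.findall(rocminfo_text):
        if name not in targets:
            targets.append(name)
    return targets


def query_rocminfo():
    """Run rocminfo (assumed to be on PATH) and return its gfx targets."""
    out = subprocess.run(["rocminfo"], capture_output=True, text=True, check=True).stdout
    return gpu_targets(out)
```

Applied to the output above, `gpu_targets` would return `['gfx1035']`.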

I reinstalled from scratch and ran into some build issues related to building for gfx1100 and gfx1101 (along with gfx1102). I removed those targets and the build completed fine, but I still ran into the same problem with import torchaudio. I'll open a new issue for this.
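When an import crashes the interpreter, it helps to run it in a child process so the failure can be reported instead of taking the whole session down. A minimal sketch; `try_import` and `describe` are hypothetical helpers, and the signal decoding only applies if the crash actually terminates the process with the signal (a native handler that prints "Caught signal 11" may instead exit with an ordinary error code).

```python
import signal
import subprocess
import sys


def try_import(module):
    """Import 'module' in a fresh interpreter and return the child's exit code.

    On POSIX a negative code -N means the child was killed by signal N
    (e.g. -11 for SIGSEGV).
    """
    return subprocess.run([sys.executable, "-c", f"import {module}"]).returncode


def describe(returncode):
    """Turn an exit code from try_import into a short human-readable string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"failed with exit code {returncode}"
```

For example, `describe(try_import("torchaudio"))` reports whether the import succeeds, raises a Python error, or dies from a signal.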

> Stefan-Olt reported on https://github.com/lamikr/rocm_sdk_builder/issues/30 that he was able to use whisper with rx-5500 basically by installing it on

I tested this and it worked fine. But it doesn't seem to use torchaudio.
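Whether a package pulls in torchaudio when it is imported can be checked by comparing `sys.modules` before and after the import in a fresh interpreter. A minimal sketch; `modules_pulled_in` is a hypothetical helper, and it only sees import-time dependencies, not modules a library loads lazily inside its functions.

```python
import json
import subprocess
import sys

# Runs in the child: record loaded modules, import the target, print the difference.
_PROBE = (
    "import importlib, json, sys\n"
    "before = set(sys.modules)\n"
    "importlib.import_module(sys.argv[1])\n"
    "print(json.dumps(sorted(set(sys.modules) - before)))\n"
)


def modules_pulled_in(module):
    """Names of modules newly loaded by importing 'module' in a fresh interpreter."""
    out = subprocess.run(
        [sys.executable, "-c", _PROBE, module],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(json.loads(out))
```

For example, `"torchaudio" in modules_pulled_in("whisper")` answers the question for the installed whisper package.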