lamikr / rocm_sdk_builder


Fedora 40: Compile failure after GCC14 patch #12

Closed · Crizle closed this issue 3 weeks ago

Crizle commented 1 month ago

Initial compile attempt failed with this issue here: https://github.com/ROCm/rocm_smi_lib/issues/170

Applying the following patch fixed the initial issue above:

--- a/include/rocm_smi/rocm_smi_utils.h 2024-05-25 00:02:19.127412816 -0400
+++ b/include/rocm_smi/rocm_smi_utils.h 2024-05-25 00:03:25.359416227 -0400
@@ -149,7 +149,7 @@
   __forceinline ~ScopeGuard() {
     if (!dismiss_) release_();
   }
-  __forceinline ScopeGuard& operator=(const ScopeGuard& rhs) {
+  __forceinline ScopeGuard& operator=(ScopeGuard& rhs) {
     dismiss_ = rhs.dismiss_;
     release_ = rhs.release_;
     rhs.dismiss_ = true;

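For context, a simplified sketch of the ownership-transfer pattern that the patched (non-const) signature allows (hypothetical guard type, with release_ assumed to be a std::function; not the actual rocm_smi_lib header):

#include <functional>

// Hypothetical, simplified guard: the right-hand side is taken by non-const
// reference because the operator mutates it (rhs.dismiss_ = true) so that
// only *this runs the release action afterwards.
struct ScopeGuard {
  std::function<void()> release_;
  bool dismiss_ = false;

  ~ScopeGuard() {
    if (!dismiss_) release_();
  }

  ScopeGuard& operator=(ScopeGuard& rhs) {
    dismiss_ = rhs.dismiss_;
    release_ = rhs.release_;
    rhs.dismiss_ = true;  // the source guard gives up responsibility
    return *this;
  }
};
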
Now the rocm_sdk_builder compilation fails here:

/home/chris/rocm_sdk_builder/src_projects/openmpi/3rd-party/openpmix/include/pmix_deprecated.h:851:32: error: passing argument 2 of 'PMIx_Data_buffer_unload' from incompatible pointer type [-Wincompatible-pointer-types]
  851 |     PMIx_Data_buffer_unload(b, &(d), &(s))
      |                                ^~~~
      |                                |
      |                                void **
/home/chris/rocm_sdk_builder/src_projects/openmpi/oshmem/mca/memheap/base/memheap_base_mkey.c:451:5: note: in expansion of macro 'PMIX_DATA_BUFFER_UNLOAD'
  451 |     PMIX_DATA_BUFFER_UNLOAD(msg, buffer, size);
      |     ^~~~~~~~~~~~~~~~~~~~~~~
/home/chris/rocm_sdk_builder/src_projects/openmpi/3rd-party/openpmix/include/pmix_deprecated.h:352:49: note: expected 'char **' but argument is of type 'void **'
  352 |                                          char **bytes, size_t *sz);
      |                                          ~~~~~~~^~~~~
/home/chris/rocm_sdk_builder/src_projects/openmpi/oshmem/mca/memheap/base/memheap_base_mkey.c: In function 'mca_memheap_modex_recv_all':
/home/chris/rocm_sdk_builder/src_projects/openmpi/3rd-party/openpmix/include/pmix_deprecated.h:851:32: error: passing argument 2 of 'PMIx_Data_buffer_unload' from incompatible pointer type [-Wincompatible-pointer-types]
  851 |     PMIx_Data_buffer_unload(b, &(d), &(s))
      |                                ^~~~
      |                                |
      |                                void **
/home/chris/rocm_sdk_builder/src_projects/openmpi/oshmem/mca/memheap/base/memheap_base_mkey.c:586:5: note: in expansion of macro 'PMIX_DATA_BUFFER_UNLOAD'
  586 |     PMIX_DATA_BUFFER_UNLOAD(msg, send_buffer, size);
      |     ^~~~~~~~~~~~~~~~~~~~~~~
/home/chris/rocm_sdk_builder/src_projects/openmpi/3rd-party/openpmix/include/pmix_deprecated.h:352:49: note: expected 'char **' but argument is of type 'void **'
  352 |                                          char **bytes, size_t *sz);
      |                                          ~~~~~~~^~~~~
make[2]: *** [Makefile:1527: base/memheap_base_mkey.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/015_03_openmpi/oshmem/mca/memheap'
make[1]: *** [Makefile:1920: all-recursive] Error 1
make[1]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/015_03_openmpi/oshmem'
make: *** [Makefile:1534: all-recursive] Error 1
build failed: openmpi

build failed

My skill level prevents me from fixing this myself; any ideas?

GPU: Radeon RX 7800 XT
CPU: AMD Ryzen™ 5 5600G with Radeon™ Graphics × 12
KERNEL: Linux 6.8.10-300.fc40.x86_64
GCC Info: Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/14/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-14.1.1-20240522/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none,amdgcn-amdhsa --enable-offload-defaulted --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.1.1 20240522 (Red Hat 14.1.1-4) (GCC)
lamikr commented 1 month ago

Did you run the install_deps.sh script before starting the build?

I need to set up a Fedora 40 environment for testing; so far I have tested with a slightly older Fedora version (39).

Crizle commented 1 month ago

> Did you run the install_deps.sh script before starting the build?
>
> I need to set up a Fedora 40 environment for testing; so far I have tested with a slightly older Fedora version (39).

Yes, I ran ./install_deps.sh before starting the build.

Last metadata expiration check: 0:00:01 ago on Wed 29 May 2024 08:37:01 BST.
Package cmake-3.28.2-1.fc40.x86_64 is already installed.
Package rpm-build-4.19.1.1-1.fc40.x86_64 is already installed.
Package gcc-14.1.1-4.fc40.x86_64 is already installed.
Package gcc-c++-14.1.1-4.fc40.x86_64 is already installed.
Package openssl-devel-1:3.2.1-2.fc40.x86_64 is already installed.
Package zlib-ng-compat-devel-2.1.6-2.fc40.x86_64 is already installed.
Package gcc-gfortran-14.1.1-4.fc40.x86_64 is already installed.
Package make-1:4.4.1-6.fc40.x86_64 is already installed.
Package libcxx-devel-18.1.6-1.fc40.x86_64 is already installed.
Package numactl-libs-2.0.16-5.fc40.x86_64 is already installed.
Package numactl-devel-2.0.16-5.fc40.x86_64 is already installed.
Package dpkg-dev-1.22.6-1.fc40.noarch is already installed.
Package doxygen-2:1.10.0-3.fc40.x86_64 is already installed.
Package elfutils-libelf-devel-0.191-4.fc40.x86_64 is already installed.
Package prename-1.14-5.fc40.noarch is already installed.
Package perl-URI-Encode-1.1.1-23.fc40.noarch is already installed.
Package perl-File-Listing-6.16-3.fc40.noarch is already installed.
Package perl-File-BaseDir-0.09-9.fc40.noarch is already installed.
Package fftw-devel-3.3.10-12.fc40.x86_64 is already installed.
Package wget2-wget-2.1.0-9.fc40.x86_64 is already installed.
Package libdrm-devel-2.4.120-3.fc40.x86_64 is already installed.
Package xxd-2:9.1.393-1.fc40.x86_64 is already installed.
Package glew-devel-2.2.0-7.fc40.x86_64 is already installed.
Package python3-cppheaderparser-2.7.4-13.fc40.noarch is already installed.
Package autoconf-2.71-10.fc40.noarch is already installed.
Package automake-1.16.5-16.fc40.noarch is already installed.
Package libtool-2.4.7-10.fc40.x86_64 is already installed.
Package icu-74.2-1.fc40.x86_64 is already installed.
Package bzip2-devel-1.0.8-18.fc40.x86_64 is already installed.
Package lzma-sdk-devel-22.01-3.fc40.x86_64 is already installed.
Package libicu-devel-74.2-1.fc40.x86_64 is already installed.
Package msgpack-devel-3.1.0-14.fc40.x86_64 is already installed.
Package libffi-devel-3.4.4-7.fc40.x86_64 is already installed.
Package json-devel-3.11.3-1.fc40.x86_64 is already installed.
Package texinfo-7.1-2.fc40.x86_64 is already installed.
Package python3-pip-23.3.2-1.fc40.noarch is already installed.
Package sqlite-devel-3.45.1-2.fc40.x86_64 is already installed.
Package git-2.45.1-1.fc40.x86_64 is already installed.
Package git-lfs-3.4.1-5.fc40.x86_64 is already installed.
Package lbzip2-2.5-29.20171011gitb6dc48a.fc40.x86_64 is already installed.
Package opencv-devel-4.9.0-3.fc40.x86_64 is already installed.
Package ffmpeg-free-6.1.1-12.fc40.x86_64 is already installed.
Package valgrind-1:3.23.0-1.fc40.x86_64 is already installed.
Package perl-FindBin-1.53-506.fc40.noarch is already installed.
Package pmix-devel-4.2.8-2.fc40.x86_64 is already installed.
Package libfl-static-2.6.4-16.fc40.x86_64 is already installed.
Package bison-devel-3.8.2-7.fc40.x86_64 is already installed.
Package bison-3.8.2-7.fc40.x86_64 is already installed.
Package flex-2.6.4-16.fc40.x86_64 is already installed.
Package byacc-2.0.20230521-5.fc40.x86_64 is already installed.
Package gettext-0.22.5-2.fc40.x86_64 is already installed.
Package xz-devel-1:5.4.6-3.fc40.x86_64 is already installed.
Package ninja-build-1.11.1-7.fc40.x86_64 is already installed.
Package texlive-scheme-small-11:svn54191-71.fc40.noarch is already installed.
Package protobuf-devel-3.19.6-8.fc40.x86_64 is already installed.
Package pybind11-devel-2.11.1-3.fc40.x86_64 is already installed.
Package libaio-devel-0.3.111-19.fc40.x86_64 is already installed.
Package gmp-devel-1:6.2.1-8.fc40.x86_64 is already installed.
Package mpfr-devel-4.2.1-4.fc40.x86_64 is already installed.
Package libpng-devel-2:1.6.40-3.fc40.x86_64 is already installed.
Package libjpeg-turbo-devel-3.0.2-1.fc40.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!
Updated Git hooks.
Git LFS initialized.
Dependencies installed, you can now start using the babs.sh command
lamikr commented 1 month ago

I can confirm both build breaks on Fedora 40 and will now submit a patch for the first one based on your Gentoo bug finding. I need to test with other distros before applying it. After that I will check the fix for the second build break.

lamikr commented 1 month ago

For the second build problem, there seems to be a bug report and fix here: https://github.com/open-mpi/ompi/issues/12169
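
For reference, the kind of call-site adjustment a fix like that typically involves looks roughly like this (a minimal sketch assuming the payload variable is a void *, with a stand-in prototype taken from the log above; not necessarily the actual upstream patch):

#include <stddef.h>

// Stand-in prototype matching the signature shown in the log above
// (the real first parameter is a pmix buffer type).
int PMIx_Data_buffer_unload(void *b, char **bytes, size_t *sz);

// GCC 14 treats the char** vs void** mismatch as a hard error, so the
// payload has to be unloaded through a char * temporary (or an explicit
// cast) instead of passing a void ** directly.
void unload_example(void *b, void **payload, size_t *size) {
  char *bytes = NULL;
  PMIx_Data_buffer_unload(b, &bytes, size);
  *payload = bytes;
}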

Crizle commented 1 month ago

Thanks for looking, glad to help the best I can.

lamikr commented 1 month ago

rocm_smi_lib and openmpi now build ok on Fedora 40, but the next failing package is amd-fftw, which will also need some parameter type fixes for GCC 14.

/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/mpi/transpose-pairwise-omc.c: In function ‘transpose_chunks’:
/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/mpi/transpose-pairwise-omc.c:108:115: error: passing argument 7 of ‘MPI_Isend’ from incompatible pointer type [-Wincompatible-pointer-types]
  108 |                    MPI_Isend(buf[j&0x1], (int) (sbs[pe]), FFTW_MPI_TYPE, pe, (my_pe * n_pes + pe) & 0xffff, comm, &send_status);
      |                                                                                                                   ^~~~~~~~~~~~
      |                                                                                                                   |
      |                                                                                                                   MPI_Status * {aka struct ompi_status_public_t *}
libtool: compile:  mpicc -DHAVE_CONFIG_H -I. -I/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/mpi -I.. -I /home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw -I /home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/api -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include -mno-avx256-split-unaligned-store -mno-avx256-split-unaligned-load -mno-prefer-avx128 -MT dft-rank-geq2-transposed.lo -MD -MP -MF .deps/dft-rank-geq2-transposed.Tpo -c /home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/mpi/dft-rank-geq2-transposed.c  -fPIC -DPIC -o .libs/dft-rank-geq2-transposed.o
In file included from /home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/mpi/ifftw-mpi.h:28,
                 from /home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/mpi/mpi-transpose.h:22,
                 from /home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/mpi/transpose-pairwise-omc.c:32:
/opt/rocm_sdk_611/include/mpi.h:1783:67: note: expected ‘struct ompi_request_t **’ but argument is of type ‘MPI_Status *’ {aka ‘struct ompi_status_public_t *’}
 1783 |                              int tag, MPI_Comm comm, MPI_Request *request);
      |                                                      ~~~~~~~~~~~~~^~~~~~~
/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/mpi/transpose-pairwise-omc.c:109:116: error: passing argument 7 of ‘MPI_Irecv’ from incompatible pointer type [-Wincompatible-pointer-types]
  109 |                    MPI_Irecv(O + rbo[pe], (int) (rbs[pe]), FFTW_MPI_TYPE, pe, (pe * n_pes + my_pe) & 0xffff, comm, &recv_status);
      |                                                                                                                    ^~~~~~~~~~~~
      |                                                                                                                    |
      |                                                                                                                    MPI_Status * {aka struct ompi_status_public_t *}
/opt/rocm_sdk_611/include/mpi.h:1779:67: note: expected ‘struct ompi_request_t **’ but argument is of type ‘MPI_Status *’ {aka ‘struct ompi_status_public_t *’}
 1779 |                              int tag, MPI_Comm comm, MPI_Request *request);
      |                                                      ~~~~~~~~~~~~~^~~~~~~
/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/mpi/transpose-pairwise-omc.c:113:29: error: passing argument 1 of ‘MPI_Wait’ from incompatible pointer type [-Wincompatible-pointer-types]
  113 |                    MPI_Wait(&send_status, MPI_STATUS_IGNORE);
      |                             ^~~~~~~~~~~~
      |                             |
      |                             MPI_Status * {aka struct ompi_status_public_t *}
/opt/rocm_sdk_611/include/mpi.h:2099:42: note: expected ‘struct ompi_request_t **’ but argument is of type ‘MPI_Status *’ {aka ‘struct ompi_status_public_t *’}
 2099 | OMPI_DECLSPEC  int MPI_Wait(MPI_Request *request, MPI_Status *status);
      |                             ~~~~~~~~~~~~~^~~~~~~
/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/amd-fftw/mpi/transpose-pairwise-omc.c:114:29: error: passing argument 1 of ‘MPI_Wait’ from incompatible pointer type [-Wincompatible-pointer-types]
  114 |                    MPI_Wait(&recv_status, MPI_STATUS_IGNORE);
      |                             ^~~~~~~~~~~~
      |                             |
      |                             MPI_Status * {aka struct ompi_status_public_t *}
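
The errors above are the usual MPI_Request / MPI_Status mix-up: MPI_Isend, MPI_Irecv and MPI_Wait all want an MPI_Request handle. A minimal sketch of the expected usage (illustrative only, hypothetical function, not the actual amd-fftw patch):

#include <mpi.h>

// Illustrative only: the nonblocking calls hand back an MPI_Request, and
// MPI_Wait consumes that request (the MPI_Status output can be ignored).
void exchange(void *sendbuf, void *recvbuf, int count, int peer, MPI_Comm comm) {
  MPI_Request send_req, recv_req;
  MPI_Isend(sendbuf, count, MPI_BYTE, peer, 0, comm, &send_req);
  MPI_Irecv(recvbuf, count, MPI_BYTE, peer, 0, comm, &recv_req);
  MPI_Wait(&send_req, MPI_STATUS_IGNORE);
  MPI_Wait(&recv_req, MPI_STATUS_IGNORE);
}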
lamikr commented 1 month ago

@Crizle Thanks for the help. There are now quite a few fixes in place for Fedora 40, but my build is still ongoing, so it's more than likely that more places will show up where something breaks on GCC 14. If you want to try what's been done so far, you could sync the latest patches and continue the build from where you left off with these commands:

# git pull
# ./babs.sh -co
# ./babs.sh -ap
# rm -rf builddir/015_03_openmpi
# ./babs.sh -b
Crizle commented 1 month ago

> @Crizle Thanks for the help. There are now quite a few fixes in place for Fedora 40, but my build is still ongoing, so it's more than likely that more places will show up where something breaks on GCC 14. If you want to try what's been done so far, you could sync the latest patches and continue the build from where you left off with these commands:
>
> # git pull
> # ./babs.sh -co
> # ./babs.sh -ap
> # rm -rf builddir/015_03_openmpi
> # ./babs.sh -b

Been noticing the fixes! All good, I just started the build again with the latest patches and will report back.

lamikr commented 1 month ago

The build almost finishes now but fails on pytorch on Fedora 40. The first failure, about a possibly uninitialized value in fbgemm, can be avoided by modifying the last line of src_projects/pytorch/build_pytorch_rocm.sh and adding CMAKE_CXX_FLAGS="$CMAKE_CXX_FLAGS -Wno-error=maybe-uninitialized" there.

So the full line is: USE_FLASH_ATTENTION=OFF ROCM_PATH=${install_dir_prefix_rocm} ROCM_SOURCE_DIR=${install_dir_prefix_rocm} CMAKE_CXX_FLAGS="$CMAKE_CXX_FLAGS -Wno-error=maybe-uninitialized" CMAKE_PREFIX_PATH="${install_dir_prefix_rocm};${install_dir_prefix_rocm}/lib64/cmake;${install_dir_prefix_rocm}/lib/cmake;${install_dir_prefix_rocm}/lib64;${install_dir_prefix_rocm}/lib" ROCM_VERSION=${rocm_version_str} HIP_ROOT_DIR=${install_dir_prefix_rocm} USE_ROCM=1 python setup.py install

The next build failure is trickier and may be a real bug in either the compiler or the clamp call in pytorch when built with the hipcc compiler. Still investigating it, so I cannot push a patch for pytorch yet.

Crizle commented 1 month ago

The current build has failed; not sure if you have come across this or if it's related to the above commit for AMDMIGraphX or PyTorch:

Steps:

# git pull
# ./babs.sh -co
# ./babs.sh -ap
# rm -rf builddir/015_03_openmpi
# ./babs.sh -b

Result:

[ 61%] Linking CXX executable ../../bin/test_onnx_test
cd /home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX/test/onnx && /usr/bin/cmake -E cmake_link_script CMakeFiles/test_onnx_test.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang++ -O3 -DNDEBUG -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hsa/lib -L/opt/rocm_sdk_611/rocblas/lib -L/opt/rocm_sdk_611/hcc/lib CMakeFiles/test_onnx_test.dir/parse/acos_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/acosh_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/add_bcast_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/add_fp16_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/add_scalar_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/argmax_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/argmax_select_last_index_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/argmax_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/argmin_select_last_index_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/argmin_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/asin_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/asinh_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/atan_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/atanh_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_1d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_3d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_dilate_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_dyn_asym_padding_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_dyn_autopad_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_dyn_cip_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_notset_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_nt_cip_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_same_lower_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_same_upper_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/averagepool_sl_cip_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/batch_norm_1d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/batch_norm_2d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/batch_norm_3d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/batch_norm_flat_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/batch_norm_invalid_bias_rank.cpp.o CMakeFiles/test_onnx_test.dir/parse/batch_norm_invalid_rank.cpp.o CMakeFiles/test_onnx_test.dir/parse/batch_norm_rank_2_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/binary_dyn_brcst_add_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/binary_dyn_brcst_attr_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/binary_dyn_brcst_mul_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/binary_dyn_brcst_prelu_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/cast_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/castlike_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/castlike_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/ceil_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/celu_alpha_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/celu_default_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/celu_wrong_type_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/celu_zero_alpha_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/clip_dyn_min_max_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/clip_dyn_min_only_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/clip_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/clip_test_args_type_mismatch.cpp.o CMakeFiles/test_onnx_test.dir/parse/clip_test_op11.cpp.o CMakeFiles/test_onnx_test.dir/parse/clip_test_op11_max_only.cpp.o CMakeFiles/test_onnx_test.dir/parse/clip_test_op11_min_only.cpp.o CMakeFiles/test_onnx_test.dir/parse/clip_test_op11_no_args.cpp.o 
CMakeFiles/test_onnx_test.dir/parse/clip_test_op11_no_args1.cpp.o CMakeFiles/test_onnx_test.dir/parse/concat_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/concat_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/const_of_shape_default_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/const_of_shape_dyn_float_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/const_of_shape_dyn_int64_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/const_of_shape_empty_input_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/const_of_shape_float_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/const_of_shape_int64_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/const_of_shape_no_value_attr_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_empty_scalar_int64_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_fill_input_as_shape_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_fill_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_multiple_attributes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_no_attributes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_one_val_int64_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_scalar_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_value_float_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_value_floats_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_value_int_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/constant_value_ints_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_1d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_3d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_attr_fail_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_autopad_fail_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_autopad_same_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_bias_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_bn_relu_maxpool_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_dynamic_batch_same_upper_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_dynamic_batch_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_dynamic_bias_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_dynamic_img_and_weights_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_dynamic_img_same_upper_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_dynamic_img_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_dynamic_kernel_same_lower_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_dynamic_weights_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_relu_maxpool_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_relu_maxpool_x2_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_auto_pad_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_bias_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_dyn_asym_padding_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_dyn_batch_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_dyn_img_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_dyn_output_shape_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_input_pads_asymm_1d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_input_pads_asymm_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_input_pads_strides_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_output_padding_3d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_output_padding_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_output_shape_3d_test.cpp.o 
CMakeFiles/test_onnx_test.dir/parse/conv_transpose_output_shape_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/conv_transpose_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/convinteger_bias_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/cos_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/cosh_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/depthtospace_crd_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/depthtospace_simple_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/depthtospace_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/dequantizelinear_axis_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/dequantizelinear_neg_axis_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/dequantizelinear_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/dequantizelinear_zero_point_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/dropout_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/dynamicquantizelinear_2d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/elu_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/embedding_bag_offset_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/embedding_bag_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/equal_bool_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/equal_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/erf_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/exp_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/expand_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/ext_path_external_data_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/external_constant_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/external_data_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/eyelike_default_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/eyelike_double_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/eyelike_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/eyelike_k_outofbounds_neg_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/eyelike_k_outofbounds_pos_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/eyelike_k_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/eyelike_not_rank2_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/eyelike_set_dtype_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/flatten_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/flatten_nonstd_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/flatten_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/floor_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gather_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gather_elements_axis0_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gather_elements_axis1_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gather_scalar_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gather_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gathernd_batch_dims_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gathernd_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gathernd_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gemm_brcst_C_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gemm_dyn_bias_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gemm_dyn_inner_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gemm_dyn_outer_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gemm_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gemm_no_C_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/gemm_rank_error.cpp.o CMakeFiles/test_onnx_test.dir/parse/gemm_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/globalavgpool_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/globalavgpool_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/globallppool_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/globallppool_test.cpp.o 
CMakeFiles/test_onnx_test.dir/parse/globalmaxpool_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/globalmaxpool_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/greater_bool_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/greater_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/greaterorequal_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_conv_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_3d_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_3d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_4d_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_4d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_5d_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_5d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_invalid_bias_shape_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_invalid_input_count_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_invalid_input_shape_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_invalid_num_groups_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_invalid_scale_shape_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_missing_attribute_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/group_norm_small_eps_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/hardmax_axis_neg_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/hardmax_axis_neg_ver11_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/hardmax_axis_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/hardmax_axis_ver11_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/hardmax_default_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/hardmax_default_ver11_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/hardsigmoid_default_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/hardsigmoid_double_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/hardsigmoid_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/hardswish_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_else_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_else_test_inlined.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_literal_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_param_excp1_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_param_excp_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_param_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_pl_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_then_else_multi_output_shapes_inlined_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_then_else_multi_output_shapes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_then_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_then_test_inlined.cpp.o CMakeFiles/test_onnx_test.dir/parse/if_tuple_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/imagescaler_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/imagescaler_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/implicit_add_bcast_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/implicit_pow_bcast_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/implicit_sub_bcast_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/initializer_not_an_input.cpp.o CMakeFiles/test_onnx_test.dir/parse/instance_norm_dyn_batch_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/instance_norm_dyn_batch_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/instance_norm_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/instance_norm_invalid_type_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/instance_norm_nonbroadcastable_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/instance_norm_test.cpp.o 
CMakeFiles/test_onnx_test.dir/parse/instance_norm_type_mismatch_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/isinf_double_pos_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/isinf_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/isinf_neg_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/isinf_no_detect_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/isnan_float_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/isnan_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_2d_axis_one_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_2d_axis_zero_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_3d_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_3d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_4d_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_4d_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_invalid_axis_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_invalid_input_count_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_invalid_minus_axis_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_small_eps_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/layer_norm_without_bias_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/leaky_relu_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/less_bool_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/less_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/lessorequal_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/log_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/logical_and_bcast_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/logical_or_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/logical_xor_bcast_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/logsoftmax_nonstd_input_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/logsoftmax_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/loop_default_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/loop_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/lpnormalization_axis_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/lpnormalization_default_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/lpnormalization_p_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/lppool_l1_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/lppool_l2_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/lrn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/main.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_bmbm_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_bmv_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_dyn_broadcast_error.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_dyn_mm_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_dyn_mv_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_dyn_vm_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_dyn_vv_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_mv_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_vbm_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_vm_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmul_vv_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmulinteger_dual_zp_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmulinteger_dyn_error.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmulinteger_invalid_type_error.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmulinteger_one_zp_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmulinteger_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/matmulinteger_zp_mismatch_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/max_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/maxpool_dilate_test.cpp.o 
CMakeFiles/test_onnx_test.dir/parse/maxpool_notset_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/maxpool_same_upper_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mean_fp16_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mean_integral_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mean_invalid_broadcast_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mean_single_input_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/min_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mod_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mod_test_different_dtypes.cpp.o CMakeFiles/test_onnx_test.dir/parse/mod_test_fmod.cpp.o CMakeFiles/test_onnx_test.dir/parse/mod_test_fmod_different_dtypes.cpp.o CMakeFiles/test_onnx_test.dir/parse/mod_test_fmod_half.cpp.o CMakeFiles/test_onnx_test.dir/parse/mod_test_half.cpp.o CMakeFiles/test_onnx_test.dir/parse/multinomial_autoseed_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/multinomial_dtype_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/multinomial_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/multinomial_generated_seed_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/multinomial_int64_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/multinomial_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mvn_axes_rank_too_big_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mvn_axes_rank_too_small_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mvn_default_axes_rank_too_big_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mvn_default_axes_rank_too_small_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mvn_default_axes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/mvn_rank_3_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/neg_dynamic_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/neg_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/nms_dynamic_batch_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/nms_dynamic_boxes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/nms_dynamic_classes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/nms_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/nms_use_dyn_output_false_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/no_pad_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/nonzero_dynamic_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/nonzero_int_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/nonzero_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/not_bool_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/not_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/onehot_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_3arg_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_4arg_axes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_4arg_invalid_axes_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_4arg_neg_axes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_asym_invalid_pads_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_asym_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_attr_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_cnst_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_dyn_reflect_error.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_reflect_multiaxis_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_reflect_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_reflect_with_axes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pad_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pow_fp32_i64_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pow_i64_fp32_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/pow_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/prefix_scan_sum_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/prelu_brcst_test.cpp.o 
CMakeFiles/test_onnx_test.dir/parse/qlinearadd_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/qlinearaveragepool_notset_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/qlinearconcat_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/qlinearconv_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/qlinearglobalavgpool_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/qlinearleakyrelu_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/qlinearmatmul_1D_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/qlinearmatmul_2D_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/qlinearmul_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/qlinearsigmoid_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/quantizelinear_axis_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/quantizelinear_int32_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/quantizelinear_neg_axis_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/quantizelinear_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/quantizelinear_zero_point_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomnormal_dtype_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomnormal_generated_seed_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomnormal_shape_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomnormal_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomnormallike_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomnormallike_type_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomuniform_dtype_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomuniform_generated_seed_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomuniform_shape_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomuniform_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomuniformlike_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/randomuniformlike_type_error_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/range_float_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/range_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/recip_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reduce_log_sum_exp_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reduce_log_sum_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducel1_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducel1_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducel2_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducemax_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducemax_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducemean_keepdims_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducemean_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducemin_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reduceprod_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducesum_empty_axes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducesum_keepdims_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducesum_multiaxis_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducesum_noop_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducesum_square_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reducesum_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reshape_non_standard_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reshape_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reshape_variable_input_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reshape_variable_input_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/resize_downsample_c_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/resize_downsample_f_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/resize_downsample_linear_test.cpp.o 
CMakeFiles/test_onnx_test.dir/parse/resize_nonstd_input_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/resize_outsize_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/resize_upsample_linear_ac_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/resize_upsample_linear_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/resize_upsample_pc_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/resize_upsample_pf_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reversesequence_batch_axis_err_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reversesequence_batch_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reversesequence_rank_err_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reversesequence_same_axis_err_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reversesequence_sequence_lens_shape_err_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reversesequence_time_axis_err_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/reversesequence_time_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/roialign_default_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/roialign_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/round_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/scatter_add_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/scatter_invalid_reduction_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/scatter_max_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/scatter_min_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/scatter_mul_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/scatter_none_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/scatternd_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/scatternd_invalid_reduction_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/selu_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/shape_dyn_test0.cpp.o CMakeFiles/test_onnx_test.dir/parse/shape_dyn_test1.cpp.o CMakeFiles/test_onnx_test.dir/parse/shape_dyn_test2.cpp.o CMakeFiles/test_onnx_test.dir/parse/shape_dyn_test3.cpp.o CMakeFiles/test_onnx_test.dir/parse/shape_end_less_start_error.cpp.o CMakeFiles/test_onnx_test.dir/parse/shape_end_oob_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/shape_gather_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/shape_start_oob_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/shape_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/shrink_hard_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/shrink_int8_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/sign_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/sin_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/sinh_dynamic_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/sinh_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/size_float_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/size_half_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/size_int_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_3arg_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_5arg_reverse_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_5arg_step_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_5arg_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_constant_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_max_end_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_reverse_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_step_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_var_input_default_steps.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_var_input_dyn0.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_var_input_dyn1.cpp.o 
CMakeFiles/test_onnx_test.dir/parse/slice_var_input_static0.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_var_input_static1.cpp.o CMakeFiles/test_onnx_test.dir/parse/slice_var_input_steps_error.cpp.o CMakeFiles/test_onnx_test.dir/parse/softmax_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/softmax_nonstd_input_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/softmax_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/softplus_nd_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/softplus_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/softsign_nd_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/softsign_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/spacetodepth_invalid_blocksize_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/spacetodepth_nondivisibility_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/spacetodepth_simple_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/spacetodepth_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/split_minus_axis_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/split_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/split_test_default.cpp.o CMakeFiles/test_onnx_test.dir/parse/split_test_invalid_num_outputs.cpp.o CMakeFiles/test_onnx_test.dir/parse/split_test_invalid_split.cpp.o CMakeFiles/test_onnx_test.dir/parse/split_test_no_attribute.cpp.o CMakeFiles/test_onnx_test.dir/parse/split_test_no_attribute_invalid_input_split.cpp.o CMakeFiles/test_onnx_test.dir/parse/split_test_no_attribute_invalid_split.cpp.o CMakeFiles/test_onnx_test.dir/parse/split_test_uneven.cpp.o CMakeFiles/test_onnx_test.dir/parse/split_test_uneven_num_outputs.cpp.o CMakeFiles/test_onnx_test.dir/parse/sqrt_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/squeeze_axes_input_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/squeeze_empty_axes_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/squeeze_unsqueeze_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/squeeze_unsqueeze_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/sub_bcast_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/sub_scalar_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/sum_int_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/sum_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/sum_type_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/tan_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/tanh_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/thresholdedrelu_default_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/thresholdedrelu_int_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/thresholdedrelu_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/tile_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/tile_test_3x2.cpp.o CMakeFiles/test_onnx_test.dir/parse/topk_attrk_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/topk_neg_axis_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/topk_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/transpose_default_perm_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/transpose_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/transpose_gather_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/transpose_invalid_perm_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/transpose_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/undefined_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/unique_dynamic_sorted_3D_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/unique_dynamic_sorted_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/unique_sorted_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/unique_unsorted_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/unknown_aten_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/unknown_test.cpp.o 
CMakeFiles/test_onnx_test.dir/parse/upsample_linear_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/upsample_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/upsample_ver7_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/variable_batch_leq_zero_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/variable_batch_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/where_dyn_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/where_mixed_test.cpp.o CMakeFiles/test_onnx_test.dir/parse/where_test.cpp.o -o ../../bin/test_onnx_test  -Wl,-rpath,/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX/lib ../../lib/libmigraphx_onnx.so.2009000.0.60101 ../../lib/libmigraphx_ref.so.2009000.0.60101 ../../lib/libmigraphx.so.2009000.0.60101 
make[2]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX'
[ 61%] Built target test_onnx_test
make[1]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX'
make: *** [Makefile:166: all] Error 2
build failed: AMDMIGraphX

build failed
Crizle commented 1 month ago

Also, just realised I've now modified src_projects/pytorch/build_pytorch_rocm.sh with CMAKE_CXX_FLAGS="$CMAKE_CXX_FLAGS -Wno-error=maybe-uninitialized" added. Build results are now as follows:

[ 63%] Linking CXX shared module ../../lib/migraphx.cpython-39-x86_64-linux-gnu.so
cd /home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX/src/py && /usr/bin/cmake -E cmake_link_script CMakeFiles/migraphx_pybind_3.9.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang++ -fPIC -O3 -DNDEBUG -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hsa/lib -L/opt/rocm_sdk_611/rocblas/lib -L/opt/rocm_sdk_611/hcc/lib -shared -o ../../lib/migraphx.cpython-39-x86_64-linux-gnu.so CMakeFiles/migraphx_pybind_3.9.dir/migraphx_py.cpp.o -Wl,-rpath,/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX/lib: ../../lib/libmigraphx_tf.so.2009000.0.60101 ../../lib/libmigraphx_onnx.so.2009000.0.60101 ../../lib/libmigraphx_ref.so.2009000.0.60101 ../../lib/libmigraphx_gpu.so.2009000.0.60101 /opt/rocm_sdk_611/lib64/libhiprtc.so.6.1.40092-a8157d309 -ldl /opt/rocm_sdk_611/lib64/libMIOpen.so.1.0.60101 /opt/rocm_sdk_611/lib64/librocblas.so.4.1.60101 /opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-a8157d309 /opt/rocm_sdk_611/lib/clang/17/lib/linux/libclang_rt.builtins-x86_64.a ../../lib/libmigraphx.so.2009000.0.60101 -Wl,-rpath-link,/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX/lib
cd /home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX/src/py && /usr/bin/strip /home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX/lib/migraphx.cpython-39-x86_64-linux-gnu.so
make[2]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX'
[ 63%] Built target migraphx_pybind_3.9
make[1]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX'
make: *** [Makefile:166: all] Error 2
build failed: AMDMIGraphX

build failed

Crizle commented 1 month ago

Just did a system update on Fedora 40 and retried the build, but I'm not sure if it made any difference; it only got 1% further?

[ 64%] Linking CXX executable ../bin/header_src_include_migraphx_generate_hpp
cd /home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX/test && /usr/bin/cmake -E cmake_link_script CMakeFiles/header_src_include_migraphx_generate_hpp.dir/link.txt --verbose=1
/opt/rocm_sdk_611/bin/clang++ -O3 -DNDEBUG -L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hsa/lib -L/opt/rocm_sdk_611/rocblas/lib -L/opt/rocm_sdk_611/hcc/lib "CMakeFiles/header_src_include_migraphx_generate_hpp.dir/header-main-include-header_src_include_migraphx_generate_hpp.cpp.o" "CMakeFiles/header_src_include_migraphx_generate_hpp.dir/header-static-include-header_src_include_migraphx_generate_hpp.cpp.o" -o ../bin/header_src_include_migraphx_generate_hpp  -Wl,-rpath,/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX/lib: ../lib/libmigraphx_onnx.so.2009000.0.60101 ../lib/libmigraphx_tf.so.2009000.0.60101 ../lib/libmigraphx_ref.so.2009000.0.60101 ../lib/libmigraphx_gpu.so.2009000.0.60101 /opt/rocm_sdk_611/lib64/libhiprtc.so.6.1.40092-a8157d309 -ldl /opt/rocm_sdk_611/lib64/libMIOpen.so.1.0.60101 /opt/rocm_sdk_611/lib64/librocblas.so.4.1.60101 /opt/rocm_sdk_611/lib64/libamdhip64.so.6.1.40092-a8157d309 /opt/rocm_sdk_611/lib/clang/17/lib/linux/libclang_rt.builtins-x86_64.a ../lib/libmigraphx.so.2009000.0.60101 -Wl,-rpath-link,/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX/lib 
make[2]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX'
[ 64%] Built target header_src_include_migraphx_fp_to_double_hpp
make[2]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX'
[ 64%] Built target header_src_include_migraphx_generate_hpp
make[1]: Leaving directory '/home/chris/rocm_sdk_builder/builddir/035_AMDMIGraphX'
make: *** [Makefile:166: all] Error 2
build failed: AMDMIGraphX

build failed
lamikr commented 1 month ago

@Crizle I pushed one patch for AMDMIGraphX yesterday; you probably need to get that one into use as well by running:

# git pull
# ./babs.sh -co
# ./babs.sh -ap
# rm -rf builddir/035_AMDMIGraphX
# ./babs.sh -b
lamikr commented 1 month ago

If that helps you get AMDMIGraphX to build, then it should build a couple more projects just fine before pytorch. (You can check the build order from the files in the binfo directory.)

In pytorch there are 2 problems: 1) the one for which I gave you the fix for testing, and 2) a second problem with a std::clamp call.

I tracked the second bug to IndexKernel.cu, which is used to generate the IndexKernel.hip file. There is a call to std::clamp which causes an assert error in /usr/include/c++/14/bits/stl_algo.h.

At the moment I have no clue about the real fix yet. I tested that after removing the std::clamp call the build succeeds, but that is of course not the real fix, as it makes one pytorch kernel behave incorrectly. It may be that with older gcc versions that whole assert check is not enabled, but I have not figured out what the right fix would be.

Then there were 2 problems with onnxruntime for which I have a fix. After that there is a good chance your build could work. The final project to build after onnxruntime is DeepSpeed for now, and even without that you should have most of the functionality.

lamikr commented 1 month ago

So all other errors except this one should now have been fixed for Fedora 40. This one is triggered by the clamp call in IndexKernel.cu/hip: qvalue = std::clamp(qvalue, qmin, qmax);

In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/algorithm:61:
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_algo.h:3625:7: error: reference to __host__ function '__glibcxx_assert_fail' in __host__ __device__ function
 3625 |       __glibcxx_assert(!(__hi < __lo));
      |       ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:2453:7: note: expanded from macro '__glibcxx_assert'
 2453 |         std::__glibcxx_assert_fail();                                   \
      |              ^
/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/pytorch/aten/src/ATen/native/hip/IndexKernel.hip:254:21: note: called by 'operator()'
  254 |       qvalue = std::clamp(qvalue, qmin, qmax);
      |                     ^
/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/pytorch/aten/src/ATen/native/hip/IndexKernel.hip:101:5: note: called by 'operator()'
  101 |     f(out_data, in_data, offset);
      |     ^
/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/pytorch/aten/src/ATen/native/hip/IndexKernel.hip:36:7: note: called by 'index_elementwise_kernel<128, 4, (lambda at /home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/pytorch/aten/src/ATen/native/hip/IndexKernel.hip:85:62)>'
   36 |       f(idx);
      |       ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:2446:3: note: '__glibcxx_assert_fail' declared here
 2446 |   __glibcxx_assert_fail()
      |   ^
1 warning and 5 errors generated when compiling for gfx1035.
CMake Error at torch_hip_generated_IndexKernel.hip.o.cmake:200 (message):
  Error generating file
  /home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/./torch_hip_generated_IndexKernel.hip.o

And the problematic cuda code:

void index_put_kernel_quantized_cuda(TensorIterator& iter, const IntArrayRef index_size, const IntArrayRef index_stride, const bool accumulate, const double scale, const int zero_point) {
  TORCH_CHECK(!accumulate, "index_put does not support accumulate=true");
  AT_DISPATCH_QINT_AND_SUB_BYTE_TYPES(iter.dtype(), "index_put", [&] {
    constexpr int64_t qmin = std::numeric_limits<typename scalar_t::underlying>::min();
    constexpr int64_t qmax = std::numeric_limits<typename scalar_t::underlying>::max();
    const float inv_scale = 1.0f / static_cast<float>(scale);

    gpu_index_kernel(iter, index_size, index_stride, [inv_scale, zero_point, qmin, qmax]C10_DEVICE(char* const out_data, const char* const in_data, const int64_t offset) {
      int64_t qvalue = static_cast<int64_t>(zero_point + nearbyintf(*(float*)in_data * inv_scale));
      qvalue = std::clamp(qvalue, qmin, qmax);
      *(scalar_t*)(out_data + offset) = static_cast<scalar_t>(qvalue);
    });
  });
}
lamikr commented 1 month ago

Changing the code in this way should be an acceptable implementation that builds ok until the root cause is found.

void index_put_kernel_quantized_cuda(TensorIterator& iter, const IntArrayRef index_size, const IntArrayRef index_stride, const bool accumulate, const double scale, const int zero_point) {
  TORCH_CHECK(!accumulate, "index_put does not support accumulate=true");
  AT_DISPATCH_QINT_AND_SUB_BYTE_TYPES(iter.dtype(), "index_put", [&] {
    constexpr int64_t qmin = std::numeric_limits<typename scalar_t::underlying>::min();
    constexpr int64_t qmax = std::numeric_limits<typename scalar_t::underlying>::max();
    const float inv_scale = 1.0f / static_cast<float>(scale);

    gpu_index_kernel(iter, index_size, index_stride, [inv_scale, zero_point, qmin, qmax]C10_DEVICE(char* const out_data, const char* const in_data, const int64_t offset) {
      int64_t qvalue = static_cast<int64_t>(zero_point + nearbyintf(*(float*)in_data * inv_scale));
      int64_t new_max = std::max<int64_t>(qmin, qvalue);
      qvalue = std::min<int64_t>(qmax, new_max);
      //qvalue = std::clamp(qvalue, qmin, qmax);
      *(scalar_t*)(out_data + offset) = static_cast<scalar_t>(qvalue);
    });
  });
}

I created a bug report in the pytorch github: https://github.com/pytorch/pytorch/issues/127666

Crizle commented 1 month ago

void index_put_kernel_quantized_cuda(TensorIterator& iter, const IntArrayRef index_size, const IntArrayRef index_stride, const bool accumulate, const double scale, const int zero_point) {
  TORCH_CHECK(!accumulate, "index_put does not support accumulate=true");
  AT_DISPATCH_QINT_AND_SUB_BYTE_TYPES(iter.dtype(), "index_put", [&] {
    constexpr int64_t qmin = std::numeric_limits<typename scalar_t::underlying>::min();
    constexpr int64_t qmax = std::numeric_limits<typename scalar_t::underlying>::max();
    const float inv_scale = 1.0f / static_cast<float>(scale);

    gpu_index_kernel(iter, index_size, index_stride, [inv_scale, zero_point, qmin, qmax]C10_DEVICE(char* const out_data, const char* const in_data, const int64_t offset) {
      int64_t qvalue = static_cast<int64_t>(zero_point + nearbyintf(*(float*)in_data * inv_scale));
      int64_t new_max = std::max<int64_t>(qmin, qvalue);
      qvalue = std::min<int64_t>(qmax, new_max);
      //qvalue = std::clamp(qvalue, qmin, qmax);
      *(scalar_t*)(out_data + offset) = static_cast<scalar_t>(qvalue);
    });
  });
}

After applying this change and doing the build, this is my output, so I guess I will need to wait on the pytorch bug report.

/home/chris/rocm_sdk_builder/src_projects/pytorch/aten/src/ATen/native/hip/ScanUtils.cuh:453:16: note: in instantiation of function template specialization 'at::cuda::cub::inclusive_scan<const c10::BFloat16 *, c10::BFloat16 *, (lambda at /home/chris/rocm_sdk_builder/src_projects/pytorch/aten/src/ATen/native/hip/LogcumsumexpKernel.hip:112:3), 1073741824>' requested here
  453 |     cuda::cub::inclusive_scan(self_->const_data_ptr<scalar_t>(), result.mutable_data_ptr<scalar_t>(), binary_op, self.numel());
      |                ^
/home/chris/rocm_sdk_builder/src_projects/pytorch/aten/src/ATen/native/hip/LogcumsumexpKernel.hip:121:9: note: in instantiation of function template specialization 'at::native::scan_dim<c10::BFloat16, (lambda at /home/chris/rocm_sdk_builder/src_projects/pytorch/aten/src/ATen/native/hip/LogcumsumexpKernel.hip:112:3)>' requested here
  121 |         scan_dim<scalar_t>(self, result, dim, init, log_add_exp);
      |         ^
/home/chris/rocm_sdk_builder/src_projects/pytorch/aten/src/ATen/native/hip/LogcumsumexpKernel.hip:110:24: note: expanded from macro '_LCME_DISPATCH'
  110 | #define _LCME_DISPATCH AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2
      |                        ^
In file included from /home/chris/rocm_sdk_builder/src_projects/pytorch/aten/src/ATen/native/hip/LogcumsumexpKernel.hip:8:
In file included from /home/chris/rocm_sdk_builder/src_projects/pytorch/aten/src/ATen/native/hip/ScanUtils.cuh:6:
/home/chris/rocm_sdk_builder/src_projects/pytorch/aten/src/ATen/hip/cub.cuh:241:30: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
  241 |   CUB_WRAPPER(NO_ROCM(detail)::hipcub::DeviceScan::InclusiveScan,
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/chris/rocm_sdk_builder/src_projects/pytorch/aten/src/ATen/hip/cub.cuh:44:3: note: expanded from macro 'CUB_WRAPPER'
   44 |   func(temp_storage.get(), temp_storage_bytes, __VA_ARGS__);              \
      |   ^~~~
12 warnings generated when compiling for host.
ninja: build stopped: subcommand failed.
build failed: pytorch
  error in build cmd: ./build_pytorch_rocm.sh /opt/rocm_sdk_611 60101

build failed
daniandtheweb commented 1 month ago

Try deleting the src_projects folder of pytorch and running the init command again; after that, apply the patch and the build should work fine. I had a similar issue on Arch.
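For anyone else hitting this, a rough sketch of those steps, assuming babs.sh's -i flag is the init/download command (the -ap and -b flags are the ones already used earlier in this thread):

rm -rf src_projects/pytorch   # drop the existing pytorch checkout
./babs.sh -i                  # re-init / re-download the sources (assumed flag)
./babs.sh -ap                 # re-apply the patches
./babs.sh -b                  # continue the build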

Crizle commented 1 month ago

> Try deleting the src_projects folder of pytorch and running the init command again; after that, apply the patch and the build should work fine. I had a similar issue on Arch.

Thank you, that has built pytorch and the build is now continuing.

lamikr commented 4 weeks ago

Thanks for confirming! Did the build finish, and are the examples in /opt/rocm_sdk_611/docs/examples working? For example, the ones in the opencl and pytorch directories?

Crizle commented 4 weeks ago

No problem! Anyway, as you mentioned about DeepSpeed above, everything has been built up to that point, so it seems like great progress! I'll get around to trying the tests a bit later or tomorrow and let you know how they went.

Here is the last build status:

build ok: onnxruntime

/home/chris/rocm_sdk_builder/builddir/040_01_onnxruntime_rocm_training
installing onnxruntime
./build/build.sh: line 312: [: cd: binary operator expected
custom install
onnxruntime, install command 0
cd /home/chris/rocm_sdk_builder/src_projects/onnxruntime
install cmd ok: onnxruntime
onnxruntime, install command 1
./install_onnxruntime_rocm_training.sh
Processing ./build/Linux/Release/dist/onnxruntime_training-1.17.3+cpu-cp39-cp39-linux_x86_64.whl
Collecting cerberus (from onnxruntime-training==1.17.3+cpu)
  Downloading Cerberus-1.3.5-py3-none-any.whl.metadata (6.0 kB)
Collecting flatbuffers (from onnxruntime-training==1.17.3+cpu)
  Downloading flatbuffers-24.3.25-py2.py3-none-any.whl.metadata (850 bytes)
Collecting h5py (from onnxruntime-training==1.17.3+cpu)
  Downloading h5py-3.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.5 kB)
Requirement already satisfied: numpy>=1.16.6 in /opt/rocm_sdk_611/lib/python3.9/site-packages (from onnxruntime-training==1.17.3+cpu) (1.26.4)
Collecting onnx (from onnxruntime-training==1.17.3+cpu)
  Downloading onnx-1.16.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Requirement already satisfied: packaging in /opt/rocm_sdk_611/lib/python3.9/site-packages (from onnxruntime-training==1.17.3+cpu) (24.0)
Requirement already satisfied: protobuf in /opt/rocm_sdk_611/lib/python3.9/site-packages (from onnxruntime-training==1.17.3+cpu) (5.27.0)
Requirement already satisfied: sympy in /opt/rocm_sdk_611/lib/python3.9/site-packages/sympy-1.12.1-py3.9.egg (from onnxruntime-training==1.17.3+cpu) (1.12.1)
Collecting setuptools>=61.0.0 (from onnxruntime-training==1.17.3+cpu)
  Using cached setuptools-70.0.0-py3-none-any.whl.metadata (5.9 kB)
Requirement already satisfied: mpmath<1.4.0,>=1.1.0 in /opt/rocm_sdk_611/lib/python3.9/site-packages/mpmath-1.3.0-py3.9.egg (from sympy->onnxruntime-training==1.17.3+cpu) (1.3.0)
Using cached setuptools-70.0.0-py3-none-any.whl (863 kB)
Downloading Cerberus-1.3.5-py3-none-any.whl (30 kB)
Downloading flatbuffers-24.3.25-py2.py3-none-any.whl (26 kB)
Downloading h5py-3.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.3/5.3 MB 17.7 MB/s eta 0:00:00
Downloading onnx-1.16.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.9/15.9 MB 27.4 MB/s eta 0:00:00
Installing collected packages: flatbuffers, cerberus, setuptools, onnx, h5py, onnxruntime-training
  Attempting uninstall: setuptools
    Found existing installation: setuptools 58.1.0
    Uninstalling setuptools-58.1.0:
      Successfully uninstalled setuptools-58.1.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
barectf 3.1.2 requires jsonschema<4.0,>=3.2, but you have jsonschema 4.22.0 which is incompatible.
Successfully installed cerberus-1.3.5 flatbuffers-24.3.25 h5py-3.11.0 onnx-1.16.1 onnxruntime-training-1.17.3+cpu setuptools-70.0.0
install cmd ok: onnxruntime
install ok: onnxruntime

/home/chris/rocm_sdk_builder/builddir/040_01_onnxruntime_rocm_training
post installing onnxruntime
no post install commands
post install ok: onnxruntime
LIST_BINFO_FILE_FULLNAME[79]: /home/chris/rocm_sdk_builder/binfo/040_02_onnxruntime_deepspeed.binfo

---------------------------
BINFO_APP_NAME: DeepSpeed
BINFO FILE: /home/chris/rocm_sdk_builder/binfo/040_02_onnxruntime_deepspeed.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: DeepSpeed
BINFO_APP_SRC_DIR: /home/chris/rocm_sdk_builder/src_projects/DeepSpeed
BINFO_APP_BUILD_DIR: /home/chris/rocm_sdk_builder/builddir/040_02_onnxruntime_deepspeed
HIP_PATH: /opt/rocm_sdk_611
INSTALL_DIR: /opt/rocm_sdk_611
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/chris/rocm_sdk_builder/builddir/040_02_onnxruntime_deepspeed/.result_install
---------------------------

/home/chris/rocm_sdk_builder/builddir/040_02_onnxruntime_deepspeed
pre-configuring DeepSpeed
no pre-configuration commands
pre-configuration ok: DeepSpeed

SHELL=/bin/bash
SESSION_MANAGER=local/unix:@/tmp/.ICE-unix/3722,unix/unix:/tmp/.ICE-unix/3722
SDK_CXX_COMPILER_HIP_CLANG=/opt/rocm_sdk_611/bin/clang++
CCACHE_TEMPDIR=/home/chris/.ccache
COLORTERM=truecolor
CMAKE_BUILD_TYPE_RELWITHDEBINFO=RelWithDebInfo
INSTALL_DIR_PREFIX_SDK_ROOT=/opt/rocm_sdk_611
HISTCONTROL=ignoredups
XDG_MENU_PREFIX=gnome-
CPPFLAGS_DEFAULT=-I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include
PKG_CONFIG_PATH={INSTALL_DIR_PREFIX_SDK_ROOT}/lib64/pkgconfig:{INSTALL_DIR_PREFIX_SDK_ROOT}/lib/pkgconfig:{INSTALL_DIR_PREFIX_SDK_ROOT}/share/pkgconfig
HCC_HOME=/opt/rocm_sdk_611/hcc
HOSTNAME=fedora.fritz.box
HISTSIZE=1000
UPSTREAM_REPO_VERSION_TAG_DEFAULT=rocm-6.1.1
SSH_AUTH_SOCK=/run/user/1000/keyring/ssh
MEMORY_PRESSURE_WRITE=c29tZSAyMDAwMDAgMjAwMDAwMAA=
HIPCC_VERBOSE=7
XMODIFIERS=@im=ibus
APP_CMAKE_CFG_FLAGS_DEFAULT=-DCMAKE_INSTALL_LIBDIR=lib64
DESKTOP_SESSION=gnome
BUILD_CPU_COUNT_TENSILE_SAFE=4
ROCM_MINOR_VERSION=1
SDK_C_COMPILER_HIPCC=/opt/rocm_sdk_611/bin/hipcc
EDITOR=/usr/bin/nano
ROCM_MAJOR_VERSION=6
PWD=/home/chris/rocm_sdk_builder/builddir/040_02_onnxruntime_deepspeed
LOGNAME=chris
XDG_SESSION_DESKTOP=gnome
XDG_SESSION_TYPE=wayland
CCACHE_DIR=/home/chris/.ccache
CMAKE_BUILD_TYPE_DEFAULT=Release
SYSTEMD_EXEC_PID=9547
ROCM_VERSION_NMBR=60101
BUILD_CPU_COUNT_DEFAULT=8
XAUTHORITY=/run/user/1000/.mutter-Xwaylandauth.XQGSO2
ROCM_VERSION_STR_ZEROED_NO_DOTS=60101
CMAKE_BUILD_TYPE_RELEASE=Release
SDL_VIDEO_MINIMIZE_ON_FOCUS_LOSS=0
GDM_LANG=en_GB.UTF-8
LDFLAGS=-L/opt/rocm_sdk_611/lib64 -L/opt/rocm_sdk_611/lib -L/opt/rocm_sdk_611/hsa/lib -L/opt/rocm_sdk_611/rocblas/lib -L/opt/rocm_sdk_611/hcc/lib
HOME=/home/chris
USERNAME=chris
LANG=en_GB.UTF-8
XDG_CURRENT_DESKTOP=GNOME
BUILD_CPU_COUNT=8
ROCM_LIBPATCH_VERSION=60101
MEMORY_PRESSURE_WATCH=/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/dbus-:1.2-org.gnome.Nautilus@1.service/memory.pressure
VTE_VERSION=7601
WAYLAND_DISPLAY=wayland-0
HIP_PATH=/opt/rocm_sdk_611
BUILD_CPU_COUNT_MIN=1
python=python
GNOME_TERMINAL_SCREEN=/org/gnome/Terminal/screen/a204dded_9b07_4e19_ae8d_bb4256a8fdf3
CPPFLAGS=-I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include
SDK_C_COMPILER_DEFAULT=/opt/rocm_sdk_611/bin/hipcc
STEAM_FRAME_FORCE_CLOSE=1
INSTALL_DIR_PREFIX_HIP_LLVM=/opt/rocm_sdk_611
GNOME_SETUP_DISPLAY=:1
HIP_PLATFORM=amd
HIP_PLATFORM_DEFAULT=amd
XDG_SESSION_CLASS=user
INSTALL_DIR_PREFIX_C_COMPILER=/opt/rocm_sdk_611
TERM=xterm-256color
BABS_VERSION=2024_05_25_01
BUILD_CPU_COUNT_MAX=12
LESSOPEN=||/usr/bin/lesspipe.sh %s
ROCM_DIR=/opt/rocm_sdk_611
CMAKE_BUILD_TYPE_DEBUG=Debug
USER=chris
SDK_PLATFORM_NAME_HIPCLANG=clang
ROCM_PATCH_VERSION=1
GNOME_TERMINAL_SERVICE=:1.273
DEVICE_LIB_PATH=/opt/rocm_sdk_611/amdgcn/bitcode
BUILD_CPU_COUNT_HALF=6
ROCBLAS_HOME=/opt/rocm_sdk_611/rocblas
INSTALL_DIR_PREFIX_HIPCC=/opt/rocm_sdk_611
SDK_CXX_COMPILER_DEFAULT=/opt/rocm_sdk_611/bin/hipcc
DISPLAY=:0
SHLVL=3
QT_IM_MODULE=ibus
HIP_PATH_DEFAULT=/opt/rocm_sdk_611
ROCM_SDK_VERSION_INFO=rocm-6.1.1
ROCM_PATH=/opt/rocm_sdk_611
PATCH_FILE_ROOT_DIR=/home/chris/rocm_sdk_builder/patches/rocm-6.1.1
LD_LIBRARY_PATH=/opt/rocm_sdk_611/hcc/lib:/opt/rocm_sdk_611/rocblas/lib:/opt/rocm_sdk_611/hsa/lib:/opt/rocm_sdk_611/lib:/opt/rocm_sdk_611/lib64:/lib64:/opt/rocm_sdk_611/lib64:/opt/rocm_sdk_611/lib:/opt/rocm_sdk_611/hsa/lib
XDG_RUNTIME_DIR=/run/user/1000
HCC_PATH=/opt/rocm_sdk_611/hcc/bin
DEBUGINFOD_URLS=https://debuginfod.fedoraproject.org/ 
BUILD_SCRIPT_ROOT_DIR=/home/chris/rocm_sdk_builder/build
SDK_SRC_ROOT_DIR=/home/chris/rocm_sdk_builder/src_projects
CPACK_RPM_PACKAGE_RELEASE=01
SDK_CXX_COMPILER_HIPCC=/opt/rocm_sdk_611/bin/hipcc
XDG_DATA_DIRS=/home/chris/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share/:/usr/share/
BUILD_ROOT_DIR=/home/chris/rocm_sdk_builder/builddir
SDK_C_COMPILER_HIP_CLANG=/opt/rocm_sdk_611/bin/clang
PATH=/opt/rocm_sdk_611/bin:/opt/rocm_sdk_611/hcc/bin:/opt/rocm_sdk_611/bin:/opt/rocm_sdk_611/bin:/opt/rocm_sdk_611/hcc/bin:/opt/rocm_sdk_611/bin:/home/chris/.cargo/bin:/home/chris/miniconda3/bin:/home/chris/.local/bin:/home/chris/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin
GDMSESSION=gnome
CFLAGS=-I/opt/rocm_sdk_611/include -I/opt/rocm_sdk_611/hsa/include -I/opt/rocm_sdk_611/rocm_smi/include -I/opt/rocm_sdk_611/rocblas/include
SDK_PLATFORM_NAME_HIPCC=amd
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
MAIL=/var/spool/mail/chris
INSTALL_DIR_PREFIX_HIP_CLANG=/opt/rocm_sdk_611
ROCM_VERSION_STR=6.1.1
OLDPWD=/home/chris/rocm_sdk_builder/builddir/040_02_onnxruntime_deepspeed
APP_CMAKE_CFG_FLAGS_DEBUG=-DCMAKE_C_FLAGS_DEBUG=-g3 -DCMAKE_CXX_FLAGS_DEBUG=-g3
BUILD_RULE_ROOT_DIR=/home/chris/rocm_sdk_builder/binfo
_=/usr/bin/env

/home/chris/rocm_sdk_builder/builddir/040_02_onnxruntime_deepspeed
no config
configure ok: DeepSpeed

/home/chris/rocm_sdk_builder/builddir/040_02_onnxruntime_deepspeed
post-configuration DeepSpeed
no post-configuration commands
post-configuration ok: DeepSpeed

/home/chris/rocm_sdk_builder/builddir/040_02_onnxruntime_deepspeed
Building DeepSpeed
[0] DeepSpeed, build command:
cd /home/chris/rocm_sdk_builder/src_projects/DeepSpeed
[1] DeepSpeed, build command:
./build_deepspeed_rocm.sh
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
DS_BUILD_OPS=1
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  One can disable async_io with DS_BUILD_AIO=0
 [ERROR]  Unable to pre-compile async_io
Traceback (most recent call last):
  File "/home/chris/rocm_sdk_builder/src_projects/DeepSpeed/setup.py", line 180, in <module>
    abort(f"Unable to pre-compile {op_name}")
  File "/home/chris/rocm_sdk_builder/src_projects/DeepSpeed/setup.py", line 52, in abort
    assert False, msg
AssertionError: Unable to pre-compile async_io
build failed: DeepSpeed
  error in build cmd: ./build_deepspeed_rocm.sh

build failed
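Regarding the async_io failure above: the warnings point at missing libaio development headers, and the log itself notes that the op can be disabled with DS_BUILD_AIO=0. A rough sketch of the two options on Fedora, assuming the package name libaio-devel and that the build script passes the environment through to DeepSpeed's setup.py:

# Option 1: install the libaio headers (Fedora's equivalent of Debian's libaio-dev)
sudo dnf install libaio-devel
# Option 2: skip the async_io op entirely (flag taken from the warning above)
DS_BUILD_AIO=0 ./build_deepspeed_rocm.sh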
Crizle commented 4 weeks ago

Just quickly tried the pytorch example:

> chris@fedora:~/rocm_sdk_builder$ source /opt/rocm_sdk_611/bin/env_rocm.sh
> chris@fedora:~/rocm_sdk_builder$ cd /opt/rocm_sdk_611/docs/examples/pytorch
> chris@fedora:/opt/rocm_sdk_611/docs/examples/pytorch$ ./run_pytorch_gpu_simple_test.sh
> bash: ./run_pytorch_gpu_simple_test.sh: No such file or directory
> chris@fedora:/opt/rocm_sdk_611/docs/examples/pytorch$ ls
> pytorch_amd_gpu_intro.ipynb  pytorch_simple_cpu_vs_gpu_benchmark.ipynb
> pytorch_gpu_simple_test.py   test_torch_migraphx_resnet50.py
> chris@fedora:/opt/rocm_sdk_611/docs/examples/pytorch$ ./pytorch_gpu_simple_test.sh
> bash: ./pytorch_gpu_simple_test.sh: No such file or directory
> chris@fedora:/opt/rocm_sdk_611/docs/examples/pytorch$ python pytorch_gpu_simple_test.py 
> hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
> hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
> tensor([-1.1379], device='cuda:0')
Crizle commented 4 weeks ago

(screenshot)

lamikr commented 4 weeks ago

One test script that did not work for you was missing; I added it now. You can get it installed by doing:

git pull
rm -rf builddir/001_rocm_core
./babs.sh -b
lamikr commented 4 weeks ago

Btw, Eitch was also having a build problem with DeepSpeed and was able to solve it somehow: https://github.com/lamikr/rocm_sdk_builder/issues/8

Wondering whether you also have the same /dev/kfd permission problem. Can you send me the output of:

ls -la /dev/kfd

lamikr commented 4 weeks ago

And btw, I have not made an executable .sh script for most of the tests in the docs/examples/pytorch dir, but you can run them with python. For example:

python test_torch_migraphx_resnet50.py

Crizle commented 3 weeks ago

ls -la /dev/kfd

Hi Lamikr, I don't think I have that same issue; the permissions seem correct:

ls -la /dev/kfd
crw-rw-rw-. 1 root render 235, 0 Jun  4 09:47 /dev/kfd
Crizle commented 3 weeks ago

I'll post the results of most of the tests (they seem to work!):

source /opt/rocm_sdk_611/bin/env_rocm.sh
chris@fedora:~/rocm_sdk_builder/docs/examples/pytorch$ rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.4
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 5 5600G with Radeon Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 5 5600G with Radeon Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4464                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32737584(0x1f38930) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32737584(0x1f38930) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32737584(0x1f38930) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1101                            
  Uuid:                    GPU-00e61f8be68be4d8               
  Marketing Name:          AMD Radeon RX 7800 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      4096(0x1000) KB                    
    L3:                      65536(0x10000) KB                  
  Chip ID:                 29822(0x747e)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2254                               
  BDFID:                   4608                               
  Internal Node ID:        1                                  
  Compute Unit:            60                                 
  SIMDs per CU:            2                                  
  Shader Engines:          3                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 102                                
  SDMA engine uCode::      21                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1101         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***
Crizle commented 3 weeks ago

test_torch_migraphx_resnet50.py

hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
Got 1 acc subgraphs and 0 non-acc subgraphs
/opt/rocm_sdk_611/lib/python3.9/site-packages/torch_migraphx/fx/mgx_module.py:101: UserWarning: Input x not on gpu device. Copying to device before execution, however, this will add extra overhead if running a performance benckmark.
  warnings.warn(
tensor([[ 6.2899, 14.9466, 31.7234,  ..., 18.6821, 18.1498, -1.4577],
        [ 6.9077, 14.2246, 29.9605,  ..., 18.8121, 18.2528, -1.7136]],
       device='cuda:0')
Crizle commented 3 weeks ago

./run_pytorch_gpu_simple_test.sh

hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
tensor([-0.9287], device='cuda:0')
Crizle commented 3 weeks ago

jupyter-notebook pytorch_amd_gpu_intro.ipynb

$ jupyter-notebook pytorch_amd_gpu_intro.ipynb 
[I 2024-06-04 11:54:21.890 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-06-04 11:54:21.894 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-06-04 11:54:21.897 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-06-04 11:54:21.900 ServerApp] notebook | extension was successfully linked.
[I 2024-06-04 11:54:22.145 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-06-04 11:54:22.163 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-06-04 11:54:22.165 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-06-04 11:54:22.166 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-06-04 11:54:22.168 LabApp] JupyterLab extension loaded from /opt/rocm_sdk_611/lib/python3.9/site-packages/jupyterlab
[I 2024-06-04 11:54:22.168 LabApp] JupyterLab application directory is /opt/rocm_sdk_611/share/jupyter/lab
[I 2024-06-04 11:54:22.169 LabApp] Extension Manager is 'pypi'.
[I 2024-06-04 11:54:22.178 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-06-04 11:54:22.181 ServerApp] notebook | extension was successfully loaded.
[I 2024-06-04 11:54:22.182 ServerApp] Serving notebooks from local directory: /home/chris/rocm_sdk_builder/docs/examples/pytorch
[I 2024-06-04 11:54:22.182 ServerApp] Jupyter Server 2.14.1 is running at:
[I 2024-06-04 11:54:22.182 ServerApp] http://localhost:8888/tree?token=a171fffe0027a2d5041542ee8a604aa3520bf482c2f69eb0
[I 2024-06-04 11:54:22.182 ServerApp]     http://127.0.0.1:8888/tree?token=a171fffe0027a2d5041542ee8a604aa3520bf482c2f69eb0
[I 2024-06-04 11:54:22.182 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2024-06-04 11:54:22.219 ServerApp] 

    To access the server, open this file in a browser:
        file:///home/chris/.local/share/jupyter/runtime/jpserver-12459-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/tree?token=a171fffe0027a2d5041542ee8a604aa3520bf482c2f69eb0
        http://127.0.0.1:8888/tree?token=a171fffe0027a2d5041542ee8a604aa3520bf482c2f69eb0
[I 2024-06-04 11:54:22.241 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[I 2024-06-04 11:54:23.428 JupyterNotebookApp] 302 GET /tree/pytorch_amd_gpu_intro.ipynb?token=[secret] (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 1.69ms
[W 2024-06-04 11:54:24.374 ServerApp] Notebook pytorch_amd_gpu_intro.ipynb is not trusted
[I 2024-06-04 11:54:24.533 ServerApp] Kernel started: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[I 2024-06-04 11:54:24.923 ServerApp] Connecting to kernel 014d0500-f2f5-4bd2-b796-0239d299ca1b.
[I 2024-06-04 11:54:24.926 ServerApp] Connecting to kernel 014d0500-f2f5-4bd2-b796-0239d299ca1b.
[I 2024-06-04 11:54:24.931 ServerApp] Connecting to kernel 014d0500-f2f5-4bd2-b796-0239d299ca1b.

(screenshot)

Crizle commented 3 weeks ago

jupyter-notebook pytorch_simple_cpu_vs_gpu_benchmark.ipynb

jupyter-notebook pytorch_simple_cpu_vs_gpu_benchmark.ipynb 
[I 2024-06-04 11:58:05.059 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-06-04 11:58:05.062 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-06-04 11:58:05.066 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-06-04 11:58:05.069 ServerApp] notebook | extension was successfully linked.
[I 2024-06-04 11:58:05.218 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-06-04 11:58:05.233 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-06-04 11:58:05.235 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-06-04 11:58:05.236 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-06-04 11:58:05.237 LabApp] JupyterLab extension loaded from /opt/rocm_sdk_611/lib/python3.9/site-packages/jupyterlab
[I 2024-06-04 11:58:05.237 LabApp] JupyterLab application directory is /opt/rocm_sdk_611/share/jupyter/lab
[I 2024-06-04 11:58:05.237 LabApp] Extension Manager is 'pypi'.
[I 2024-06-04 11:58:05.247 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-06-04 11:58:05.250 ServerApp] notebook | extension was successfully loaded.
[I 2024-06-04 11:58:05.250 ServerApp] Serving notebooks from local directory: /home/chris/rocm_sdk_builder/docs/examples/pytorch
[I 2024-06-04 11:58:05.250 ServerApp] Jupyter Server 2.14.1 is running at:
[I 2024-06-04 11:58:05.250 ServerApp] http://localhost:8888/tree?token=bbf9fac33cf42c5ff099982bee4ab5bd274d7e2829ae2213
[I 2024-06-04 11:58:05.250 ServerApp]     http://127.0.0.1:8888/tree?token=bbf9fac33cf42c5ff099982bee4ab5bd274d7e2829ae2213
[I 2024-06-04 11:58:05.250 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2024-06-04 11:58:05.286 ServerApp] 

    To access the server, open this file in a browser:
        file:///home/chris/.local/share/jupyter/runtime/jpserver-12873-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/tree?token=bbf9fac33cf42c5ff099982bee4ab5bd274d7e2829ae2213
        http://127.0.0.1:8888/tree?token=bbf9fac33cf42c5ff099982bee4ab5bd274d7e2829ae2213
[I 2024-06-04 11:58:05.308 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[I 2024-06-04 11:58:06.489 JupyterNotebookApp] 302 GET /tree/pytorch_simple_cpu_vs_gpu_benchmark.ipynb?token=[secret] (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 1.85ms
[W 2024-06-04 11:58:07.359 ServerApp] Notebook pytorch_simple_cpu_vs_gpu_benchmark.ipynb is not trusted
[I 2024-06-04 11:58:07.528 ServerApp] Kernel started: 03f286dc-69f4-4b37-aa4f-33256b1733a1
[I 2024-06-04 11:58:07.909 ServerApp] Connecting to kernel 03f286dc-69f4-4b37-aa4f-33256b1733a1.
[I 2024-06-04 11:58:07.912 ServerApp] Connecting to kernel 03f286dc-69f4-4b37-aa4f-33256b1733a1.
[I 2024-06-04 11:58:07.920 ServerApp] Connecting to kernel 03f286dc-69f4-4b37-aa4f-33256b1733a1.
[W 2024-06-04 11:58:20.450 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498700449 (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:20.450 ServerApp] wrote error: 'Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b'
    Traceback (most recent call last):
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/auth/decorator.py", line 73, in inner
        return await out
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/handlers.py", line 75, in get
        model = await ensure_async(km.kernel_model(kernel_id))
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 501, in kernel_model
        self._check_kernel_id(kernel_id)
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 532, in _check_kernel_id
        raise web.HTTPError(404, "Kernel does not exist: %s" % kernel_id)
    tornado.web.HTTPError: HTTP 404: Not Found (Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b)
[W 2024-06-04 11:58:20.454 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498700449 (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 3.78ms referer=http://localhost:8888/notebooks/pytorch_amd_gpu_intro.ipynb
[W 2024-06-04 11:58:21.451 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:21.462 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 12.06ms referer=None
[W 2024-06-04 11:58:21.464 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498701463 (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:21.465 ServerApp] wrote error: 'Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b'
    Traceback (most recent call last):
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/auth/decorator.py", line 73, in inner
        return await out
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/handlers.py", line 75, in get
        model = await ensure_async(km.kernel_model(kernel_id))
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 501, in kernel_model
        self._check_kernel_id(kernel_id)
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 532, in _check_kernel_id
        raise web.HTTPError(404, "Kernel does not exist: %s" % kernel_id)
    tornado.web.HTTPError: HTTP 404: Not Found (Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b)
[W 2024-06-04 11:58:21.465 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498701463 (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 0.83ms referer=http://localhost:8888/notebooks/pytorch_amd_gpu_intro.ipynb
[W 2024-06-04 11:58:23.457 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:23.458 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 1.14ms referer=None
[W 2024-06-04 11:58:23.460 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498703459 (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:23.460 ServerApp] wrote error: 'Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b'
    Traceback (most recent call last):
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/auth/decorator.py", line 73, in inner
        return await out
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/handlers.py", line 75, in get
        model = await ensure_async(km.kernel_model(kernel_id))
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 501, in kernel_model
        self._check_kernel_id(kernel_id)
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 532, in _check_kernel_id
        raise web.HTTPError(404, "Kernel does not exist: %s" % kernel_id)
    tornado.web.HTTPError: HTTP 404: Not Found (Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b)
[W 2024-06-04 11:58:23.460 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498703459 (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 0.87ms referer=http://localhost:8888/notebooks/pytorch_amd_gpu_intro.ipynb
[W 2024-06-04 11:58:24.464 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:24.464 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 1.08ms referer=None
[W 2024-06-04 11:58:24.466 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498704466 (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:24.466 ServerApp] wrote error: 'Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b'
    Traceback (most recent call last):
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/auth/decorator.py", line 73, in inner
        return await out
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/handlers.py", line 75, in get
        model = await ensure_async(km.kernel_model(kernel_id))
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 501, in kernel_model
        self._check_kernel_id(kernel_id)
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 532, in _check_kernel_id
        raise web.HTTPError(404, "Kernel does not exist: %s" % kernel_id)
    tornado.web.HTTPError: HTTP 404: Not Found (Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b)
[W 2024-06-04 11:58:24.467 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498704466 (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 0.77ms referer=http://localhost:8888/notebooks/pytorch_amd_gpu_intro.ipynb
[I 2024-06-04 11:58:24.469 ServerApp] Saving file at /pytorch_amd_gpu_intro.ipynb
[W 2024-06-04 11:58:33.491 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:33.492 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 1.06ms referer=None
[W 2024-06-04 11:58:33.494 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498713493 (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:33.494 ServerApp] wrote error: 'Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b'
    Traceback (most recent call last):
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/auth/decorator.py", line 73, in inner
        return await out
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/handlers.py", line 75, in get
        model = await ensure_async(km.kernel_model(kernel_id))
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 501, in kernel_model
        self._check_kernel_id(kernel_id)
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 532, in _check_kernel_id
        raise web.HTTPError(404, "Kernel does not exist: %s" % kernel_id)
    tornado.web.HTTPError: HTTP 404: Not Found (Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b)
[W 2024-06-04 11:58:33.494 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498713493 (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 0.78ms referer=http://localhost:8888/notebooks/pytorch_amd_gpu_intro.ipynb
[W 2024-06-04 11:58:39.494 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:39.494 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 1.05ms referer=None
[W 2024-06-04 11:58:39.496 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498719495 (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:39.496 ServerApp] wrote error: 'Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b'
    Traceback (most recent call last):
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/auth/decorator.py", line 73, in inner
        return await out
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/handlers.py", line 75, in get
        model = await ensure_async(km.kernel_model(kernel_id))
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 501, in kernel_model
        self._check_kernel_id(kernel_id)
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 532, in _check_kernel_id
        raise web.HTTPError(404, "Kernel does not exist: %s" % kernel_id)
    tornado.web.HTTPError: HTTP 404: Not Found (Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b)
[W 2024-06-04 11:58:39.497 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498719495 (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 0.80ms referer=http://localhost:8888/notebooks/pytorch_amd_gpu_intro.ipynb
[W 2024-06-04 11:58:53.501 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:53.502 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b/channels?session_id=279437bf-c383-4c04-998f-a0165f29a9eb (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 5.50ms referer=None
[W 2024-06-04 11:58:53.505 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498733504 (127.0.0.1): Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b
[W 2024-06-04 11:58:53.505 ServerApp] wrote error: 'Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b'
    Traceback (most recent call last):
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/auth/decorator.py", line 73, in inner
        return await out
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/handlers.py", line 75, in get
        model = await ensure_async(km.kernel_model(kernel_id))
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 501, in kernel_model
        self._check_kernel_id(kernel_id)
      File "/opt/rocm_sdk_611/lib/python3.9/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 532, in _check_kernel_id
        raise web.HTTPError(404, "Kernel does not exist: %s" % kernel_id)
    tornado.web.HTTPError: HTTP 404: Not Found (Kernel does not exist: 014d0500-f2f5-4bd2-b796-0239d299ca1b)
[W 2024-06-04 11:58:53.506 ServerApp] 404 GET /api/kernels/014d0500-f2f5-4bd2-b796-0239d299ca1b?1717498733504 (4405c2d144a947608fb3ffa1bd1ea8e4@127.0.0.1) 1.81ms referer=http://localhost:8888/notebooks/pytorch_amd_gpu_intro.ipynb
Crizle commented 3 weeks ago

./test_migraphx_install.sh

Running [ MIGraphX Version: 2.9.0.318737422 ]: migraphx-driver perf --model resnet50
Compiling ... 
module: "main"
@0 = check_context::migraphx::gpu::context -> float_type, {}, {}, target_id=0
@1 = hip::hip_allocate_memory[shape=int8_type, {9633792}, {1},id=main:scratch] -> int8_type, {9633792}, {1}, target_id=0
@2 = hip::hip_copy_literal[id=main:@literal:17] -> float_type, {64, 3, 7, 7}, {147, 49, 7, 1}, target_id=0
@3 = hip::hip_copy_literal[id=main:@literal:50] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0
@4 = load[offset=3211264,end=6422528](@1) -> float_type, {1, 64, 112, 112}, {802816, 12544, 112, 1}, target_id=0
@5 = multibroadcast[out_lens={1, 64, 112, 112},out_dyn_dims={}](@3) -> float_type, {1, 64, 112, 112}, {0, 1, 0, 0}, target_id=0
0 = @param:0 -> float_type, {1, 3, 224, 224}, {150528, 50176, 224, 1}, target_id=0
@7 = gpu::code_object[code_object=9168,symbol_name=mlir_convolution_add_relu,global=12544,local=64,](@5,0,@2,@4) -> float_type, {1, 64, 112, 112}, {802816, 12544, 112, 1}, target_id=0
@8 = load[offset=2408448,end=3211264](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@9 = gpu::pooling[mode=max,padding={1, 1, 1, 1},padding_mode=0,stride={2, 2},lengths={3, 3},dilations={1, 1},ceil_mode=0,lp_order=2,dyn_global=0](@7,@8) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@10 = hip::hip_copy_literal[id=main:@literal:62] -> float_type, {64, 64, 1, 1}, {64, 1, 1, 1}, target_id=0
@11 = hip::hip_copy_literal[id=main:@literal:33] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0
@12 = load[offset=802816,end=1605632](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@13 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@11) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0
@14 = gpu::code_object[code_object=6096,symbol_name=mlir_convolution_add_relu,global=6272,local=64,](@13,@9,@10,@12) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@15 = hip::hip_copy_literal[id=main:@literal:60] -> float_type, {64, 64, 3, 3}, {576, 9, 3, 1}, target_id=0
@16 = hip::hip_copy_literal[id=main:@literal:65] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0
@17 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@16) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0
@18 = load[offset=0,end=802816](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@19 = gpu::code_object[code_object=8144,symbol_name=mlir_convolution_add_relu,global=6272,local=128,](@17,@14,@15,@18) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@20 = hip::hip_copy_literal[id=main:@literal:61] -> float_type, {256, 128, 1, 1}, {128, 1, 1, 1}, target_id=0
@21 = load[offset=3211264,end=4816896](@1) -> float_type, {1, 128, 56, 56}, {401408, 3136, 56, 1}, target_id=0
@22 = gpu::code_object[code_object=4280,symbol_name=concat_kernel,global=50176,local=1024,](@19,@9,@21) -> float_type, {1, 128, 56, 56}, {401408, 3136, 56, 1}, target_id=0
@23 = hip::hip_copy_literal[id=main:@literal:69] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@24 = load[offset=6422528,end=9633792](@1) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0
@25 = multibroadcast[out_lens={1, 256, 56, 56},out_dyn_dims={}](@23) -> float_type, {1, 256, 56, 56}, {0, 1, 0, 0}, target_id=0
@26 = gpu::code_object[code_object=13392,symbol_name=mlir_convolution_add_relu,global=6400,local=64,](@25,@22,@20,@24) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0
@27 = hip::hip_copy_literal[id=main:@literal:54] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0
@28 = hip::hip_copy_literal[id=main:@literal:59] -> float_type, {64, 256, 1, 1}, {256, 1, 1, 1}, target_id=0
@29 = hip::hip_copy_literal[id=main:@literal:11] -> float_type, {64, 64, 3, 3}, {576, 9, 3, 1}, target_id=0
@30 = hip::hip_copy_literal[id=main:@literal:83] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0
@31 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@30) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0
@32 = load[offset=802816,end=1605632](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@33 = gpu::code_object[code_object=7504,symbol_name=mlir_convolution_add_relu,global=6272,local=128,](@31,@26,@28,@32) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@34 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@27) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0
@35 = load[offset=0,end=802816](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@36 = gpu::code_object[code_object=8144,symbol_name=mlir_convolution_add_relu,global=6272,local=128,](@34,@33,@29,@35) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@37 = hip::hip_copy_literal[id=main:@literal:47] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@38 = hip::hip_copy_literal[id=main:@literal:49] -> float_type, {256, 64, 1, 1}, {64, 1, 1, 1}, target_id=0
@39 = multibroadcast[out_lens={1, 256, 56, 56},out_dyn_dims={}](@37) -> float_type, {1, 256, 56, 56}, {0, 1, 0, 0}, target_id=0
@40 = load[offset=3211264,end=6422528](@1) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0
@41 = gpu::code_object[code_object=11096,symbol_name=mlir_convolution_add_add_relu,global=12544,local=128,](@39,@26,@36,@38,@40) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0
@42 = hip::hip_copy_literal[id=main:@literal:27] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0
@43 = hip::hip_copy_literal[id=main:@literal:43] -> float_type, {64, 256, 1, 1}, {256, 1, 1, 1}, target_id=0
@44 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@42) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0
@45 = load[offset=0,end=802816](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@46 = gpu::code_object[code_object=7504,symbol_name=mlir_convolution_add_relu,global=6272,local=128,](@44,@41,@43,@45) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@47 = hip::hip_copy_literal[id=main:@literal:70] -> float_type, {256, 64, 1, 1}, {64, 1, 1, 1}, target_id=0
@48 = hip::hip_copy_literal[id=main:@literal:53] -> float_type, {64, 64, 3, 3}, {576, 9, 3, 1}, target_id=0
@49 = hip::hip_copy_literal[id=main:@literal:39] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0
@50 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@49) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0
@51 = load[offset=6422528,end=7225344](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@52 = gpu::code_object[code_object=8144,symbol_name=mlir_convolution_add_relu,global=6272,local=128,](@50,@46,@48,@51) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0
@53 = hip::hip_copy_literal[id=main:@literal:74] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@54 = multibroadcast[out_lens={1, 256, 56, 56},out_dyn_dims={}](@53) -> float_type, {1, 256, 56, 56}, {0, 1, 0, 0}, target_id=0
@55 = load[offset=0,end=3211264](@1) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0
@56 = gpu::code_object[code_object=11096,symbol_name=mlir_convolution_add_add_relu,global=12544,local=128,](@54,@41,@52,@47,@55) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0
@57 = hip::hip_copy_literal[id=main:@literal:58] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0
@58 = hip::hip_copy_literal[id=main:@literal:77] -> float_type, {128, 256, 1, 1}, {256, 1, 1, 1}, target_id=0
@59 = hip::hip_copy_literal[id=main:@literal:72] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0
@60 = multibroadcast[out_lens={1, 128, 56, 56},out_dyn_dims={}](@59) -> float_type, {1, 128, 56, 56}, {0, 1, 0, 0}, target_id=0
@61 = load[offset=4816896,end=6422528](@1) -> float_type, {1, 128, 56, 56}, {401408, 3136, 56, 1}, target_id=0
@62 = gpu::code_object[code_object=8016,symbol_name=mlir_convolution_add_relu,global=6272,local=64,](@60,@56,@58,@61) -> float_type, {1, 128, 56, 56}, {401408, 3136, 56, 1}, target_id=0
@63 = hip::hip_copy_literal[id=main:@literal:79] -> float_type, {128, 128, 3, 3}, {1152, 9, 3, 1}, target_id=0
@64 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@57) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0
@65 = load[offset=4415488,end=4816896](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@66 = gpu::code_object[code_object=9552,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@64,@62,@63,@65) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@67 = hip::hip_copy_literal[id=main:@literal:45] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0
@68 = hip::hip_copy_literal[id=main:@literal:81] -> float_type, {512, 384, 1, 1}, {384, 1, 1, 1}, target_id=0
@69 = load[offset=3211264,end=4415488](@1) -> float_type, {1, 384, 28, 28}, {301056, 784, 28, 1}, target_id=0
@70 = step[axes={2, 3},steps={2, 2}](@56) -> float_type, {1, 256, 28, 28}, {802816, 3136, 112, 2}, target_id=0
@71 = gpu::code_object[code_object=4536,symbol_name=concat_kernel,global=150528,local=1024,](@66,@70,@69) -> float_type, {1, 384, 28, 28}, {301056, 784, 28, 1}, target_id=0
@72 = multibroadcast[out_lens={1, 512, 28, 28},out_dyn_dims={}](@67) -> float_type, {1, 512, 28, 28}, {0, 1, 0, 0}, target_id=0
@73 = load[offset=4415488,end=6021120](@1) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0
@74 = gpu::code_object[code_object=11216,symbol_name=mlir_convolution_add_relu,global=6656,local=128,](@72,@71,@68,@73) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0
@75 = hip::hip_copy_literal[id=main:@literal:25] -> float_type, {128, 128, 3, 3}, {1152, 9, 3, 1}, target_id=0
@76 = hip::hip_copy_literal[id=main:@literal:80] -> float_type, {128, 512, 1, 1}, {512, 1, 1, 1}, target_id=0
@77 = hip::hip_copy_literal[id=main:@literal:64] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0
@78 = load[offset=802816,end=1204224](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@79 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@77) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0
@80 = gpu::code_object[code_object=7504,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@79,@74,@76,@78) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@81 = hip::hip_copy_literal[id=main:@literal:84] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0
@82 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@81) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0
@83 = load[offset=401408,end=802816](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@84 = gpu::code_object[code_object=9552,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@82,@80,@75,@83) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@85 = hip::hip_copy_literal[id=main:@literal:36] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0
@86 = hip::hip_copy_literal[id=main:@literal:86] -> float_type, {512, 128, 1, 1}, {128, 1, 1, 1}, target_id=0
@87 = multibroadcast[out_lens={1, 512, 28, 28},out_dyn_dims={}](@85) -> float_type, {1, 512, 28, 28}, {0, 1, 0, 0}, target_id=0
@88 = load[offset=1204224,end=2809856](@1) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0
@89 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=6656,local=128,](@87,@74,@84,@86,@88) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0
@90 = hip::hip_copy_literal[id=main:@literal:40] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0
@91 = hip::hip_copy_literal[id=main:@literal:57] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0
@92 = hip::hip_copy_literal[id=main:@literal:1] -> float_type, {128, 512, 1, 1}, {512, 1, 1, 1}, target_id=0
@93 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@91) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0
@94 = load[offset=802816,end=1204224](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@95 = gpu::code_object[code_object=7504,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@93,@89,@92,@94) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@96 = hip::hip_copy_literal[id=main:@literal:6] -> float_type, {128, 128, 3, 3}, {1152, 9, 3, 1}, target_id=0
@97 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@90) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0
@98 = load[offset=401408,end=802816](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@99 = gpu::code_object[code_object=9552,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@97,@95,@96,@98) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@100 = hip::hip_copy_literal[id=main:@literal:7] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0
@101 = hip::hip_copy_literal[id=main:@literal:89] -> float_type, {512, 128, 1, 1}, {128, 1, 1, 1}, target_id=0
@102 = multibroadcast[out_lens={1, 512, 28, 28},out_dyn_dims={}](@100) -> float_type, {1, 512, 28, 28}, {0, 1, 0, 0}, target_id=0
@103 = load[offset=3813376,end=5419008](@1) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0
@104 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=6656,local=128,](@102,@89,@99,@101,@103) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0
@105 = hip::hip_copy_literal[id=main:@literal:26] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0
@106 = hip::hip_copy_literal[id=main:@literal:35] -> float_type, {128, 512, 1, 1}, {512, 1, 1, 1}, target_id=0
@107 = hip::hip_copy_literal[id=main:@literal:31] -> float_type, {128, 128, 3, 3}, {1152, 9, 3, 1}, target_id=0
@108 = hip::hip_copy_literal[id=main:@literal:16] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0
@109 = hip::hip_copy_literal[id=main:@literal:28] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0
@110 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@108) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0
@111 = load[offset=401408,end=802816](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@112 = gpu::code_object[code_object=7504,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@110,@104,@106,@111) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@113 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@109) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0
@114 = load[offset=0,end=401408](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@115 = gpu::code_object[code_object=9552,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@113,@112,@107,@114) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0
@116 = hip::hip_copy_literal[id=main:@literal:51] -> float_type, {512, 128, 1, 1}, {128, 1, 1, 1}, target_id=0
@117 = multibroadcast[out_lens={1, 512, 28, 28},out_dyn_dims={}](@105) -> float_type, {1, 512, 28, 28}, {0, 1, 0, 0}, target_id=0
@118 = load[offset=2207744,end=3813376](@1) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0
@119 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=6656,local=128,](@117,@104,@115,@116,@118) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0
@120 = hip::hip_copy_literal[id=main:@literal:23] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@121 = hip::hip_copy_literal[id=main:@literal:52] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0
@122 = hip::hip_copy_literal[id=main:@literal:88] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@123 = hip::hip_copy_literal[id=main:@literal:24] -> float_type, {256, 512, 1, 1}, {512, 1, 1, 1}, target_id=0
@124 = hip::hip_copy_literal[id=main:@literal:75] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0
@125 = hip::hip_copy_literal[id=main:@literal:82] -> float_type, {1024, 768, 1, 1}, {768, 1, 1, 1}, target_id=0
@126 = multibroadcast[out_lens={1, 256, 28, 28},out_dyn_dims={}](@120) -> float_type, {1, 256, 28, 28}, {0, 1, 0, 0}, target_id=0
@127 = load[offset=0,end=802816](@1) -> float_type, {1, 256, 28, 28}, {200704, 784, 28, 1}, target_id=0
@128 = gpu::code_object[code_object=8528,symbol_name=mlir_convolution_add_relu,global=6656,local=128,](@126,@119,@123,@127) -> float_type, {1, 256, 28, 28}, {200704, 784, 28, 1}, target_id=0
@129 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@122) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@130 = load[offset=1605632,end=1806336](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@131 = gpu::code_object[code_object=8016,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@129,@128,@121,@130) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@132 = step[axes={2, 3},steps={2, 2}](@119) -> float_type, {1, 512, 14, 14}, {401408, 784, 56, 2}, target_id=0
@133 = load[offset=1003520,end=1605632](@1) -> float_type, {1, 768, 14, 14}, {150528, 196, 14, 1}, target_id=0
@134 = gpu::code_object[code_object=4664,symbol_name=concat_kernel,global=75264,local=1024,](@131,@132,@133) -> float_type, {1, 768, 14, 14}, {150528, 196, 14, 1}, target_id=0
@135 = load[offset=200704,end=1003520](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@136 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@124) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0
@137 = gpu::code_object[code_object=8912,symbol_name=mlir_convolution_add_relu,global=7168,local=64,](@136,@134,@125,@135) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@138 = hip::hip_copy_literal[id=main:@literal:71] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@139 = hip::hip_copy_literal[id=main:@literal:4] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0
@140 = hip::hip_copy_literal[id=main:@literal:20] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0
@141 = hip::hip_copy_literal[id=main:@literal:19] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@142 = hip::hip_copy_literal[id=main:@literal:67] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0
@143 = hip::hip_copy_literal[id=main:@literal:32] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@144 = hip::hip_copy_literal[id=main:@literal:22] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@145 = hip::hip_copy_literal[id=main:@literal:15] -> float_type, {256, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0
@146 = hip::hip_copy_literal[id=main:@literal:46] -> float_type, {1024, 256, 1, 1}, {256, 1, 1, 1}, target_id=0
@147 = hip::hip_copy_literal[id=main:@literal:68] -> float_type, {256, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0
@148 = hip::hip_copy_literal[id=main:@literal:18] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0
@149 = hip::hip_copy_literal[id=main:@literal:37] -> float_type, {1024, 256, 1, 1}, {256, 1, 1, 1}, target_id=0
@150 = hip::hip_copy_literal[id=main:@literal:78] -> float_type, {256, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0
@151 = hip::hip_copy_literal[id=main:@literal:30] -> float_type, {1024, 256, 1, 1}, {256, 1, 1, 1}, target_id=0
@152 = hip::hip_copy_literal[id=main:@literal:63] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@153 = hip::hip_copy_literal[id=main:@literal:14] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0
@154 = hip::hip_copy_literal[id=main:@literal:56] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@155 = hip::hip_copy_literal[id=main:@literal:42] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0
@156 = load[offset=1204224,end=1404928](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@157 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@144) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@158 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@157,@137,@147,@156) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@159 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@154) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@160 = load[offset=1003520,end=1204224](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@161 = gpu::code_object[code_object=7888,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@159,@158,@155,@160) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@162 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@140) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0
@163 = load[offset=2207744,end=3010560](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@164 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=7168,local=64,](@162,@137,@161,@149,@163) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@165 = load[offset=200704,end=401408](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@166 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@143) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@167 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@166,@164,@150,@165) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@168 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@152) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@169 = load[offset=0,end=200704](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@170 = gpu::code_object[code_object=7888,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@168,@167,@148,@169) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@171 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@142) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0
@172 = load[offset=1404928,end=2207744](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@173 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=7168,local=64,](@171,@164,@170,@151,@172) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@174 = load[offset=401408,end=602112](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@175 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@141) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@176 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@175,@173,@145,@174) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@177 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@138) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@178 = load[offset=200704,end=401408](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@179 = gpu::code_object[code_object=7888,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@177,@176,@139,@178) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@180 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@153) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0
@181 = load[offset=602112,end=1404928](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@182 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=7168,local=64,](@180,@173,@179,@146,@181) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@183 = hip::hip_copy_literal[id=main:@literal:10] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0
@184 = hip::hip_copy_literal[id=main:@literal:76] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@185 = hip::hip_copy_literal[id=main:@literal:87] -> float_type, {256, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0
@186 = hip::hip_copy_literal[id=main:@literal:13] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@187 = hip::hip_copy_literal[id=main:@literal:12] -> float_type, {1024, 256, 1, 1}, {256, 1, 1, 1}, target_id=0
@188 = hip::hip_copy_literal[id=main:@literal:21] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0
@189 = load[offset=401408,end=602112](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@190 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@186) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@191 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@190,@182,@185,@189) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@192 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@184) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@193 = load[offset=200704,end=401408](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@194 = gpu::code_object[code_object=7888,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@192,@191,@188,@193) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@195 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@183) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0
@196 = load[offset=1806336,end=2609152](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@197 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=7168,local=64,](@195,@182,@194,@187,@196) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@198 = hip::hip_copy_literal[id=main:@literal:9] -> float_type, {256, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0
@199 = hip::hip_copy_literal[id=main:@literal:48] -> float_type, {2048, 1536, 1, 1}, {1536, 1, 1, 1}, target_id=0
@200 = hip::hip_copy_literal[id=main:@literal:2] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0
@201 = hip::hip_copy_literal[id=main:@literal:90] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0
@202 = hip::hip_copy_literal[id=main:@literal:3] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0
@203 = hip::hip_copy_literal[id=main:@literal:91] -> float_type, {2048, 1, 1}, {1, 1, 1}, target_id=0
@204 = hip::hip_copy_literal[id=main:@literal:5] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0
@205 = hip::hip_copy_literal[id=main:@literal:29] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@206 = hip::hip_copy_literal[id=main:@literal:55] -> float_type, {512, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0
@207 = hip::hip_copy_literal[id=main:@literal:34] -> float_type, {1024, 256, 1, 1}, {256, 1, 1, 1}, target_id=0
@208 = hip::hip_copy_literal[id=main:@literal:8] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0
@209 = hip::hip_copy_literal[id=main:@literal:73] -> float_type, {512, 512, 3, 3}, {4608, 9, 3, 1}, target_id=0
@210 = load[offset=200704,end=401408](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@211 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@208) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@212 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@211,@197,@198,@210) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@213 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@205) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0
@214 = load[offset=0,end=200704](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@215 = gpu::code_object[code_object=7888,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@213,@212,@204,@214) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0
@216 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@202) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0
@217 = load[offset=1003520,end=1806336](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@218 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=7168,local=64,](@216,@197,@215,@207,@217) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0
@219 = load[offset=501760,end=903168](@1) -> float_type, {1, 512, 14, 14}, {100352, 196, 14, 1}, target_id=0
@220 = multibroadcast[out_lens={1, 512, 14, 14},out_dyn_dims={}](@200) -> float_type, {1, 512, 14, 14}, {0, 1, 0, 0}, target_id=0
@221 = gpu::code_object[code_object=8784,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@220,@218,@206,@219) -> float_type, {1, 512, 14, 14}, {100352, 196, 14, 1}, target_id=0
@222 = multibroadcast[out_lens={1, 512, 7, 7},out_dyn_dims={}](@201) -> float_type, {1, 512, 7, 7}, {0, 1, 0, 0}, target_id=0
@223 = load[offset=401408,end=501760](@1) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0
@224 = gpu::code_object[code_object=8400,symbol_name=mlir_convolution_add_relu,global=4096,local=128,](@222,@221,@209,@223) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0
@225 = step[axes={2, 3},steps={2, 2}](@218) -> float_type, {1, 1024, 7, 7}, {200704, 196, 28, 2}, target_id=0
@226 = load[offset=0,end=301056](@1) -> float_type, {1, 1536, 7, 7}, {75264, 49, 7, 1}, target_id=0
@227 = gpu::code_object[code_object=4792,symbol_name=concat_kernel,global=37632,local=1024,](@224,@225,@226) -> float_type, {1, 1536, 7, 7}, {75264, 49, 7, 1}, target_id=0
@228 = multibroadcast[out_lens={1, 2048, 7, 7},out_dyn_dims={}](@203) -> float_type, {1, 2048, 7, 7}, {0, 1, 0, 0}, target_id=0
@229 = load[offset=903168,end=1304576](@1) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0
@230 = gpu::code_object[code_object=14160,symbol_name=mlir_convolution_add_relu,global=2048,local=64,](@228,@227,@199,@229) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0
@231 = hip::hip_copy_literal[id=main:@literal:66] -> float_type, {2048, 512, 1, 1}, {512, 1, 1, 1}, target_id=0
@232 = hip::hip_copy_literal[id=main:@literal:99] -> float_type, {2048, 1, 1}, {1, 1, 1}, target_id=0
@233 = hip::hip_copy_literal[id=main:@literal:98] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0
@234 = hip::hip_copy_literal[id=main:@literal:93] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0
@235 = hip::hip_copy_literal[id=main:@literal:92] -> float_type, {512, 2048, 1, 1}, {2048, 1, 1, 1}, target_id=0
@236 = hip::hip_copy_literal[id=main:@literal:94] -> float_type, {512, 512, 3, 3}, {4608, 9, 3, 1}, target_id=0
@237 = hip::hip_copy_literal[id=main:@literal:85] -> float_type, {512, 2048, 1, 1}, {2048, 1, 1, 1}, target_id=0
@238 = hip::hip_copy_literal[id=main:@literal:44] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0
@239 = hip::hip_copy_literal[id=main:@literal:41] -> float_type, {2048, 512, 1, 1}, {512, 1, 1, 1}, target_id=0
@240 = hip::hip_copy_literal[id=main:@literal:38] -> float_type, {1000}, {1}, target_id=0
@241 = hip::hip_copy_literal[id=main:@literal:0] -> float_type, {2048, 1000}, {1000, 1}, target_id=0
@242 = hip::hip_copy_literal[id=main:@literal:97] -> float_type, {512, 512, 3, 3}, {4608, 9, 3, 1}, target_id=0
@243 = hip::hip_copy_literal[id=main:@literal:96] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0
@244 = hip::hip_copy_literal[id=main:@literal:95] -> float_type, {2048, 1, 1}, {1, 1, 1}, target_id=0
@245 = multibroadcast[out_lens={1, 512, 7, 7},out_dyn_dims={}](@234) -> float_type, {1, 512, 7, 7}, {0, 1, 0, 0}, target_id=0
@246 = load[offset=100352,end=200704](@1) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0
@247 = gpu::code_object[code_object=9040,symbol_name=mlir_convolution_add_relu,global=2048,local=64,](@245,@230,@235,@246) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0
@248 = multibroadcast[out_lens={1, 512, 7, 7},out_dyn_dims={}](@238) -> float_type, {1, 512, 7, 7}, {0, 1, 0, 0}, target_id=0
@249 = load[offset=0,end=100352](@1) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0
@250 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=4096,local=128,](@248,@247,@236,@249) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0
@251 = multibroadcast[out_lens={1, 2048, 7, 7},out_dyn_dims={}](@244) -> float_type, {1, 2048, 7, 7}, {0, 1, 0, 0}, target_id=0
@252 = load[offset=501760,end=903168](@1) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0
@253 = gpu::code_object[code_object=9944,symbol_name=mlir_convolution_add_add_relu,global=4096,local=64,](@251,@230,@250,@239,@252) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0
@254 = load[offset=100352,end=200704](@1) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0
@255 = multibroadcast[out_lens={1, 512, 7, 7},out_dyn_dims={}](@243) -> float_type, {1, 512, 7, 7}, {0, 1, 0, 0}, target_id=0
@256 = gpu::code_object[code_object=9040,symbol_name=mlir_convolution_add_relu,global=2048,local=64,](@255,@253,@237,@254) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0
@257 = multibroadcast[out_lens={1, 512, 7, 7},out_dyn_dims={}](@233) -> float_type, {1, 512, 7, 7}, {0, 1, 0, 0}, target_id=0
@258 = load[offset=0,end=100352](@1) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0
@259 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=4096,local=128,](@257,@256,@242,@258) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0
@260 = load[offset=903168,end=1304576](@1) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0
@261 = gpu::code_object[code_object=8120,symbol_name=mlir_convolution,global=4096,local=64,](@259,@231,@260) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0
@262 = multibroadcast[out_lens={1, 2048, 7, 7},out_dyn_dims={}](@232) -> float_type, {1, 2048, 7, 7}, {0, 1, 0, 0}, target_id=0
@263 = load[offset=0,end=8192](@1) -> float_type, {1, 2048, 1, 1}, {2048, 1, 1, 1}, target_id=0
@264 = gpu::code_object[code_object=4568,symbol_name=add_add_relu_reduce_mean_kernel,global=131072,local=64,](@261,@262,@253,@263) -> float_type, {1, 2048, 1, 1}, {2048, 1, 1, 1}, target_id=0
@265 = multibroadcast[out_lens={1, 1000},out_dyn_dims={}](@240) -> float_type, {1, 1000}, {0, 1}, target_id=0
main:#output_0 = @param:main:#output_0 -> float_type, {1, 1000}, {1000, 1}, target_id=0
@267 = gpu::code_object[code_object=5440,symbol_name=mlir_reshape_dot_add,global=2048,local=64,](@265,@264,@241,main:#output_0) -> float_type, {1, 1000}, {1000, 1}, target_id=0
@268 = @return(@267), target_id=0

Allocating params ... 
Running performance report ... 
@0 = check_context::migraphx::gpu::context -> float_type, {}, {}, target_id=0: 0.00065132ms, 1%
@1 = hip::hip_allocate_memory[shape=int8_type, {9633792}, {1},id=main:scratch] -> int8_type, {9633792}, {1}, target_id=0: 0.00051378ms, 1%
@2 = hip::hip_copy_literal[id=main:@literal:17] -> float_type, {64, 3, 7, 7}, {147, 49, 7, 1}, target_id=0: 0.00055986ms, 1%
@3 = hip::hip_copy_literal[id=main:@literal:50] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0: 0.00036904ms, 1%
@4 = load[offset=3211264,end=6422528](@1) -> float_type, {1, 64, 112, 112}, {802816, 12544, 112, 1}, target_id=0: 0.00051458ms, 1%
@5 = multibroadcast[out_lens={1, 64, 112, 112},out_dyn_dims={}](@3) -> float_type, {1, 64, 112, 112}, {0, 1, 0, 0}, target_id=0: 0.00065752ms, 1%
0 = @param:0 -> float_type, {1, 3, 224, 224}, {150528, 50176, 224, 1}, target_id=0: 0.00030916ms, 1%
@7 = gpu::code_object[code_object=9168,symbol_name=mlir_convolution_add_relu,global=12544,local=64,](@5,0,@2,@4) -> float_type, {1, 64, 112, 112}, {802816, 12544, 112, 1}, target_id=0: 0.0549274ms, 2%
@8 = load[offset=2408448,end=3211264](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.00049898ms, 1%
@9 = gpu::pooling[mode=max,padding={1, 1, 1, 1},padding_mode=0,stride={2, 2},lengths={3, 3},dilations={1, 1},ceil_mode=0,lp_order=2,dyn_global=0](@7,@8) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.0425269ms, 2%
@10 = hip::hip_copy_literal[id=main:@literal:62] -> float_type, {64, 64, 1, 1}, {64, 1, 1, 1}, target_id=0: 0.00067024ms, 1%
@11 = hip::hip_copy_literal[id=main:@literal:33] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0: 0.00036672ms, 1%
@12 = load[offset=802816,end=1605632](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.00054742ms, 1%
@13 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@11) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0: 0.00076838ms, 1%
@14 = gpu::code_object[code_object=6096,symbol_name=mlir_convolution_add_relu,global=6272,local=64,](@13,@9,@10,@12) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.0279587ms, 1%
@15 = hip::hip_copy_literal[id=main:@literal:60] -> float_type, {64, 64, 3, 3}, {576, 9, 3, 1}, target_id=0: 0.00048092ms, 1%
@16 = hip::hip_copy_literal[id=main:@literal:65] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0: 0.00037894ms, 1%
@17 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@16) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0: 0.00065842ms, 1%
@18 = load[offset=0,end=802816](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.00045704ms, 1%
@19 = gpu::code_object[code_object=8144,symbol_name=mlir_convolution_add_relu,global=6272,local=128,](@17,@14,@15,@18) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.0558722ms, 2%
@20 = hip::hip_copy_literal[id=main:@literal:61] -> float_type, {256, 128, 1, 1}, {128, 1, 1, 1}, target_id=0: 0.00046206ms, 1%
@21 = load[offset=3211264,end=4816896](@1) -> float_type, {1, 128, 56, 56}, {401408, 3136, 56, 1}, target_id=0: 0.00045212ms, 1%
@22 = gpu::code_object[code_object=4280,symbol_name=concat_kernel,global=50176,local=1024,](@19,@9,@21) -> float_type, {1, 128, 56, 56}, {401408, 3136, 56, 1}, target_id=0: 0.0242199ms, 1%
@23 = hip::hip_copy_literal[id=main:@literal:69] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00045708ms, 1%
@24 = load[offset=6422528,end=9633792](@1) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0: 0.00045686ms, 1%
@25 = multibroadcast[out_lens={1, 256, 56, 56},out_dyn_dims={}](@23) -> float_type, {1, 256, 56, 56}, {0, 1, 0, 0}, target_id=0: 0.00073004ms, 1%
@26 = gpu::code_object[code_object=13392,symbol_name=mlir_convolution_add_relu,global=6400,local=64,](@25,@22,@20,@24) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0: 0.0459301ms, 2%
@27 = hip::hip_copy_literal[id=main:@literal:54] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0: 0.00047168ms, 1%
@28 = hip::hip_copy_literal[id=main:@literal:59] -> float_type, {64, 256, 1, 1}, {256, 1, 1, 1}, target_id=0: 0.00035732ms, 1%
@29 = hip::hip_copy_literal[id=main:@literal:11] -> float_type, {64, 64, 3, 3}, {576, 9, 3, 1}, target_id=0: 0.0004366ms, 1%
@30 = hip::hip_copy_literal[id=main:@literal:83] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0: 0.00038148ms, 1%
@31 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@30) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0: 0.00063758ms, 1%
@32 = load[offset=802816,end=1605632](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.00045042ms, 1%
@33 = gpu::code_object[code_object=7504,symbol_name=mlir_convolution_add_relu,global=6272,local=128,](@31,@26,@28,@32) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.0386579ms, 2%
@34 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@27) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0: 0.00061378ms, 1%
@35 = load[offset=0,end=802816](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.00043988ms, 1%
@36 = gpu::code_object[code_object=8144,symbol_name=mlir_convolution_add_relu,global=6272,local=128,](@34,@33,@29,@35) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.0558045ms, 2%
@37 = hip::hip_copy_literal[id=main:@literal:47] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00047792ms, 1%
@38 = hip::hip_copy_literal[id=main:@literal:49] -> float_type, {256, 64, 1, 1}, {64, 1, 1, 1}, target_id=0: 0.00037762ms, 1%
@39 = multibroadcast[out_lens={1, 256, 56, 56},out_dyn_dims={}](@37) -> float_type, {1, 256, 56, 56}, {0, 1, 0, 0}, target_id=0: 0.00063654ms, 1%
@40 = load[offset=3211264,end=6422528](@1) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0: 0.00044322ms, 1%
@41 = gpu::code_object[code_object=11096,symbol_name=mlir_convolution_add_add_relu,global=12544,local=128,](@39,@26,@36,@38,@40) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0: 0.0387424ms, 2%
@42 = hip::hip_copy_literal[id=main:@literal:27] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0: 0.00047274ms, 1%
@43 = hip::hip_copy_literal[id=main:@literal:43] -> float_type, {64, 256, 1, 1}, {256, 1, 1, 1}, target_id=0: 0.00038714ms, 1%
@44 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@42) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0: 0.00063316ms, 1%
@45 = load[offset=0,end=802816](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.00043834ms, 1%
@46 = gpu::code_object[code_object=7504,symbol_name=mlir_convolution_add_relu,global=6272,local=128,](@44,@41,@43,@45) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.0386086ms, 2%
@47 = hip::hip_copy_literal[id=main:@literal:70] -> float_type, {256, 64, 1, 1}, {64, 1, 1, 1}, target_id=0: 0.00048324ms, 1%
@48 = hip::hip_copy_literal[id=main:@literal:53] -> float_type, {64, 64, 3, 3}, {576, 9, 3, 1}, target_id=0: 0.0004093ms, 1%
@49 = hip::hip_copy_literal[id=main:@literal:39] -> float_type, {64, 1, 1}, {1, 1, 1}, target_id=0: 0.0004225ms, 1%
@50 = multibroadcast[out_lens={1, 64, 56, 56},out_dyn_dims={}](@49) -> float_type, {1, 64, 56, 56}, {0, 1, 0, 0}, target_id=0: 0.00064042ms, 1%
@51 = load[offset=6422528,end=7225344](@1) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.00043822ms, 1%
@52 = gpu::code_object[code_object=8144,symbol_name=mlir_convolution_add_relu,global=6272,local=128,](@50,@46,@48,@51) -> float_type, {1, 64, 56, 56}, {200704, 3136, 56, 1}, target_id=0: 0.0557729ms, 2%
@53 = hip::hip_copy_literal[id=main:@literal:74] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00049314ms, 1%
@54 = multibroadcast[out_lens={1, 256, 56, 56},out_dyn_dims={}](@53) -> float_type, {1, 256, 56, 56}, {0, 1, 0, 0}, target_id=0: 0.00064784ms, 1%
@55 = load[offset=0,end=3211264](@1) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0: 0.00043738ms, 1%
@56 = gpu::code_object[code_object=11096,symbol_name=mlir_convolution_add_add_relu,global=12544,local=128,](@54,@41,@52,@47,@55) -> float_type, {1, 256, 56, 56}, {802816, 3136, 56, 1}, target_id=0: 0.0378285ms, 2%
@57 = hip::hip_copy_literal[id=main:@literal:58] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0: 0.00048712ms, 1%
@58 = hip::hip_copy_literal[id=main:@literal:77] -> float_type, {128, 256, 1, 1}, {256, 1, 1, 1}, target_id=0: 0.00034254ms, 1%
@59 = hip::hip_copy_literal[id=main:@literal:72] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0: 0.0003965ms, 1%
@60 = multibroadcast[out_lens={1, 128, 56, 56},out_dyn_dims={}](@59) -> float_type, {1, 128, 56, 56}, {0, 1, 0, 0}, target_id=0: 0.00062362ms, 1%
@61 = load[offset=4816896,end=6422528](@1) -> float_type, {1, 128, 56, 56}, {401408, 3136, 56, 1}, target_id=0: 0.00044594ms, 1%
@62 = gpu::code_object[code_object=8016,symbol_name=mlir_convolution_add_relu,global=6272,local=64,](@60,@56,@58,@61) -> float_type, {1, 128, 56, 56}, {401408, 3136, 56, 1}, target_id=0: 0.0441917ms, 2%
@63 = hip::hip_copy_literal[id=main:@literal:79] -> float_type, {128, 128, 3, 3}, {1152, 9, 3, 1}, target_id=0: 0.00049374ms, 1%
@64 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@57) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.0006264ms, 1%
@65 = load[offset=4415488,end=4816896](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.0004345ms, 1%
@66 = gpu::code_object[code_object=9552,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@64,@62,@63,@65) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.0715547ms, 2%
@67 = hip::hip_copy_literal[id=main:@literal:45] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0: 0.00047294ms, 1%
@68 = hip::hip_copy_literal[id=main:@literal:81] -> float_type, {512, 384, 1, 1}, {384, 1, 1, 1}, target_id=0: 0.000355ms, 1%
@69 = load[offset=3211264,end=4415488](@1) -> float_type, {1, 384, 28, 28}, {301056, 784, 28, 1}, target_id=0: 0.0004441ms, 1%
@70 = step[axes={2, 3},steps={2, 2}](@56) -> float_type, {1, 256, 28, 28}, {802816, 3136, 112, 2}, target_id=0: 0.0008525ms, 1%
@71 = gpu::code_object[code_object=4536,symbol_name=concat_kernel,global=150528,local=1024,](@66,@70,@69) -> float_type, {1, 384, 28, 28}, {301056, 784, 28, 1}, target_id=0: 0.0247355ms, 1%
@72 = multibroadcast[out_lens={1, 512, 28, 28},out_dyn_dims={}](@67) -> float_type, {1, 512, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.00070908ms, 1%
@73 = load[offset=4415488,end=6021120](@1) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0: 0.00043226ms, 1%
@74 = gpu::code_object[code_object=11216,symbol_name=mlir_convolution_add_relu,global=6656,local=128,](@72,@71,@68,@73) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0: 0.0536864ms, 2%
@75 = hip::hip_copy_literal[id=main:@literal:25] -> float_type, {128, 128, 3, 3}, {1152, 9, 3, 1}, target_id=0: 0.00046688ms, 1%
@76 = hip::hip_copy_literal[id=main:@literal:80] -> float_type, {128, 512, 1, 1}, {512, 1, 1, 1}, target_id=0: 0.00037106ms, 1%
@77 = hip::hip_copy_literal[id=main:@literal:64] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0: 0.00038734ms, 1%
@78 = load[offset=802816,end=1204224](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.00044222ms, 1%
@79 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@77) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.00067546ms, 1%
@80 = gpu::code_object[code_object=7504,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@79,@74,@76,@78) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.043799ms, 2%
@81 = hip::hip_copy_literal[id=main:@literal:84] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0: 0.0004671ms, 1%
@82 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@81) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.00063164ms, 1%
@83 = load[offset=401408,end=802816](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.00044166ms, 1%
@84 = gpu::code_object[code_object=9552,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@82,@80,@75,@83) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.0705405ms, 2%
@85 = hip::hip_copy_literal[id=main:@literal:36] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0: 0.00046698ms, 1%
@86 = hip::hip_copy_literal[id=main:@literal:86] -> float_type, {512, 128, 1, 1}, {128, 1, 1, 1}, target_id=0: 0.00036238ms, 1%
@87 = multibroadcast[out_lens={1, 512, 28, 28},out_dyn_dims={}](@85) -> float_type, {1, 512, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.00064842ms, 1%
@88 = load[offset=1204224,end=2809856](@1) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0: 0.0004324ms, 1%
@89 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=6656,local=128,](@87,@74,@84,@86,@88) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0: 0.0376923ms, 2%
@90 = hip::hip_copy_literal[id=main:@literal:40] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0: 0.00046088ms, 1%
@91 = hip::hip_copy_literal[id=main:@literal:57] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0: 0.00034246ms, 1%
@92 = hip::hip_copy_literal[id=main:@literal:1] -> float_type, {128, 512, 1, 1}, {512, 1, 1, 1}, target_id=0: 0.0003652ms, 1%
@93 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@91) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.00070668ms, 1%
@94 = load[offset=802816,end=1204224](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.00043904ms, 1%
@95 = gpu::code_object[code_object=7504,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@93,@89,@92,@94) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.0438044ms, 2%
@96 = hip::hip_copy_literal[id=main:@literal:6] -> float_type, {128, 128, 3, 3}, {1152, 9, 3, 1}, target_id=0: 0.0004645ms, 1%
@97 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@90) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.00062344ms, 1%
@98 = load[offset=401408,end=802816](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.00042968ms, 1%
@99 = gpu::code_object[code_object=9552,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@97,@95,@96,@98) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.0705921ms, 2%
@100 = hip::hip_copy_literal[id=main:@literal:7] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0: 0.0004361ms, 1%
@101 = hip::hip_copy_literal[id=main:@literal:89] -> float_type, {512, 128, 1, 1}, {128, 1, 1, 1}, target_id=0: 0.00037266ms, 1%
@102 = multibroadcast[out_lens={1, 512, 28, 28},out_dyn_dims={}](@100) -> float_type, {1, 512, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.0006528ms, 1%
@103 = load[offset=3813376,end=5419008](@1) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0: 0.00044326ms, 1%
@104 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=6656,local=128,](@102,@89,@99,@101,@103) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0: 0.0375239ms, 2%
@105 = hip::hip_copy_literal[id=main:@literal:26] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0: 0.00047254ms, 1%
@106 = hip::hip_copy_literal[id=main:@literal:35] -> float_type, {128, 512, 1, 1}, {512, 1, 1, 1}, target_id=0: 0.00042022ms, 1%
@107 = hip::hip_copy_literal[id=main:@literal:31] -> float_type, {128, 128, 3, 3}, {1152, 9, 3, 1}, target_id=0: 0.00037754ms, 1%
@108 = hip::hip_copy_literal[id=main:@literal:16] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0: 0.00037368ms, 1%
@109 = hip::hip_copy_literal[id=main:@literal:28] -> float_type, {128, 1, 1}, {1, 1, 1}, target_id=0: 0.00038456ms, 1%
@110 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@108) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.00061336ms, 1%
@111 = load[offset=401408,end=802816](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.00044378ms, 1%
@112 = gpu::code_object[code_object=7504,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@110,@104,@106,@111) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.0435892ms, 2%
@113 = multibroadcast[out_lens={1, 128, 28, 28},out_dyn_dims={}](@109) -> float_type, {1, 128, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.00064668ms, 1%
@114 = load[offset=0,end=401408](@1) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.00043702ms, 1%
@115 = gpu::code_object[code_object=9552,symbol_name=mlir_convolution_add_relu,global=6656,local=256,](@113,@112,@107,@114) -> float_type, {1, 128, 28, 28}, {100352, 784, 28, 1}, target_id=0: 0.0705356ms, 2%
@116 = hip::hip_copy_literal[id=main:@literal:51] -> float_type, {512, 128, 1, 1}, {128, 1, 1, 1}, target_id=0: 0.00050078ms, 1%
@117 = multibroadcast[out_lens={1, 512, 28, 28},out_dyn_dims={}](@105) -> float_type, {1, 512, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.0006107ms, 1%
@118 = load[offset=2207744,end=3813376](@1) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0: 0.00045028ms, 1%
@119 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=6656,local=128,](@117,@104,@115,@116,@118) -> float_type, {1, 512, 28, 28}, {401408, 784, 28, 1}, target_id=0: 0.0375256ms, 2%
@120 = hip::hip_copy_literal[id=main:@literal:23] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00045626ms, 1%
@121 = hip::hip_copy_literal[id=main:@literal:52] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0: 0.00034168ms, 1%
@122 = hip::hip_copy_literal[id=main:@literal:88] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00040118ms, 1%
@123 = hip::hip_copy_literal[id=main:@literal:24] -> float_type, {256, 512, 1, 1}, {512, 1, 1, 1}, target_id=0: 0.00036048ms, 1%
@124 = hip::hip_copy_literal[id=main:@literal:75] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0: 0.00040424ms, 1%
@125 = hip::hip_copy_literal[id=main:@literal:82] -> float_type, {1024, 768, 1, 1}, {768, 1, 1, 1}, target_id=0: 0.00038862ms, 1%
@126 = multibroadcast[out_lens={1, 256, 28, 28},out_dyn_dims={}](@120) -> float_type, {1, 256, 28, 28}, {0, 1, 0, 0}, target_id=0: 0.0005975ms, 1%
@127 = load[offset=0,end=802816](@1) -> float_type, {1, 256, 28, 28}, {200704, 784, 28, 1}, target_id=0: 0.00044068ms, 1%
@128 = gpu::code_object[code_object=8528,symbol_name=mlir_convolution_add_relu,global=6656,local=128,](@126,@119,@123,@127) -> float_type, {1, 256, 28, 28}, {200704, 784, 28, 1}, target_id=0: 0.0517969ms, 2%
@129 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@122) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00063236ms, 1%
@130 = load[offset=1605632,end=1806336](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.00043468ms, 1%
@131 = gpu::code_object[code_object=8016,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@129,@128,@121,@130) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0851257ms, 3%
@132 = step[axes={2, 3},steps={2, 2}](@119) -> float_type, {1, 512, 14, 14}, {401408, 784, 56, 2}, target_id=0: 0.0006886ms, 1%
@133 = load[offset=1003520,end=1605632](@1) -> float_type, {1, 768, 14, 14}, {150528, 196, 14, 1}, target_id=0: 0.00044972ms, 1%
@134 = gpu::code_object[code_object=4664,symbol_name=concat_kernel,global=75264,local=1024,](@131,@132,@133) -> float_type, {1, 768, 14, 14}, {150528, 196, 14, 1}, target_id=0: 0.0231407ms, 1%
@135 = load[offset=200704,end=1003520](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.00049978ms, 1%
@136 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@124) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.0006352ms, 1%
@137 = gpu::code_object[code_object=8912,symbol_name=mlir_convolution_add_relu,global=7168,local=64,](@136,@134,@125,@135) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.0675623ms, 2%
@138 = hip::hip_copy_literal[id=main:@literal:71] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00049694ms, 1%
@139 = hip::hip_copy_literal[id=main:@literal:4] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0: 0.00035526ms, 1%
@140 = hip::hip_copy_literal[id=main:@literal:20] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0: 0.00041688ms, 1%
@141 = hip::hip_copy_literal[id=main:@literal:19] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00037182ms, 1%
@142 = hip::hip_copy_literal[id=main:@literal:67] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0: 0.00038054ms, 1%
@143 = hip::hip_copy_literal[id=main:@literal:32] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00036428ms, 1%
@144 = hip::hip_copy_literal[id=main:@literal:22] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00036896ms, 1%
@145 = hip::hip_copy_literal[id=main:@literal:15] -> float_type, {256, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0: 0.00037792ms, 1%
@146 = hip::hip_copy_literal[id=main:@literal:46] -> float_type, {1024, 256, 1, 1}, {256, 1, 1, 1}, target_id=0: 0.00034898ms, 1%
@147 = hip::hip_copy_literal[id=main:@literal:68] -> float_type, {256, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0: 0.00036722ms, 1%
@148 = hip::hip_copy_literal[id=main:@literal:18] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0: 0.00037896ms, 1%
@149 = hip::hip_copy_literal[id=main:@literal:37] -> float_type, {1024, 256, 1, 1}, {256, 1, 1, 1}, target_id=0: 0.0003841ms, 1%
@150 = hip::hip_copy_literal[id=main:@literal:78] -> float_type, {256, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0: 0.00038246ms, 1%
@151 = hip::hip_copy_literal[id=main:@literal:30] -> float_type, {1024, 256, 1, 1}, {256, 1, 1, 1}, target_id=0: 0.00034434ms, 1%
@152 = hip::hip_copy_literal[id=main:@literal:63] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00040654ms, 1%
@153 = hip::hip_copy_literal[id=main:@literal:14] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0: 0.00036304ms, 1%
@154 = hip::hip_copy_literal[id=main:@literal:56] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00037158ms, 1%
@155 = hip::hip_copy_literal[id=main:@literal:42] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0: 0.00035384ms, 1%
@156 = load[offset=1204224,end=1404928](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0004572ms, 1%
@157 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@144) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00062034ms, 1%
@158 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@157,@137,@147,@156) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0524338ms, 2%
@159 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@154) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00065436ms, 1%
@160 = load[offset=1003520,end=1204224](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.00044104ms, 1%
@161 = gpu::code_object[code_object=7888,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@159,@158,@155,@160) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0859443ms, 3%
@162 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@140) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00062986ms, 1%
@163 = load[offset=2207744,end=3010560](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.00045138ms, 1%
@164 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=7168,local=64,](@162,@137,@161,@149,@163) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.0411003ms, 2%
@165 = load[offset=200704,end=401408](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.00049776ms, 1%
@166 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@143) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00054584ms, 1%
@167 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@166,@164,@150,@165) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0523705ms, 2%
@168 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@152) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00061968ms, 1%
@169 = load[offset=0,end=200704](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.00043678ms, 1%
@170 = gpu::code_object[code_object=7888,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@168,@167,@148,@169) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0857781ms, 3%
@171 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@142) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00062484ms, 1%
@172 = load[offset=1404928,end=2207744](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.00044598ms, 1%
@173 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=7168,local=64,](@171,@164,@170,@151,@172) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.040947ms, 2%
@174 = load[offset=401408,end=602112](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.00048892ms, 1%
@175 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@141) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00054632ms, 1%
@176 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@175,@173,@145,@174) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0531429ms, 2%
@177 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@138) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.0006173ms, 1%
@178 = load[offset=200704,end=401408](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.00043138ms, 1%
@179 = gpu::code_object[code_object=7888,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@177,@176,@139,@178) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0857418ms, 3%
@180 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@153) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00061174ms, 1%
@181 = load[offset=602112,end=1404928](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.00044204ms, 1%
@182 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=7168,local=64,](@180,@173,@179,@146,@181) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.040946ms, 2%
@183 = hip::hip_copy_literal[id=main:@literal:10] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0: 0.00053ms, 1%
@184 = hip::hip_copy_literal[id=main:@literal:76] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00036128ms, 1%
@185 = hip::hip_copy_literal[id=main:@literal:87] -> float_type, {256, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0: 0.0003716ms, 1%
@186 = hip::hip_copy_literal[id=main:@literal:13] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00039456ms, 1%
@187 = hip::hip_copy_literal[id=main:@literal:12] -> float_type, {1024, 256, 1, 1}, {256, 1, 1, 1}, target_id=0: 0.000373ms, 1%
@188 = hip::hip_copy_literal[id=main:@literal:21] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0: 0.00038272ms, 1%
@189 = load[offset=401408,end=602112](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.00046172ms, 1%
@190 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@186) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00059918ms, 1%
@191 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@190,@182,@185,@189) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0522866ms, 2%
@192 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@184) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00060792ms, 1%
@193 = load[offset=200704,end=401408](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.00043876ms, 1%
@194 = gpu::code_object[code_object=7888,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@192,@191,@188,@193) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0857457ms, 3%
@195 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@183) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00062518ms, 1%
@196 = load[offset=1806336,end=2609152](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.00043404ms, 1%
@197 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=7168,local=64,](@195,@182,@194,@187,@196) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.0410143ms, 2%
@198 = hip::hip_copy_literal[id=main:@literal:9] -> float_type, {256, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0: 0.00045082ms, 1%
@199 = hip::hip_copy_literal[id=main:@literal:48] -> float_type, {2048, 1536, 1, 1}, {1536, 1, 1, 1}, target_id=0: 0.0003759ms, 1%
@200 = hip::hip_copy_literal[id=main:@literal:2] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0: 0.00039156ms, 1%
@201 = hip::hip_copy_literal[id=main:@literal:90] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0: 0.00048066ms, 1%
@202 = hip::hip_copy_literal[id=main:@literal:3] -> float_type, {1024, 1, 1}, {1, 1, 1}, target_id=0: 0.0003454ms, 1%
@203 = hip::hip_copy_literal[id=main:@literal:91] -> float_type, {2048, 1, 1}, {1, 1, 1}, target_id=0: 0.000399ms, 1%
@204 = hip::hip_copy_literal[id=main:@literal:5] -> float_type, {256, 256, 3, 3}, {2304, 9, 3, 1}, target_id=0: 0.0003598ms, 1%
@205 = hip::hip_copy_literal[id=main:@literal:29] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00040438ms, 1%
@206 = hip::hip_copy_literal[id=main:@literal:55] -> float_type, {512, 1024, 1, 1}, {1024, 1, 1, 1}, target_id=0: 0.00040004ms, 1%
@207 = hip::hip_copy_literal[id=main:@literal:34] -> float_type, {1024, 256, 1, 1}, {256, 1, 1, 1}, target_id=0: 0.00038314ms, 1%
@208 = hip::hip_copy_literal[id=main:@literal:8] -> float_type, {256, 1, 1}, {1, 1, 1}, target_id=0: 0.00033954ms, 1%
@209 = hip::hip_copy_literal[id=main:@literal:73] -> float_type, {512, 512, 3, 3}, {4608, 9, 3, 1}, target_id=0: 0.00039236ms, 1%
@210 = load[offset=200704,end=401408](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.00046064ms, 1%
@211 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@208) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.0005941ms, 1%
@212 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@211,@197,@198,@210) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0528269ms, 2%
@213 = multibroadcast[out_lens={1, 256, 14, 14},out_dyn_dims={}](@205) -> float_type, {1, 256, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00063294ms, 1%
@214 = load[offset=0,end=200704](@1) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.00044464ms, 1%
@215 = gpu::code_object[code_object=7888,symbol_name=mlir_convolution_add_relu,global=7168,local=128,](@213,@212,@204,@214) -> float_type, {1, 256, 14, 14}, {50176, 196, 14, 1}, target_id=0: 0.0857935ms, 3%
@216 = multibroadcast[out_lens={1, 1024, 14, 14},out_dyn_dims={}](@202) -> float_type, {1, 1024, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00063142ms, 1%
@217 = load[offset=1003520,end=1806336](@1) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.00045054ms, 1%
@218 = gpu::code_object[code_object=9176,symbol_name=mlir_convolution_add_add_relu,global=7168,local=64,](@216,@197,@215,@207,@217) -> float_type, {1, 1024, 14, 14}, {200704, 196, 14, 1}, target_id=0: 0.0408826ms, 2%
@219 = load[offset=501760,end=903168](@1) -> float_type, {1, 512, 14, 14}, {100352, 196, 14, 1}, target_id=0: 0.00049836ms, 1%
@220 = multibroadcast[out_lens={1, 512, 14, 14},out_dyn_dims={}](@200) -> float_type, {1, 512, 14, 14}, {0, 1, 0, 0}, target_id=0: 0.00054322ms, 1%
@221 = gpu::code_object[code_object=8784,symbol_name=mlir_convolution_add_relu,global=3584,local=64,](@220,@218,@206,@219) -> float_type, {1, 512, 14, 14}, {100352, 196, 14, 1}, target_id=0: 0.0956597ms, 3%
@222 = multibroadcast[out_lens={1, 512, 7, 7},out_dyn_dims={}](@201) -> float_type, {1, 512, 7, 7}, {0, 1, 0, 0}, target_id=0: 0.0006375ms, 1%
@223 = load[offset=401408,end=501760](@1) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0: 0.00044206ms, 1%
@224 = gpu::code_object[code_object=8400,symbol_name=mlir_convolution_add_relu,global=4096,local=128,](@222,@221,@209,@223) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0: 0.151329ms, 5%
@225 = step[axes={2, 3},steps={2, 2}](@218) -> float_type, {1, 1024, 7, 7}, {200704, 196, 28, 2}, target_id=0: 0.00070994ms, 1%
@226 = load[offset=0,end=301056](@1) -> float_type, {1, 1536, 7, 7}, {75264, 49, 7, 1}, target_id=0: 0.00044948ms, 1%
@227 = gpu::code_object[code_object=4792,symbol_name=concat_kernel,global=37632,local=1024,](@224,@225,@226) -> float_type, {1, 1536, 7, 7}, {75264, 49, 7, 1}, target_id=0: 0.0230122ms, 1%
@228 = multibroadcast[out_lens={1, 2048, 7, 7},out_dyn_dims={}](@203) -> float_type, {1, 2048, 7, 7}, {0, 1, 0, 0}, target_id=0: 0.00070184ms, 1%
@229 = load[offset=903168,end=1304576](@1) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0: 0.00045126ms, 1%
@230 = gpu::code_object[code_object=14160,symbol_name=mlir_convolution_add_relu,global=2048,local=64,](@228,@227,@199,@229) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0: 0.230157ms, 7%
@231 = hip::hip_copy_literal[id=main:@literal:66] -> float_type, {2048, 512, 1, 1}, {512, 1, 1, 1}, target_id=0: 0.0005779ms, 1%
@232 = hip::hip_copy_literal[id=main:@literal:99] -> float_type, {2048, 1, 1}, {1, 1, 1}, target_id=0: 0.0003657ms, 1%
@233 = hip::hip_copy_literal[id=main:@literal:98] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0: 0.00036472ms, 1%
@234 = hip::hip_copy_literal[id=main:@literal:93] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0: 0.0003823ms, 1%
@235 = hip::hip_copy_literal[id=main:@literal:92] -> float_type, {512, 2048, 1, 1}, {2048, 1, 1, 1}, target_id=0: 0.0003475ms, 1%
@236 = hip::hip_copy_literal[id=main:@literal:94] -> float_type, {512, 512, 3, 3}, {4608, 9, 3, 1}, target_id=0: 0.00035194ms, 1%
@237 = hip::hip_copy_literal[id=main:@literal:85] -> float_type, {512, 2048, 1, 1}, {2048, 1, 1, 1}, target_id=0: 0.00037004ms, 1%
@238 = hip::hip_copy_literal[id=main:@literal:44] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0: 0.00034948ms, 1%
@239 = hip::hip_copy_literal[id=main:@literal:41] -> float_type, {2048, 512, 1, 1}, {512, 1, 1, 1}, target_id=0: 0.00032656ms, 1%
@240 = hip::hip_copy_literal[id=main:@literal:38] -> float_type, {1000}, {1}, target_id=0: 0.00036496ms, 1%
@241 = hip::hip_copy_literal[id=main:@literal:0] -> float_type, {2048, 1000}, {1000, 1}, target_id=0: 0.00034242ms, 1%
@242 = hip::hip_copy_literal[id=main:@literal:97] -> float_type, {512, 512, 3, 3}, {4608, 9, 3, 1}, target_id=0: 0.00037212ms, 1%
@243 = hip::hip_copy_literal[id=main:@literal:96] -> float_type, {512, 1, 1}, {1, 1, 1}, target_id=0: 0.0003608ms, 1%
@244 = hip::hip_copy_literal[id=main:@literal:95] -> float_type, {2048, 1, 1}, {1, 1, 1}, target_id=0: 0.00033058ms, 1%
@245 = multibroadcast[out_lens={1, 512, 7, 7},out_dyn_dims={}](@234) -> float_type, {1, 512, 7, 7}, {0, 1, 0, 0}, target_id=0: 0.00064924ms, 1%
@246 = load[offset=100352,end=200704](@1) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0: 0.00045572ms, 1%
@247 = gpu::code_object[code_object=9040,symbol_name=mlir_convolution_add_relu,global=2048,local=64,](@245,@230,@235,@246) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0: 0.0993969ms, 3%
@248 = multibroadcast[out_lens={1, 512, 7, 7},out_dyn_dims={}](@238) -> float_type, {1, 512, 7, 7}, {0, 1, 0, 0}, target_id=0: 0.00074974ms, 1%
@249 = load[offset=0,end=100352](@1) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0: 0.00044806ms, 1%
@250 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=4096,local=128,](@248,@247,@236,@249) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0: 0.152284ms, 5%
@251 = multibroadcast[out_lens={1, 2048, 7, 7},out_dyn_dims={}](@244) -> float_type, {1, 2048, 7, 7}, {0, 1, 0, 0}, target_id=0: 0.0006781ms, 1%
@252 = load[offset=501760,end=903168](@1) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0: 0.00043828ms, 1%
@253 = gpu::code_object[code_object=9944,symbol_name=mlir_convolution_add_add_relu,global=4096,local=64,](@251,@230,@250,@239,@252) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0: 0.0853333ms, 3%
@254 = load[offset=100352,end=200704](@1) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0: 0.00049116ms, 1%
@255 = multibroadcast[out_lens={1, 512, 7, 7},out_dyn_dims={}](@243) -> float_type, {1, 512, 7, 7}, {0, 1, 0, 0}, target_id=0: 0.00054102ms, 1%
@256 = gpu::code_object[code_object=9040,symbol_name=mlir_convolution_add_relu,global=2048,local=64,](@255,@253,@237,@254) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0: 0.100092ms, 3%
@257 = multibroadcast[out_lens={1, 512, 7, 7},out_dyn_dims={}](@233) -> float_type, {1, 512, 7, 7}, {0, 1, 0, 0}, target_id=0: 0.00062862ms, 1%
@258 = load[offset=0,end=100352](@1) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0: 0.00044728ms, 1%
@259 = gpu::code_object[code_object=8272,symbol_name=mlir_convolution_add_relu,global=4096,local=128,](@257,@256,@242,@258) -> float_type, {1, 512, 7, 7}, {25088, 49, 7, 1}, target_id=0: 0.152366ms, 5%
@260 = load[offset=903168,end=1304576](@1) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0: 0.00049212ms, 1%
@261 = gpu::code_object[code_object=8120,symbol_name=mlir_convolution,global=4096,local=64,](@259,@231,@260) -> float_type, {1, 2048, 7, 7}, {100352, 49, 7, 1}, target_id=0: 0.0839427ms, 3%
@262 = multibroadcast[out_lens={1, 2048, 7, 7},out_dyn_dims={}](@232) -> float_type, {1, 2048, 7, 7}, {0, 1, 0, 0}, target_id=0: 0.0007467ms, 1%
@263 = load[offset=0,end=8192](@1) -> float_type, {1, 2048, 1, 1}, {2048, 1, 1, 1}, target_id=0: 0.00043584ms, 1%
@264 = gpu::code_object[code_object=4568,symbol_name=add_add_relu_reduce_mean_kernel,global=131072,local=64,](@261,@262,@253,@263) -> float_type, {1, 2048, 1, 1}, {2048, 1, 1, 1}, target_id=0: 0.0232591ms, 1%
@265 = multibroadcast[out_lens={1, 1000},out_dyn_dims={}](@240) -> float_type, {1, 1000}, {0, 1}, target_id=0: 0.0009175ms, 1%
main:#output_0 = @param:main:#output_0 -> float_type, {1, 1000}, {1000, 1}, target_id=0: 0.00056152ms, 1%
@267 = gpu::code_object[code_object=5440,symbol_name=mlir_reshape_dot_add,global=2048,local=64,](@265,@264,@241,main:#output_0) -> float_type, {1, 1000}, {1000, 1}, target_id=0: 0.0641472ms, 2%
@268 = @return(@267), target_id=0
Summary:
gpu::code_object::mlir_convolution_add_relu: 2.76366ms / 37 = 0.0746935ms, 76%
gpu::code_object::mlir_convolution_add_add_relu: 0.479536ms / 11 = 0.0435942ms, 14%
gpu::code_object::concat_kernel: 0.0951083ms / 4 = 0.0237771ms, 3%
gpu::code_object::mlir_convolution: 0.0839427ms / 1 = 0.0839427ms, 3%
gpu::code_object::mlir_reshape_dot_add: 0.0641472ms / 1 = 0.0641472ms, 2%
gpu::pooling: 0.0425269ms / 1 = 0.0425269ms, 2%
hip::hip_copy_literal: 0.0404851ms / 100 = 0.000404851ms, 2%
multibroadcast: 0.0322115ms / 50 = 0.00064423ms, 1%
load: 0.0249479ms / 55 = 0.000453598ms, 1%
gpu::code_object::add_add_relu_reduce_mean_kernel: 0.0232591ms / 1 = 0.0232591ms, 1%
step: 0.00225104ms / 3 = 0.000750347ms, 1%
@param: 0.00087068ms / 2 = 0.00043534ms, 1%
check_context::migraphx::gpu::context: 0.00065132ms / 1 = 0.00065132ms, 1%
hip::hip_allocate_memory: 0.00051378ms / 1 = 0.00051378ms, 1%

Batch size: 1
Rate: 396.039 inferences/sec
Total time: 2.525ms
Total instructions time: 3.65411ms
Overhead time: 0.0256813ms, -1.12911ms
Overhead: 1%, -45%
[ MIGraphX Version: 2.9.0.318737422 ] Complete: migraphx-driver perf --model resnet50
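A note on the negative overhead figures above (this is my reading of the printed numbers, not an official MIGraphX definition): the driver appears to report overhead as the wall-clock total minus the summed per-instruction times, which goes negative when kernels overlap:

```
Total time − Total instructions time = 2.525 ms − 3.65411 ms ≈ −1.129 ms
−1.129 ms / 2.525 ms ≈ −45 %
```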
Crizle commented 3 weeks ago

test_onnxruntime_providers.py

cd /opt/rocm_sdk_611/docs/examples/onnxruntime
chris@fedora:/opt/rocm_sdk_611/docs/examples/onnxruntime$ test_onnxruntime_providers.py
bash: test_onnxruntime_providers.py: command not found...
chris@fedora:/opt/rocm_sdk_611/docs/examples/onnxruntime$ ls
test_onnxruntime_providers.py
chris@fedora:/opt/rocm_sdk_611/docs/examples/onnxruntime$ python test_onnxruntime_providers.py
['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']
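For reference, a minimal sketch of the same provider check with the onnxruntime Python API (only the provider list above comes from the actual run; "model.onnx" and the session below are hypothetical usage, not files shipped with the SDK example):

```python
import onnxruntime as ort

# List the execution providers this onnxruntime build was compiled with.
# On this SDK build it prints MIGraphXExecutionProvider, ROCMExecutionProvider
# and CPUExecutionProvider, matching the output above.
print(ort.get_available_providers())

# Hypothetical usage: prefer MIGraphX, fall back to ROCm, then CPU.
# "model.onnx" is a placeholder model path.
session = ort.InferenceSession(
    "model.onnx",
    providers=["MIGraphXExecutionProvider", "ROCMExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers the session actually selected
```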
Crizle commented 3 weeks ago

Test HIPCC compiler

rm -f ./hello_world
rm -f hello_world.o
rm -f /opt/rocm_sdk_611/src/*.o
/opt/rocm_sdk_611/bin/hipcc -g -fPIE   -c -o hello_world.o hello_world.cpp
hello_world.cpp:48:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   48 |         hipGetDeviceProperties(&devProp, 0);
      |         ^~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~
/opt/rocm_sdk_611/include/hip/hip_runtime_api.h:91:32: note: expanded from macro 'hipGetDeviceProperties'
   91 | #define hipGetDeviceProperties hipGetDevicePropertiesR0600
      |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:62:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   62 |         hipMalloc((void**)&inputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~
   63 |                 (strlength + 1) * sizeof(char));
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:64:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   64 |         hipMalloc((void**)&outputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
   65 |                 (strlength + 1) * sizeof(char));
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:66:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   66 |         hipMemcpy(inputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~
   67 |                 input,
      |                 ~~~~~~
   68 |                 (strlength + 1) * sizeof(char),
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   69 |                 hipMemcpyHostToDevice);
      |                 ~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:77:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   77 |         hipMemcpy(output,
      |         ^~~~~~~~~ ~~~~~~~
   78 |                 outputBuffer,
      |                 ~~~~~~~~~~~~~
   79 |                 (strlength + 1) * sizeof(char),
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   80 |                 hipMemcpyDeviceToHost);
      |                 ~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:81:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   81 |         hipFree(inputBuffer);
      |         ^~~~~~~ ~~~~~~~~~~~
hello_world.cpp:82:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   82 |         hipFree(outputBuffer);
      |         ^~~~~~~ ~~~~~~~~~~~~
7 warnings generated when compiling for gfx1101.
hello_world.cpp:48:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   48 |         hipGetDeviceProperties(&devProp, 0);
      |         ^~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~
/opt/rocm_sdk_611/include/hip/hip_runtime_api.h:91:32: note: expanded from macro 'hipGetDeviceProperties'
   91 | #define hipGetDeviceProperties hipGetDevicePropertiesR0600
      |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:62:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   62 |         hipMalloc((void**)&inputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~
   63 |                 (strlength + 1) * sizeof(char));
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:64:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   64 |         hipMalloc((void**)&outputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
   65 |                 (strlength + 1) * sizeof(char));
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:66:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   66 |         hipMemcpy(inputBuffer,
      |         ^~~~~~~~~ ~~~~~~~~~~~~
   67 |                 input,
      |                 ~~~~~~
   68 |                 (strlength + 1) * sizeof(char),
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   69 |                 hipMemcpyHostToDevice);
      |                 ~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:77:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   77 |         hipMemcpy(output,
      |         ^~~~~~~~~ ~~~~~~~
   78 |                 outputBuffer,
      |                 ~~~~~~~~~~~~~
   79 |                 (strlength + 1) * sizeof(char),
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   80 |                 hipMemcpyDeviceToHost);
      |                 ~~~~~~~~~~~~~~~~~~~~~
hello_world.cpp:81:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   81 |         hipFree(inputBuffer);
      |         ^~~~~~~ ~~~~~~~~~~~
hello_world.cpp:82:2: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   82 |         hipFree(outputBuffer);
      |         ^~~~~~~ ~~~~~~~~~~~~
7 warnings generated when compiling for host.
/opt/rocm_sdk_611/bin/hipcc hello_world.o -fPIE -o hello_world
./hello_world
 System minor: 0
 System major: 11
 Agent name: AMD Radeon RX 7800 XT
Input string: GdkknVnqkc
Output string: HelloWorld
Test ok!
Crizle commented 3 weeks ago

OpenCL Integration (I assume this isn't working, as it hasn't been built due to the DeepSpeed build failing, which I haven't looked into further to try and resolve yet):

cd docs/examples/opencl/check_opencl_caps
bash: cd: docs/examples/opencl/check_opencl_caps: No such file or directory
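In the meantime, a small pyopencl sketch can at least show whether an OpenCL runtime is visible at all (this assumes pyopencl is installed; it is not one of the SDK's own examples):

```python
import pyopencl as cl

# Enumerate the OpenCL platforms and devices the ICD loader can see.
for platform in cl.get_platforms():
    print("Platform:", platform.name, "|", platform.version)
    for device in platform.get_devices():
        print("  Device:", device.name,
              "| compute units:", device.max_compute_units,
              "| global mem (MiB):", device.global_mem_size // (1024 * 1024))
```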
Crizle commented 3 weeks ago

./run_torchvision_gpu_benchmarks.sh

start, count:  4
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
benchmark start : 2024/06/04 12:23:17
Number of GPUs on current device : 1
CUDA Version : None
Cudnn Version : 3001000
Device Name : AMD Radeon RX 7800 XT
uname_result(system='Linux', node='fedora', release='6.8.11-300.fc40.x86_64', version='#1 SMP PREEMPT_DYNAMIC Mon May 27 14:53:33 UTC 2024', machine='x86_64')
                     scpufreq(current=1267.4599166666667, min=400.0, max=4464.0)
                    cpu_count: 12
                    memory_available: 25994178560
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:135: UserWarning: Using 'weights' as positional parameter(s) is deprecated since 0.13 and may be removed in the future. Please use keyword parameter(s) instead.
  warnings.warn(
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MNASNet0_5_Weights.IMAGENET1K_V1`. You can also use `weights=MNASNet0_5_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/mnasnet0.5_top1_67.823-3ffadce67e.pth" to /home/chris/.cache/torch/hub/checkpoints/mnasnet0.5_top1_67.823-3ffadce67e.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 8.59M/8.59M [00:00<00:00, 12.9MB/s]
Traceback (most recent call last):
  File "/home/chris/pytorch-gpu-benchmark/benchmark_models_torchvision_013.py", line 260, in <module>
    train_result = train(precision)
  File "/home/chris/pytorch-gpu-benchmark/benchmark_models_torchvision_013.py", line 152, in train
    model = nn.DataParallel(model, device_ids=range(args.NUM_GPU))
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 159, in __init__
    _check_balance(self.device_ids)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 26, in _check_balance
    dev_props = _get_devices_properties(device_ids)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 745, in _get_devices_properties
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 745, in <listcomp>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 724, in _get_device_attr
    return get_member(torch.cuda)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 745, in <lambda>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/cuda/__init__.py", line 447, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
benchmark start : 2024/06/04 12:23:23
Number of GPUs on current device : 1
CUDA Version : None
Cudnn Version : 3001000
Device Name : AMD Radeon RX 7800 XT
uname_result(system='Linux', node='fedora', release='6.8.11-300.fc40.x86_64', version='#1 SMP PREEMPT_DYNAMIC Mon May 27 14:53:33 UTC 2024', machine='x86_64')
                     scpufreq(current=1305.3748333333333, min=400.0, max=4464.0)
                    cpu_count: 12
                    memory_available: 26437152768
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:135: UserWarning: Using 'weights' as positional parameter(s) is deprecated since 0.13 and may be removed in the future. Please use keyword parameter(s) instead.
  warnings.warn(
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MNASNet0_5_Weights.IMAGENET1K_V1`. You can also use `weights=MNASNet0_5_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Traceback (most recent call last):
  File "/home/chris/pytorch-gpu-benchmark/benchmark_models_torchvision_013.py", line 260, in <module>
    train_result = train(precision)
  File "/home/chris/pytorch-gpu-benchmark/benchmark_models_torchvision_013.py", line 152, in train
    model = nn.DataParallel(model, device_ids=range(args.NUM_GPU))
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 159, in __init__
    _check_balance(self.device_ids)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 26, in _check_balance
    dev_props = _get_devices_properties(device_ids)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 745, in _get_devices_properties
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 745, in <listcomp>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 724, in _get_device_attr
    return get_member(torch.cuda)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 745, in <lambda>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/cuda/__init__.py", line 447, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
benchmark start : 2024/06/04 12:23:28
Number of GPUs on current device : 1
CUDA Version : None
Cudnn Version : 3001000
Device Name : AMD Radeon RX 7800 XT
uname_result(system='Linux', node='fedora', release='6.8.11-300.fc40.x86_64', version='#1 SMP PREEMPT_DYNAMIC Mon May 27 14:53:33 UTC 2024', machine='x86_64')
                     scpufreq(current=2087.3185, min=400.0, max=4464.0)
                    cpu_count: 12
                    memory_available: 26831294464
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:135: UserWarning: Using 'weights' as positional parameter(s) is deprecated since 0.13 and may be removed in the future. Please use keyword parameter(s) instead.
  warnings.warn(
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MNASNet0_5_Weights.IMAGENET1K_V1`. You can also use `weights=MNASNet0_5_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Traceback (most recent call last):
  File "/home/chris/pytorch-gpu-benchmark/benchmark_models_torchvision_013.py", line 260, in <module>
    train_result = train(precision)
  File "/home/chris/pytorch-gpu-benchmark/benchmark_models_torchvision_013.py", line 152, in train
    model = nn.DataParallel(model, device_ids=range(args.NUM_GPU))
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 159, in __init__
    _check_balance(self.device_ids)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 26, in _check_balance
    dev_props = _get_devices_properties(device_ids)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 745, in _get_devices_properties
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 745, in <listcomp>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 724, in _get_device_attr
    return get_member(torch.cuda)
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/_utils.py", line 745, in <lambda>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/opt/rocm_sdk_611/lib/python3.9/site-packages/torch/cuda/__init__.py", line 447, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id
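The three failed passes above all stop at the same spot: the benchmark calls nn.DataParallel(model, device_ids=range(args.NUM_GPU)) with a GPU count larger than the single GPU actually present, so torch.cuda.get_device_properties() is asked for an invalid device id; presumably the wrapper retries with a lower count until it reaches the one GPU that exists. A minimal sketch of the kind of guard that avoids this (args.NUM_GPU is the name from the traceback; the clamp itself is my suggestion, not part of the benchmark script):

```python
import torch
import torch.nn as nn

# Sketch: clamp the requested GPU count to what the system actually has,
# so nn.DataParallel never receives an out-of-range device id.
requested_gpus = 4                                            # e.g. the "count: 4" the wrapper starts with
usable_gpus = min(requested_gpus, torch.cuda.device_count())  # 1 on this RX 7800 XT machine

model = nn.Linear(16, 16).cuda()
if usable_gpus > 1:
    model = nn.DataParallel(model, device_ids=list(range(usable_gpus)))
```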
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1101
benchmark start : 2024/06/04 12:23:32
Number of GPUs on current device : 1
CUDA Version : None
Cudnn Version : 3001000
Device Name : AMD Radeon RX 7800 XT
uname_result(system='Linux', node='fedora', release='6.8.11-300.fc40.x86_64', version='#1 SMP PREEMPT_DYNAMIC Mon May 27 14:53:33 UTC 2024', machine='x86_64')
                     scpufreq(current=2049.122833333333, min=400.0, max=4464.0)
                    cpu_count: 12
                    memory_available: 27213508608
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:135: UserWarning: Using 'weights' as positional parameter(s) is deprecated since 0.13 and may be removed in the future. Please use keyword parameter(s) instead.
  warnings.warn(
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MNASNet0_5_Weights.IMAGENET1K_V1`. You can also use `weights=MNASNet0_5_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Benchmarking Training float precision type mnasnet0_5 
mnasnet0_5 model average train time: 45.41799545288086 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MNASNet0_75_Weights.IMAGENET1K_V1`. You can also use `weights=MNASNet0_75_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/mnasnet0_75-7090bc5f.pth" to /home/chris/.cache/torch/hub/checkpoints/mnasnet0_75-7090bc5f.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 12.3M/12.3M [00:04<00:00, 2.78MB/s]
Benchmarking Training float precision type mnasnet0_75 
mnasnet0_75 model average train time: 54.30558204650879 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MNASNet1_0_Weights.IMAGENET1K_V1`. You can also use `weights=MNASNet1_0_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/mnasnet1.0_top1_73.512-f206786ef8.pth" to /home/chris/.cache/torch/hub/checkpoints/mnasnet1.0_top1_73.512-f206786ef8.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 16.9M/16.9M [00:00<00:00, 46.0MB/s]
Benchmarking Training float precision type mnasnet1_0 
mnasnet1_0 model average train time: 58.658761978149414 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MNASNet1_3_Weights.IMAGENET1K_V1`. You can also use `weights=MNASNet1_3_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/mnasnet1_3-a4c69d6f.pth" to /home/chris/.cache/torch/hub/checkpoints/mnasnet1_3-a4c69d6f.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 24.2M/24.2M [00:00<00:00, 50.4MB/s]
Benchmarking Training float precision type mnasnet1_3 
mnasnet1_3 model average train time: 71.83271884918213 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet101_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet101_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnet101-63fe2227.pth" to /home/chris/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 171M/171M [00:02<00:00, 60.3MB/s]
Benchmarking Training float precision type resnet101 
resnet101 model average train time: 100.49461364746094 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet152_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet152_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnet152-394f9c45.pth" to /home/chris/.cache/torch/hub/checkpoints/resnet152-394f9c45.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 230M/230M [00:04<00:00, 55.5MB/s]
Benchmarking Training float precision type resnet152 
resnet152 model average train time: 136.10498905181885 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /home/chris/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 44.7M/44.7M [00:00<00:00, 53.8MB/s]
Benchmarking Training float precision type resnet18 
resnet18 model average train time: 27.424216270446777 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /home/chris/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 83.3M/83.3M [00:01<00:00, 59.4MB/s]
Benchmarking Training float precision type resnet34 
resnet34 model average train time: 36.80033206939697 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /home/chris/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 97.8M/97.8M [00:01<00:00, 52.7MB/s]
Benchmarking Training float precision type resnet50 
resnet50 model average train time: 66.30579471588135 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNeXt101_32X8D_Weights.IMAGENET1K_V1`. You can also use `weights=ResNeXt101_32X8D_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth" to /home/chris/.cache/torch/hub/checkpoints/resnext101_32x8d-8ba56ff5.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 340M/340M [00:06<00:00, 54.3MB/s]
Benchmarking Training float precision type resnext101_32x8d 
resnext101_32x8d model average train time: 218.29908847808838 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNeXt101_64X4D_Weights.IMAGENET1K_V1`. You can also use `weights=ResNeXt101_64X4D_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnext101_64x4d-173b62eb.pth" to /home/chris/.cache/torch/hub/checkpoints/resnext101_64x4d-173b62eb.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 319M/319M [00:07<00:00, 42.5MB/s]
Benchmarking Training float precision type resnext101_64x4d 
resnext101_64x4d model average train time: 224.22865390777588 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNeXt50_32X4D_Weights.IMAGENET1K_V1`. You can also use `weights=ResNeXt50_32X4D_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth" to /home/chris/.cache/torch/hub/checkpoints/resnext50_32x4d-7cdf4587.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 95.8M/95.8M [00:01<00:00, 59.9MB/s]
Benchmarking Training float precision type resnext50_32x4d 
resnext50_32x4d model average train time: 88.41317653656006 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Wide_ResNet101_2_Weights.IMAGENET1K_V1`. You can also use `weights=Wide_ResNet101_2_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth" to /home/chris/.cache/torch/hub/checkpoints/wide_resnet101_2-32ee1156.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 243M/243M [00:10<00:00, 23.2MB/s]
Benchmarking Training float precision type wide_resnet101_2 
wide_resnet101_2 model average train time: 177.27826595306396 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Wide_ResNet50_2_Weights.IMAGENET1K_V1`. You can also use `weights=Wide_ResNet50_2_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth" to /home/chris/.cache/torch/hub/checkpoints/wide_resnet50_2-95faca4d.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 132M/132M [00:02<00:00, 59.2MB/s]
Benchmarking Training float precision type wide_resnet50_2 
wide_resnet50_2 model average train time: 108.69176387786865 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=DenseNet121_Weights.IMAGENET1K_V1`. You can also use `weights=DenseNet121_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/densenet121-a639ec97.pth" to /home/chris/.cache/torch/hub/checkpoints/densenet121-a639ec97.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 30.8M/30.8M [00:00<00:00, 50.0MB/s]
Benchmarking Training float precision type densenet121 
densenet121 model average train time: 71.15086555480957 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=DenseNet161_Weights.IMAGENET1K_V1`. You can also use `weights=DenseNet161_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/densenet161-8d451a50.pth" to /home/chris/.cache/torch/hub/checkpoints/densenet161-8d451a50.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 110M/110M [00:01<00:00, 58.7MB/s]
Benchmarking Training float precision type densenet161 
densenet161 model average train time: 126.22828006744385 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=DenseNet169_Weights.IMAGENET1K_V1`. You can also use `weights=DenseNet169_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/densenet169-b2777c0a.pth" to /home/chris/.cache/torch/hub/checkpoints/densenet169-b2777c0a.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 54.7M/54.7M [00:01<00:00, 54.1MB/s]
Benchmarking Training float precision type densenet169 
densenet169 model average train time: 89.88576889038086 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=DenseNet201_Weights.IMAGENET1K_V1`. You can also use `weights=DenseNet201_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/densenet201-c1103571.pth" to /home/chris/.cache/torch/hub/checkpoints/densenet201-c1103571.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 77.4M/77.4M [00:01<00:00, 51.9MB/s]
Benchmarking Training float precision type densenet201 
densenet201 model average train time: 108.72264862060547 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=SqueezeNet1_0_Weights.IMAGENET1K_V1`. You can also use `weights=SqueezeNet1_0_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/squeezenet1_0-b66bff10.pth" to /home/chris/.cache/torch/hub/checkpoints/squeezenet1_0-b66bff10.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 4.78M/4.78M [00:00<00:00, 29.3MB/s]
Benchmarking Training float precision type squeezenet1_0 
squeezenet1_0 model average train time: 27.741594314575195 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=SqueezeNet1_1_Weights.IMAGENET1K_V1`. You can also use `weights=SqueezeNet1_1_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/squeezenet1_1-b8a52dc0.pth" to /home/chris/.cache/torch/hub/checkpoints/squeezenet1_1-b8a52dc0.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 4.73M/4.73M [00:00<00:00, 31.8MB/s]
Benchmarking Training float precision type squeezenet1_1 
squeezenet1_1 model average train time: 23.695998191833496 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG11_Weights.IMAGENET1K_V1`. You can also use `weights=VGG11_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/vgg11-8a719046.pth" to /home/chris/.cache/torch/hub/checkpoints/vgg11-8a719046.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 507M/507M [00:08<00:00, 59.2MB/s]
Benchmarking Training float precision type vgg11 
vgg11 model average train time: 55.82943916320801 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG11_BN_Weights.IMAGENET1K_V1`. You can also use `weights=VGG11_BN_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/vgg11_bn-6002323d.pth" to /home/chris/.cache/torch/hub/checkpoints/vgg11_bn-6002323d.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 507M/507M [00:08<00:00, 60.0MB/s]
Benchmarking Training float precision type vgg11_bn 
vgg11_bn model average train time: 61.39991760253906 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG13_Weights.IMAGENET1K_V1`. You can also use `weights=VGG13_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/vgg13-19584684.pth" to /home/chris/.cache/torch/hub/checkpoints/vgg13-19584684.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 508M/508M [00:08<00:00, 61.0MB/s]
Benchmarking Training float precision type vgg13 
vgg13 model average train time: 86.30177974700928 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG13_BN_Weights.IMAGENET1K_V1`. You can also use `weights=VGG13_BN_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/vgg13_bn-abd245e5.pth" to /home/chris/.cache/torch/hub/checkpoints/vgg13_bn-abd245e5.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 508M/508M [00:08<00:00, 59.8MB/s]
Benchmarking Training float precision type vgg13_bn 
vgg13_bn model average train time: 94.87710952758789 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /home/chris/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 528M/528M [00:09<00:00, 59.7MB/s]
Benchmarking Training float precision type vgg16 
vgg16 model average train time: 99.09570217132568 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_BN_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_BN_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/vgg16_bn-6c64b313.pth" to /home/chris/.cache/torch/hub/checkpoints/vgg16_bn-6c64b313.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 528M/528M [00:09<00:00, 59.2MB/s]
Benchmarking Training float precision type vgg16_bn 
vgg16_bn model average train time: 109.16308403015137 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG19_Weights.IMAGENET1K_V1`. You can also use `weights=VGG19_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /home/chris/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 548M/548M [00:09<00:00, 59.9MB/s]
Benchmarking Training float precision type vgg19 
vgg19 model average train time: 112.7330207824707 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG19_BN_Weights.IMAGENET1K_V1`. You can also use `weights=VGG19_BN_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/vgg19_bn-c79401a0.pth" to /home/chris/.cache/torch/hub/checkpoints/vgg19_bn-c79401a0.pth
100%|█████████████████████████████████████████████████████████████████████████████████████| 548M/548M [00:17<00:00, 33.1MB/s]
Benchmarking Training float precision type vgg19_bn 
vgg19_bn model average train time: 123.02981376647949 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MobileNet_V2_Weights.IMAGENET1K_V1`. You can also use `weights=MobileNet_V2_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /home/chris/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 13.6M/13.6M [00:00<00:00, 46.7MB/s]
Benchmarking Training float precision type mobilenet_v2 
mobilenet_v2 model average train time: 66.61790370941162 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MobileNet_V3_Large_Weights.IMAGENET1K_V1`. You can also use `weights=MobileNet_V3_Large_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth" to /home/chris/.cache/torch/hub/checkpoints/mobilenet_v3_large-8738ca79.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 21.1M/21.1M [00:00<00:00, 47.9MB/s]
Benchmarking Training float precision type mobilenet_v3_large 
mobilenet_v3_large model average train time: 58.378777503967285 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MobileNet_V3_Small_Weights.IMAGENET1K_V1`. You can also use `weights=MobileNet_V3_Small_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth" to /home/chris/.cache/torch/hub/checkpoints/mobilenet_v3_small-047dcff4.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 9.83M/9.83M [00:00<00:00, 39.6MB/s]
Benchmarking Training float precision type mobilenet_v3_small 
mobilenet_v3_small model average train time: 34.53420639038086 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ShuffleNet_V2_X0_5_Weights.IMAGENET1K_V1`. You can also use `weights=ShuffleNet_V2_X0_5_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/shufflenetv2_x0.5-f707e7126e.pth" to /home/chris/.cache/torch/hub/checkpoints/shufflenetv2_x0.5-f707e7126e.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 5.28M/5.28M [00:00<00:00, 9.26MB/s]
Benchmarking Training float precision type shufflenet_v2_x0_5 
shufflenet_v2_x0_5 model average train time: 32.43414878845215 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ShuffleNet_V2_X1_0_Weights.IMAGENET1K_V1`. You can also use `weights=ShuffleNet_V2_X1_0_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth" to /home/chris/.cache/torch/hub/checkpoints/shufflenetv2_x1-5666bf0f80.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 8.79M/8.79M [00:00<00:00, 39.5MB/s]
Benchmarking Training float precision type shufflenet_v2_x1_0 
shufflenet_v2_x1_0 model average train time: 35.66880702972412 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ShuffleNet_V2_X1_5_Weights.IMAGENET1K_V1`. You can also use `weights=ShuffleNet_V2_X1_5_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/shufflenetv2_x1_5-3c479a10.pth" to /home/chris/.cache/torch/hub/checkpoints/shufflenetv2_x1_5-3c479a10.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 13.6M/13.6M [00:00<00:00, 44.2MB/s]
Benchmarking Training float precision type shufflenet_v2_x1_5 
shufflenet_v2_x1_5 model average train time: 39.52351093292236 ms
/opt/rocm_sdk_611/lib/python3.9/site-packages/torchvision-0.18.0a0+cb3841e-py3.9-linux-x86_64.egg/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ShuffleNet_V2_X2_0_Weights.IMAGENET1K_V1`. You can also use `weights=ShuffleNet_V2_X2_0_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/shufflenetv2_x2_0-8be3c8ee.pth" to /home/chris/.cache/torch/hub/checkpoints/shufflenetv2_x2_0-8be3c8ee.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 28.4M/28.4M [00:01<00:00, 26.8MB/s]
Benchmarking Training float precision type shufflenet_v2_x2_0 
shufflenet_v2_x2_0 model average train time: 47.846031188964844 ms
Benchmarking Inference float precision type mnasnet0_5 
mnasnet0_5 model average inference time: 7.539019584655762 ms
Benchmarking Inference float precision type mnasnet0_75 
mnasnet0_75 model average inference time: 9.25621509552002 ms
Benchmarking Inference float precision type mnasnet1_0 
mnasnet1_0 model average inference time: 10.612540245056152 ms
Benchmarking Inference float precision type mnasnet1_3 
mnasnet1_3 model average inference time: 13.121509552001953 ms
Benchmarking Inference float precision type resnet101 
resnet101 model average inference time: 25.301389694213867 ms
Benchmarking Inference float precision type resnet152 
resnet152 model average inference time: 35.6933069229126 ms
Benchmarking Inference float precision type resnet18 
resnet18 model average inference time: 5.026726722717285 ms
Benchmarking Inference float precision type resnet34 
resnet34 model average inference time: 7.063908576965332 ms
Benchmarking Inference float precision type resnet50 
resnet50 model average inference time: 15.245218276977539 ms
Benchmarking Inference float precision type resnext101_32x8d 
resnext101_32x8d model average inference time: 63.03804397583008 ms
Benchmarking Inference float precision type resnext101_64x4d 
resnext101_64x4d model average inference time: 64.46457386016846 ms
Benchmarking Inference float precision type resnext50_32x4d 
resnext50_32x4d model average inference time: 21.462855339050293 ms
Benchmarking Inference float precision type wide_resnet101_2 
wide_resnet101_2 model average inference time: 47.09118366241455 ms
Benchmarking Inference float precision type wide_resnet50_2 
wide_resnet50_2 model average inference time: 26.841869354248047 ms
Benchmarking Inference float precision type densenet121 
densenet121 model average inference time: 16.41010284423828 ms
Benchmarking Inference float precision type densenet161 
densenet161 model average inference time: 35.70540904998779 ms
Benchmarking Inference float precision type densenet169 
densenet169 model average inference time: 22.435436248779297 ms
Benchmarking Inference float precision type densenet201 
densenet201 model average inference time: 29.548978805541992 ms
Benchmarking Inference float precision type squeezenet1_0 
squeezenet1_0 model average inference time: 6.014914512634277 ms
Benchmarking Inference float precision type squeezenet1_1 
squeezenet1_1 model average inference time: 4.7013044357299805 ms
Benchmarking Inference float precision type vgg11 
vgg11 model average inference time: 16.962571144104004 ms
Benchmarking Inference float precision type vgg11_bn 
vgg11_bn model average inference time: 18.56998920440674 ms
Benchmarking Inference float precision type vgg13 
vgg13 model average inference time: 20.87292194366455 ms
Benchmarking Inference float precision type vgg13_bn 
vgg13_bn model average inference time: 23.599228858947754 ms
Benchmarking Inference float precision type vgg16 
vgg16 model average inference time: 23.992109298706055 ms
Benchmarking Inference float precision type vgg16_bn 
vgg16_bn model average inference time: 26.942858695983887 ms
Benchmarking Inference float precision type vgg19 
vgg19 model average inference time: 27.131919860839844 ms
Benchmarking Inference float precision type vgg19_bn 
vgg19_bn model average inference time: 30.173110961914062 ms
Benchmarking Inference float precision type mobilenet_v2 
mobilenet_v2 model average inference time: 10.320515632629395 ms
Benchmarking Inference float precision type mobilenet_v3_large 
mobilenet_v3_large model average inference time: 9.284696578979492 ms
Benchmarking Inference float precision type mobilenet_v3_small 
mobilenet_v3_small model average inference time: 7.71611213684082 ms
Benchmarking Inference float precision type shufflenet_v2_x0_5 
shufflenet_v2_x0_5 model average inference time: 8.984808921813965 ms
Benchmarking Inference float precision type shufflenet_v2_x1_0 
shufflenet_v2_x1_0 model average inference time: 8.944554328918457 ms
Benchmarking Inference float precision type shufflenet_v2_x1_5 
shufflenet_v2_x1_5 model average inference time: 8.789167404174805 ms
Benchmarking Inference float precision type shufflenet_v2_x2_0 
shufflenet_v2_x2_0 model average inference time: 9.502840042114258 ms
Benchmarking Training half precision type mnasnet0_5 
mnasnet0_5 model average train time: 32.37814426422119 ms
Benchmarking Training half precision type mnasnet0_75 
mnasnet0_75 model average train time: 35.12150764465332 ms
Benchmarking Training half precision type mnasnet1_0 
mnasnet1_0 model average train time: 37.83927917480469 ms
Benchmarking Training half precision type mnasnet1_3 
mnasnet1_3 model average train time: 44.91675853729248 ms
Benchmarking Training half precision type resnet101 
resnet101 model average train time: 54.510536193847656 ms
Benchmarking Training half precision type resnet152 
resnet152 model average train time: 77.83138275146484 ms
Benchmarking Training half precision type resnet18 
resnet18 model average train time: 16.26044273376465 ms
Benchmarking Training half precision type resnet34 
resnet34 model average train time: 22.654643058776855 ms
Benchmarking Training half precision type resnet50 
resnet50 model average train time: 32.530155181884766 ms
Benchmarking Training half precision type resnext101_32x8d 
resnext101_32x8d model average train time: 89.58632469177246 ms
Benchmarking Training half precision type resnext101_64x4d 
resnext101_64x4d model average train time: 94.55410957336426 ms
Benchmarking Training half precision type resnext50_32x4d 
resnext50_32x4d model average train time: 40.16225337982178 ms
Benchmarking Training half precision type wide_resnet101_2 
wide_resnet101_2 model average train time: 84.22202110290527 ms
Benchmarking Training half precision type wide_resnet50_2 
wide_resnet50_2 model average train time: 49.42116737365723 ms
Benchmarking Training half precision type densenet121 
densenet121 model average train time: 58.374996185302734 ms
Benchmarking Training half precision type densenet161 
densenet161 model average train time: 84.11832809448242 ms
Benchmarking Training half precision type densenet169 
densenet169 model average train time: 82.41623401641846 ms
Benchmarking Training half precision type densenet201 
densenet201 model average train time: 100.52793979644775 ms
Benchmarking Training half precision type squeezenet1_0 
squeezenet1_0 model average train time: 15.914254188537598 ms
Benchmarking Training half precision type squeezenet1_1 
squeezenet1_1 model average train time: 14.700212478637695 ms
Benchmarking Training half precision type vgg11 
vgg11 model average train time: 28.235321044921875 ms
Benchmarking Training half precision type vgg11_bn 
vgg11_bn model average train time: 33.023743629455566 ms
Benchmarking Training half precision type vgg13 
vgg13 model average train time: 42.78169631958008 ms
Benchmarking Training half precision type vgg13_bn 
vgg13_bn model average train time: 51.15112781524658 ms
Benchmarking Training half precision type vgg16 
vgg16 model average train time: 50.44646739959717 ms
Benchmarking Training half precision type vgg16_bn 
vgg16_bn model average train time: 59.71973896026611 ms
Benchmarking Training half precision type vgg19 
vgg19 model average train time: 58.3311653137207 ms
Benchmarking Training half precision type vgg19_bn 
vgg19_bn model average train time: 68.45820426940918 ms
Benchmarking Training half precision type mobilenet_v2 
mobilenet_v2 model average train time: 38.55362415313721 ms
Benchmarking Training half precision type mobilenet_v3_large 
mobilenet_v3_large model average train time: 35.7493782043457 ms
Benchmarking Training half precision type mobilenet_v3_small 
mobilenet_v3_small model average train time: 25.366668701171875 ms
Benchmarking Training half precision type shufflenet_v2_x0_5 
shufflenet_v2_x0_5 model average train time: 27.322239875793457 ms
Benchmarking Training half precision type shufflenet_v2_x1_0 
shufflenet_v2_x1_0 model average train time: 26.57792091369629 ms
Benchmarking Training half precision type shufflenet_v2_x1_5 
shufflenet_v2_x1_5 model average train time: 29.904890060424805 ms
Benchmarking Training half precision type shufflenet_v2_x2_0 
shufflenet_v2_x2_0 model average train time: 32.880730628967285 ms
Benchmarking Inference half precision type mnasnet0_5 
mnasnet0_5 model average inference time: 8.785343170166016 ms
Benchmarking Inference half precision type mnasnet0_75 
mnasnet0_75 model average inference time: 9.158530235290527 ms
Benchmarking Inference half precision type mnasnet1_0 
mnasnet1_0 model average inference time: 10.129733085632324 ms
Benchmarking Inference half precision type mnasnet1_3 
mnasnet1_3 model average inference time: 12.214388847351074 ms
Benchmarking Inference half precision type resnet101 
resnet101 model average inference time: 16.435775756835938 ms
Benchmarking Inference half precision type resnet152 
resnet152 model average inference time: 21.902685165405273 ms
Benchmarking Inference half precision type resnet18 
resnet18 model average inference time: 4.993476867675781 ms
Benchmarking Inference half precision type resnet34 
resnet34 model average inference time: 6.620931625366211 ms
Benchmarking Inference half precision type resnet50 
resnet50 model average inference time: 10.160741806030273 ms
Benchmarking Inference half precision type resnext101_32x8d 
resnext101_32x8d model average inference time: 26.776127815246582 ms
Benchmarking Inference half precision type resnext101_64x4d 
resnext101_64x4d model average inference time: 28.269410133361816 ms
Benchmarking Inference half precision type resnext50_32x4d 
resnext50_32x4d model average inference time: 11.387176513671875 ms
Benchmarking Inference half precision type wide_resnet101_2 
wide_resnet101_2 model average inference time: 23.84159564971924 ms
Benchmarking Inference half precision type wide_resnet50_2 
wide_resnet50_2 model average inference time: 14.216933250427246 ms
Benchmarking Inference half precision type densenet121 
densenet121 model average inference time: 20.02528190612793 ms
Benchmarking Inference half precision type densenet161 
densenet161 model average inference time: 25.49595355987549 ms
Benchmarking Inference half precision type densenet169 
densenet169 model average inference time: 26.494426727294922 ms
Benchmarking Inference half precision type densenet201 
densenet201 model average inference time: 29.19358253479004 ms
Benchmarking Inference half precision type squeezenet1_0 
squeezenet1_0 model average inference time: 4.437603950500488 ms
Benchmarking Inference half precision type squeezenet1_1 
squeezenet1_1 model average inference time: 4.184565544128418 ms
Benchmarking Inference half precision type vgg11 
vgg11 model average inference time: 6.856842041015625 ms
Benchmarking Inference half precision type vgg11_bn 
vgg11_bn model average inference time: 8.037772178649902 ms
Benchmarking Inference half precision type vgg13 
vgg13 model average inference time: 9.449577331542969 ms
Benchmarking Inference half precision type vgg13_bn 
vgg13_bn model average inference time: 10.709524154663086 ms
Benchmarking Inference half precision type vgg16 
vgg16 model average inference time: 11.3215970993042 ms
Benchmarking Inference half precision type vgg16_bn 
vgg16_bn model average inference time: 13.036413192749023 ms
Benchmarking Inference half precision type vgg19 
vgg19 model average inference time: 13.724384307861328 ms
Benchmarking Inference half precision type vgg19_bn 
vgg19_bn model average inference time: 14.952735900878906 ms
Benchmarking Inference half precision type mobilenet_v2 
mobilenet_v2 model average inference time: 10.579729080200195 ms
Benchmarking Inference half precision type mobilenet_v3_large 
mobilenet_v3_large model average inference time: 11.342415809631348 ms
Benchmarking Inference half precision type mobilenet_v3_small 
mobilenet_v3_small model average inference time: 8.617777824401855 ms
Benchmarking Inference half precision type shufflenet_v2_x0_5 
shufflenet_v2_x0_5 model average inference time: 11.478819847106934 ms
Benchmarking Inference half precision type shufflenet_v2_x1_0 
shufflenet_v2_x1_0 model average inference time: 10.977005958557129 ms
Benchmarking Inference half precision type shufflenet_v2_x1_5 
shufflenet_v2_x1_5 model average inference time: 10.728583335876465 ms
Benchmarking Inference half precision type shufflenet_v2_x2_0 
shufflenet_v2_x2_0 model average inference time: 10.733203887939453 ms
Benchmarking Training double precision type mnasnet0_5 
mnasnet0_5 model average train time: 151.3709545135498 ms
Benchmarking Training double precision type mnasnet0_75 
mnasnet0_75 model average train time: 174.9580478668213 ms
Benchmarking Training double precision type mnasnet1_0 
mnasnet1_0 model average train time: 196.662859916687 ms
Benchmarking Training double precision type mnasnet1_3 
mnasnet1_3 model average train time: 237.61310577392578 ms
Benchmarking Training double precision type resnet101 
resnet101 model average train time: 1339.5698308944702 ms
Benchmarking Training double precision type resnet152 
resnet152 model average train time: 1996.1229085922241 ms
Benchmarking Training double precision type resnet18 
resnet18 model average train time: 353.38467597961426 ms
Benchmarking Training double precision type resnet34 
resnet34 model average train time: 678.1932067871094 ms
Benchmarking Training double precision type resnet50 
resnet50 model average train time: 816.9629764556885 ms
Benchmarking Training double precision type resnext101_32x8d 
resnext101_32x8d model average train time: 5286.7921686172485 ms
Benchmarking Training double precision type resnext101_64x4d 
resnext101_64x4d model average train time: 6684.284801483154 ms
Benchmarking Training double precision type resnext50_32x4d 
resnext50_32x4d model average train time: 1747.278938293457 ms
Benchmarking Training double precision type wide_resnet101_2 
wide_resnet101_2 model average train time: 3406.943120956421 ms
Benchmarking Training double precision type wide_resnet50_2 
wide_resnet50_2 model average train time: 1745.1583099365234 ms
Benchmarking Training double precision type densenet121 
densenet121 model average train time: 694.7871255874634 ms
Benchmarking Training double precision type densenet161 
densenet161 model average train time: 1618.7607526779175 ms
Benchmarking Training double precision type densenet169 
densenet169 model average train time: 911.8060541152954 ms
Benchmarking Training double precision type densenet201 
densenet201 model average train time: 1153.2826471328735 ms
Benchmarking Training double precision type squeezenet1_0 
squeezenet1_0 model average train time: 194.09780979156494 ms
Benchmarking Training double precision type squeezenet1_1 
squeezenet1_1 model average train time: 127.93455600738525 ms
Benchmarking Training double precision type vgg11 
vgg11 model average train time: 1160.0871133804321 ms
Benchmarking Training double precision type vgg11_bn 
vgg11_bn model average train time: 1227.0143222808838 ms
Benchmarking Training double precision type vgg13 
vgg13 model average train time: 1898.6948347091675 ms
Benchmarking Training double precision type vgg13_bn 
vgg13_bn model average train time: 2183.8944911956787 ms
Benchmarking Training double precision type vgg16 
vgg16 model average train time: 2806.0734701156616 ms
Benchmarking Training double precision type vgg16_bn 
vgg16_bn model average train time: 3539.889979362488 ms
Benchmarking Training double precision type vgg19 
vgg19 model average train time: 2674.389033317566 ms
Benchmarking Training double precision type vgg19_bn 
vgg19_bn model average train time: 2694.9081563949585 ms
Benchmarking Training double precision type mobilenet_v2 
mobilenet_v2 model average train time: 183.0529499053955 ms
Benchmarking Training double precision type mobilenet_v3_large 
mobilenet_v3_large model average train time: 181.79774284362793 ms
Benchmarking Training double precision type mobilenet_v3_small 
mobilenet_v3_small model average train time: 86.08907222747803 ms
Benchmarking Training double precision type shufflenet_v2_x0_5 
shufflenet_v2_x0_5 model average train time: 77.25229740142822 ms
Benchmarking Training double precision type shufflenet_v2_x1_0 
shufflenet_v2_x1_0 model average train time: 92.7445650100708 ms
Benchmarking Training double precision type shufflenet_v2_x1_5 
shufflenet_v2_x1_5 model average train time: 122.03961849212646 ms
Benchmarking Training double precision type shufflenet_v2_x2_0 
shufflenet_v2_x2_0 model average train time: 177.19773769378662 ms
Benchmarking Inference double precision type mnasnet0_5 
mnasnet0_5 model average inference time: 20.706510543823242 ms
Benchmarking Inference double precision type mnasnet0_75 
mnasnet0_75 model average inference time: 27.89578914642334 ms
Benchmarking Inference double precision type mnasnet1_0 
mnasnet1_0 model average inference time: 35.084829330444336 ms
Benchmarking Inference double precision type mnasnet1_3 
mnasnet1_3 model average inference time: 48.00275802612305 ms
Benchmarking Inference double precision type resnet101 
resnet101 model average inference time: 455.0099229812622 ms
Benchmarking Inference double precision type resnet152 
resnet152 model average inference time: 660.1268720626831 ms
Benchmarking Inference double precision type resnet18 
resnet18 model average inference time: 124.11296844482422 ms
Benchmarking Inference double precision type resnet34 
resnet34 model average inference time: 238.1240749359131 ms
Benchmarking Inference double precision type resnet50 
resnet50 model average inference time: 255.39941310882568 ms
Benchmarking Inference double precision type resnext101_32x8d 
resnext101_32x8d model average inference time: 1421.1232233047485 ms
Benchmarking Inference double precision type resnext101_64x4d 
resnext101_64x4d model average inference time: 1596.901683807373 ms
Benchmarking Inference double precision type resnext50_32x4d 
resnext50_32x4d model average inference time: 430.3028202056885 ms
Benchmarking Inference double precision type wide_resnet101_2 
wide_resnet101_2 model average inference time: 1245.8889770507812 ms
Benchmarking Inference double precision type wide_resnet50_2 
wide_resnet50_2 model average inference time: 654.6481561660767 ms
Benchmarking Inference double precision type densenet121 
densenet121 model average inference time: 279.40832138061523 ms
Benchmarking Inference double precision type densenet161 
densenet161 model average inference time: 702.8334426879883 ms
Benchmarking Inference double precision type densenet169 
densenet169 model average inference time: 395.0260257720947 ms
Benchmarking Inference double precision type densenet201 
densenet201 model average inference time: 496.9772481918335 ms
Benchmarking Inference double precision type squeezenet1_0 
squeezenet1_0 model average inference time: 54.285645484924316 ms
Benchmarking Inference double precision type squeezenet1_1 
squeezenet1_1 model average inference time: 30.408172607421875 ms
Benchmarking Inference double precision type vgg11 
vgg11 model average inference time: 399.6822500228882 ms
Benchmarking Inference double precision type vgg11_bn 
vgg11_bn model average inference time: 401.1578845977783 ms
Benchmarking Inference double precision type vgg13 
vgg13 model average inference time: 581.4366245269775 ms
Benchmarking Inference double precision type vgg13_bn 
vgg13_bn model average inference time: 555.940113067627 ms
Benchmarking Inference double precision type vgg16 
vgg16 model average inference time: 690.791220664978 ms
Benchmarking Inference double precision type vgg16_bn 
vgg16_bn model average inference time: 691.2011337280273 ms
Benchmarking Inference double precision type vgg19 
vgg19 model average inference time: 863.5865783691406 ms
Benchmarking Inference double precision type vgg19_bn 
vgg19_bn model average inference time: 869.0767955780029 ms
Benchmarking Inference double precision type mobilenet_v2 
mobilenet_v2 model average inference time: 33.045854568481445 ms
Benchmarking Inference double precision type mobilenet_v3_large 
mobilenet_v3_large model average inference time: 34.652090072631836 ms
Benchmarking Inference double precision type mobilenet_v3_small 
mobilenet_v3_small model average inference time: 19.62940216064453 ms
Benchmarking Inference double precision type shufflenet_v2_x0_5 
shufflenet_v2_x0_5 model average inference time: 17.136335372924805 ms
Benchmarking Inference double precision type shufflenet_v2_x1_0 
shufflenet_v2_x1_0 model average inference time: 20.188417434692383 ms
Benchmarking Inference double precision type shufflenet_v2_x1_5 
shufflenet_v2_x1_5 model average inference time: 29.74188804626465 ms
Benchmarking Inference double precision type shufflenet_v2_x2_0 
shufflenet_v2_x2_0 model average inference time: 48.07080268859863 ms
benchmark end : 2024/06/04 13:43:06
end
nvidia-smi not installed
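
For context on how numbers like those above are typically produced, here is a minimal sketch of an average-time measurement loop. The batch size, warm-up/iteration counts, and model list are illustrative assumptions; this is not the exact script used by rocm_sdk_builder.

```python
# Minimal sketch: measure the average inference time (ms) of a torchvision model.
# Batch size, warm-up/iteration counts and the model list below are assumptions
# for illustration, not values taken from the rocm_sdk_builder benchmark.
import time
import torch
import torchvision.models as models

def average_inference_time_ms(model_name, batch_size=16, warmup=5, iters=20):
    # ROCm builds of PyTorch expose the GPU through the "cuda" device type.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = getattr(models, model_name)(weights=None).to(device).eval()
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    with torch.no_grad():
        for _ in range(warmup):              # warm-up runs are not timed
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()         # wait for queued GPU work before stopping the clock
    return (time.time() - start) / iters * 1000.0

if __name__ == "__main__":
    for name in ("resnet18", "resnet50"):
        print(f"{name} model average inference time: {average_inference_time_ms(name)} ms")
```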
lamikr commented 3 weeks ago

This should be fixed by pull request https://github.com/lamikr/rocm_sdk_builder/pull/59

lamikr commented 3 weeks ago

Hi, all the tests should now work. I also had time to run the gpu_benchmark and confirmed that the upstream version has been fixed to run on PyTorch 2.0. All tests executed without throwing any errors; I only needed to change test.sh.

I will check how to get the GPU benchmark results visualized, so let's close this bug and open a new discussion for benchmark result visualization and any remaining problems in the example codes (which should now all work with the latest version). A rough sketch of one possible visualization approach is below.
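
As one possible starting point for the visualization: a short sketch that parses result lines in the format shown in the log above and plots one bar chart per benchmark mode and precision. The input log file name, matplotlib usage, and output image names are assumptions for illustration, not part of rocm_sdk_builder.

```python
# Sketch: parse "<model> model average <train|inference> time: <value> ms" result
# lines, grouped by the preceding "Benchmarking <Training|Inference> <precision>
# precision type <model>" lines, and plot a bar chart per (mode, precision).
# The input file name and output image names are illustrative assumptions.
import re
from collections import defaultdict

import matplotlib.pyplot as plt

BENCH = re.compile(r"^Benchmarking (Training|Inference) (\w+) precision type (\S+)")
RESULT = re.compile(r"^\S+ model average (?:train|inference) time: ([\d.]+) ms")

def parse_log(path):
    results = defaultdict(dict)      # (mode, precision) -> {model: time in ms}
    current = None                   # set by the most recent "Benchmarking ..." line
    with open(path) as f:
        for line in f:
            m = BENCH.match(line.strip())
            if m:
                current = (m.group(1), m.group(2), m.group(3))
                continue
            r = RESULT.match(line.strip())
            if r and current:
                mode, precision, model = current
                results[(mode, precision)][model] = float(r.group(1))
    return results

def plot(results):
    for (mode, precision), times in results.items():
        data = sorted(times.items(), key=lambda kv: kv[1])
        plt.figure(figsize=(8, 0.3 * len(data) + 1))
        plt.barh([name for name, _ in data], [ms for _, ms in data])
        plt.xlabel(f"average {mode.lower()} time (ms), {precision} precision")
        plt.tight_layout()
        plt.savefig(f"benchmark_{mode.lower()}_{precision}.png")
        plt.close()

if __name__ == "__main__":
    plot(parse_log("gpu_benchmark.log"))
```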