lamikr / rocm_sdk_builder


onnxruntime error building for 6.1.2: '...allocated_capacity' is used uninitialized #71

Closed: jeroen-mostert closed this 1 week ago

jeroen-mostert commented 2 weeks ago

Attempting to build the wip/rocm_sdk_builder_612 branch on a fully up-to-date Manjaro unstable, with the Python patch from https://github.com/lamikr/rocm_sdk_builder/pull/70 applied (likely unrelated, but mentioned for completeness), produces a peculiar error that I haven't been able to troubleshoot. The header path (/usr/include/absl) indicates that the build is using the system-wide Abseil headers, and the cc1plus lines in the log show it is being compiled with the system's GCC rather than clang or hipcc; I'm not sure whether that's intended.

It's been a while since I've written C++, and the error itself is a mystery to me as well: it seems to complain that an uninitialized member is read when the copy constructor (or, in the second error, the move assignment operator) is invoked, yet the instances are created with the default constructor.
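To make the puzzle concrete, here is a heavily simplified C++ model of the storage layout the error message names. `MiniStorage` and `copy_of_empty_size` are hypothetical names, and this is not Abseil's actual code; it only assumes what the log shows: a `data_` union with an `allocated` arm holding `allocated_capacity`, copied wholesale by `data_ = other_storage.data_;`. An empty, default-constructed object never writes that union, so the copy reads bytes that were never set. The language defines copying a union as copying its object representation, so the copy is harmless in itself; a flow-sensitive check like GCC's -Wuninitialized can still flag it once the copy is inlined. Whether this exact model triggers the warning depends on the GCC version and flags.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified model of absl's inlined_vector_internal::Storage.
// NOT the real Abseil code: it only mirrors the shape named in the error message.
template <typename T, std::size_t N>
struct MiniStorage {
  struct Allocated {
    T* allocated_data;
    std::size_t allocated_capacity;  // the member the warning points at
  };
  union Data {
    Allocated allocated;  // used once elements spill to the heap
    T inlined[N];         // used while elements fit inline
  };

  std::size_t metadata = 0;  // element count; 0 means "empty, inline"
  Data data_;                // not written by the default constructor

  // As the warning implies for the real type, default construction
  // leaves data_ indeterminate.
  MiniStorage() {}

  // Mirrors `data_ = other_storage.data_;`: a trivial copy of the whole union,
  // including the never-written allocated_capacity bytes.
  void MemcpyFrom(const MiniStorage& other) {
    metadata = other.metadata;
    data_ = other.data_;
  }

  std::size_t size() const { return metadata; }
};

// Copies a default-constructed storage, like the copy the log traces to
// write_scores_test.cc:61 after `InlinedVector<float> v1;`.
inline std::size_t copy_of_empty_size() {
  MiniStorage<float, 11> v1;
  assert(v1.size() == 0);
  MiniStorage<float, 11> v2;
  v2.MemcpyFrom(v1);
  return v2.size();
}
```

In a model like this, one way to avoid copying indeterminate bytes would be to value-initialize the union (`Data data_{};`), at the cost of zeroing storage that is never read.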

This is gcc 14.1.1 20240522.

In file included from /usr/include/absl/container/inlined_vector.h:53,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/common/inlined_containers_fwd.h:25,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/framework/tensor_shape.h:13,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/framework/tensor.h:15,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/ml/write_scores_test.cc:5:
In member function ‘void absl::lts_20240116::inlined_vector_internal::Storage<T, N, A>::MemcpyFrom(const absl::lts_20240116::inlined_vector_internal::Storage<T, N, A>&) [with T = float; long unsigned int N = 11; A = std::allocator<float>]’,
    inlined from ‘absl::lts_20240116::InlinedVector<T, N, A>::InlinedVector(const absl::lts_20240116::InlinedVector<T, N, A>&, const allocator_type&) [with T = float; long unsigned int N = 11; A = std::allocator<float>]’ at /usr/include/absl/container/inlined_vector.h:195:26,
    inlined from ‘absl::lts_20240116::InlinedVector<T, N, A>::InlinedVector(const absl::lts_20240116::InlinedVector<T, N, A>&) [with T = float; long unsigned int N = 11; A = std::allocator<float>]’ at /usr/include/absl/container/inlined_vector.h:177:59,
    inlined from ‘virtual void WriteScores_single_score_transform_none_Test::TestBody()’ at /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/ml/write_scores_test.cc:61:29:
/usr/include/absl/container/internal/inlined_vector.h:532:5: error: ‘v1.absl::lts_20240116::InlinedVector<float, 11, std::allocator<float> >::storage_.absl::lts_20240116::inlined_vector_internal::Storage<float, 11, std::allocator<float> >::data_.absl::lts_20240116::inlined_vector_internal::Storage<float, 11, std::allocator<float> >::Data::allocated.absl::lts_20240116::inlined_vector_internal::Storage<float, 11, std::allocator<float> >::Allocated::allocated_capacity’ is used uninitialized [-Werror=uninitialized]
  532 |     data_ = other_storage.data_;
      |     ^~~~~
/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/ml/write_scores_test.cc: In member function ‘virtual void WriteScores_single_score_transform_none_Test::TestBody()’:
/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/ml/write_scores_test.cc:58:24: note: ‘v1’ declared here
   58 |   InlinedVector<float> v1;
      |                        ^~
cc1plus: all warnings being treated as errors
make[2]: *** [CMakeFiles/onnxruntime_test_all.dir/build.make:3296: CMakeFiles/onnxruntime_test_all.dir/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/ml/write_scores_test.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from /usr/include/absl/container/inlined_vector.h:53,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/common/inlined_containers_fwd.h:25,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/framework/tensor_shape.h:13,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/include/onnxruntime/core/framework/tensor.h:15,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.h:16,
                 from /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc:9:
In member function ‘void absl::lts_20240116::inlined_vector_internal::Storage<T, N, A>::MemcpyFrom(const absl::lts_20240116::inlined_vector_internal::Storage<T, N, A>&) [with T = long int; long unsigned int N = 6; A = std::allocator<long int>]’,
    inlined from ‘void absl::lts_20240116::InlinedVector<T, N, A>::MoveAssignment(MemcpyPolicy, absl::lts_20240116::InlinedVector<T, N, A>&&) [with T = long int; long unsigned int N = 6; A = std::allocator<long int>]’ at /usr/include/absl/container/inlined_vector.h:856:24,
    inlined from ‘absl::lts_20240116::InlinedVector<T, N, A>& absl::lts_20240116::InlinedVector<T, N, A>::operator=(absl::lts_20240116::InlinedVector<T, N, A>&&) [with T = long int; long unsigned int N = 6; A = std::allocator<long int>]’ at /usr/include/absl/container/inlined_vector.h:548:21,
    inlined from ‘virtual void onnxruntime::test::ReductionOpTest_OptimizeShapeForFastReduce_KR_neg_Test::TestBody()’ at /home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc:4255:43:
/usr/include/absl/container/internal/inlined_vector.h:532:5: error: ‘expected_fast_axes.absl::lts_20240116::InlinedVector<long int, 6, std::allocator<long int> >::storage_.absl::lts_20240116::inlined_vector_internal::Storage<long int, 6, std::allocator<long int> >::data_.absl::lts_20240116::inlined_vector_internal::Storage<long int, 6, std::allocator<long int> >::Data::allocated.absl::lts_20240116::inlined_vector_internal::Storage<long int, 6, std::allocator<long int> >::Allocated::allocated_capacity’ is used uninitialized [-Werror=uninitialized]
  532 |     data_ = other_storage.data_;
      |     ^~~~~
/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc: In member function ‘virtual void onnxruntime::test::ReductionOpTest_OptimizeShapeForFastReduce_KR_neg_Test::TestBody()’:
/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc:4247:70: note: ‘expected_fast_axes’ declared here
 4247 |   TensorShapeVector expected_fast_shape, expected_fast_output_shape, expected_fast_axes;
      |                                                                      ^~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make[2]: *** [CMakeFiles/onnxruntime_test_all.dir/build.make:3618: CMakeFiles/onnxruntime_test_all.dir/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:3251: CMakeFiles/onnxruntime_test_all.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
Traceback (most recent call last):
  File "/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 2950, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 2842, in main
    build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
  File "/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 1731, in build_targets
    run_subprocess(cmd_args, env=env)
  File "/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 861, in run_subprocess
    return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/tools/python/util/run.py", line 49, in run
    completed_process = subprocess.run(
                        ^^^^^^^^^^^^^^^
  File "/opt/rocm_sdk_612/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/home/jeroen/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release', '--config', 'Release', '--', '-j12']' returned non-zero exit status 2.
build failed: onnxruntime
  error in build cmd: ./build_onnxruntime_rocm_training.sh /opt/rocm_sdk_612 "gfx1030"
Build failed
jeroen-mostert commented 2 weeks ago

The good news, if you can call it that, is that a clean pull of the onnxruntime repo built with build.sh (no special options, ROCm not enabled) fails as well, though with different errors from those above. Unfortunately, onnxruntime's own internal build system is a bit of an ogre, so troubleshooting it is a pain. I couldn't find any relevant issues in the upstream repo.

Update: after adding -Wno-template-id-cdtor and applying the patch for the _M_Manager warning, a vanilla build invoked as build.sh --config Release --enable_training --build_wheel --skip_tests --build_shared_lib fails with the same error. This at least shows the issue is not caused by the ROCm-specific bits or the SDK patches. The build succeeds with --config Debug, but that says little about the validity of the warning: GCC's uninitialized-use analysis depends on optimization and inlining (note the "inlined from" chain in the log), so an unoptimized build may well not report it either way.
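For reference, -Wtemplate-id-cdtor is a GCC 14 warning about spelling a constructor or destructor name as a template-id, a form C++20 made ill-formed. The sketch below uses a hypothetical `Box` template to show the flagged spelling (commented out) next to the accepted one; which dependency actually tripped the warning in this build isn't stated.

```cpp
// Hypothetical class template, used only to illustrate constructor/destructor spelling.
template <typename T>
struct Box {
  // constexpr Box<T>() : value() {}  // template-id as constructor name: flagged by
  //                                   // -Wtemplate-id-cdtor, ill-formed since C++20
  constexpr Box() : value() {}         // injected-class-name: valid in every mode
  constexpr explicit Box(T v) : value(v) {}
  // ~Box<T>();                        // the destructor spelling is flagged the same way
  T value;
};

// Compile-time checks that the accepted spellings behave as ordinary constructors.
static_assert(Box<int>().value == 0, "default constructor value-initializes value");
static_assert(Box<int>(7).value == 7, "converting constructor stores its argument");
```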

jeroen-mostert commented 1 week ago

Well, shucks. I was just about to prepare a patch when I noticed @lamikr had beaten me to the punch, so I could have saved myself the trouble. :P Patch 9 for onnxruntime fixes this. I do have a suggestion on how to make it better, but that'll be a separate pull request. Closing this.