intel / pti-gpu

Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysis on Intel(R) Processor Graphics easily
MIT License
202 stars 57 forks source link

ze_tracer/onetrace: Assertion `call->command != command' failed with simple SYCL Graph application #49

Closed al42and closed 4 months ago

al42and commented 1 year ago

Trying to trace a simple application which uses SYCL Graphs with ze_tracer or onetrace triggers an internal failed assertion.

$ clang++ -fsycl -g test_graph_zetrace.cpp -o test_graph_zetrace

$ ONEAPI_DEVICE_SELECTOR=level_zero:0 ~/pti-gpu/tools/ze_tracer/build/ze_tracer ./test_graph_zetrace
Intel(R) Arc(TM) A770 Graphics : native
   Done!
test_graph_zetrace: /home/aland/pti-gpu/tools/ze_tracer/ze_kernel_collector.h:1041: void ZeKernelCollector::RemoveKernelCommands(ze_command_list_handle_t): Assertion `call->command != command' failed.
Aborted (core dumped)

Ubuntu Linux 22.04 (6.2.0-36-generic), Intel Compute Runtime 23.30.26918.9, recent Intel LLVM built from source (a2f02214200ef71d3a8ec6cae1b84a16508513c4), PTI-GPU 90b9230c8bd9c00211934ec5e36183edc3aa8c1d.

Source code:

#include <sycl/sycl.hpp>

namespace syclex = sycl::ext::oneapi::experimental;

int main() {
  for (const auto &dev : sycl::device::get_devices()) {
    using graph_support = syclex::info::device::graph_support;
    using gsl = syclex::graph_support_level;
    const auto gs = dev.get_info<graph_support>();
    std::cout << dev.get_info<sycl::info::device::name>() << " : "
              << (gs == gsl::unsupported
                      ? "unsupported"
                      : (gs == gsl::emulated ? "emulated" : "native"))
              << std::endl;
    if (gs != gsl::unsupported) {
      sycl::context ctx{dev};
      sycl::queue q1{ctx, dev, {sycl::property::queue::in_order()}};
      std::vector<sycl::queue> queuesToRecord{q1};

      const sycl::property_list propList{syclex::property::graph::no_cycle_check()};
      syclex::command_graph<syclex::graph_state::modifiable> graph(ctx, dev, propList);

      int *value_h = sycl::malloc_host<int>(1, ctx);
      int *value_i = sycl::malloc_device<int>(1, dev, ctx);
      int *value_o = sycl::malloc_device<int>(1, dev, ctx);

      value_h[0] = 1;

      q1.memcpy(value_i, value_h, 1 * sizeof(int)).wait_and_throw();

      bool result = graph.begin_recording(queuesToRecord);
      if (!result) {
        std::cout << "  Could not start the recording" << std::endl;
      }

      q1.submit([&](sycl::handler &cgh) {
        cgh.single_task<class Memset>([=]() { value_o[0] = 0; });
      });
      q1.submit([&](sycl::handler &cgh) {
        cgh.single_task<class Memcpy>([=]() { value_i[0] = value_o[0]; });
      });

      graph.end_recording();
      auto instance = graph.finalize();

      q1.ext_oneapi_graph(instance).wait_and_throw();
      std::cout << "   Done!" << std::endl;
      q1.wait_and_throw();
    } // Here it dies when destroying `instance`
  }
  std::cout << "Done!" << std::endl;
  return 0;
}

Stack trace:

(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352309824) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352309824) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352309824, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff2642476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff26287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff262871b in __assert_fail_base (fmt=0x7ffff27dd150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff7fababd "call->command != command", file=0x7ffff7f9dc20 "/home/aland/pti-gpu/tools/ze_tracer/ze_kernel_collector.h", line=1041, function=<optimized out>) at ./assert/assert.c:92
#6  0x00007ffff2639e96 in __GI___assert_fail (assertion=0x7ffff7fababd "call->command != command", file=0x7ffff7f9dc20 "/home/aland/pti-gpu/tools/ze_tracer/ze_kernel_collector.h", line=1041, function=0x7ffff7faba38 "void ZeKernelCollector::RemoveKernelCommands(ze_command_list_handle_t)") at ./assert/assert.c:101
#7  0x00007ffff7f9c2df in ZeKernelCollector::OnExitCommandListDestroy(_ze_command_list_destroy_params_t*, _ze_result_t, void*, void**) () from /home/aland/pti-gpu/tools/ze_tracer/build/libzet_tracer.so
#8  0x00007ffff7ac8dc5 in tracing_layer::zeCommandListDestroy(_ze_command_list_handle_t*) () from /home/aland/intel-sycl/llvm/build/install//lib/libze_tracing_layer.so.1
#9  0x00007ffff00229e2 in ur_exp_command_buffer_handle_t_::~ur_exp_command_buffer_handle_t_() () from /home/aland/intel-sycl/llvm/build/install/lib/libpi_level_zero.so
#10 0x00007ffff0023302 in urCommandBufferReleaseExp () from /home/aland/intel-sycl/llvm/build/install/lib/libpi_level_zero.so
#11 0x00007ffff008805d in piextCommandBufferRelease () from /home/aland/intel-sycl/llvm/build/install/lib/libpi_level_zero.so
#12 0x00007ffff2f2ac87 in sycl::_V1::ext::oneapi::experimental::detail::exec_graph_impl::~exec_graph_impl() () from /home/aland/intel-sycl/llvm/build/install//lib/libsycl.so.7
#13 0x0000000000406b8e in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x18ca3b0) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:346
#14 0x0000000000406b0a in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffffffd4a0) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1071
#15 0x0000000000407729 in std::__shared_ptr<sycl::_V1::ext::oneapi::experimental::detail::exec_graph_impl, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffffffd498) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1524
#16 0x0000000000407705 in std::shared_ptr<sycl::_V1::ext::oneapi::experimental::detail::exec_graph_impl>::~shared_ptr (this=0x7fffffffd498) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:175
#17 0x00000000004076e5 in sycl::_V1::ext::oneapi::experimental::detail::executable_command_graph::~executable_command_graph (this=0x7fffffffd498) at /home/aland/intel-sycl/llvm/build/install/bin/../include/sycl/ext/oneapi/experimental/graph.hpp:289
#18 0x0000000000406755 in sycl::_V1::ext::oneapi::experimental::command_graph<(sycl::_V1::ext::oneapi::experimental::graph_state)1>::~command_graph (this=0x7fffffffd498) at /home/aland/intel-sycl/llvm/build/install/bin/../include/sycl/ext/oneapi/experimental/graph.hpp:336
#19 0x0000000000403fe1 in main () at test_graph_zetrace.cpp:49

Output with SYCL_PI_TRACE=-1: sycl_pi_trace.log

jfedorov commented 1 year ago

@al42and thank you for reporting! we will look into it in few days. Any chances to check it with recent oneAPI release?

al42and commented 1 year ago

@jfedorov: I was using open-source IntelLLVM above. But the problem can be reproduced with oneAPI 2024.0 (Intel(R) oneAPI DPC++/C++ Compiler 2024.0.0 (2024.0.0.20231017)).

Note: when compiling with icpx, one would have to change line 8 to using gsl = syclex::info::graph_support_level;, because the SYCL Graph API is unstable.

jfedorov commented 1 year ago

@al42and Thank you. will look into it.

jfedorov commented 11 months ago

@al42and This issue is indeed reproduced (~three weeks ago) and hopefully will be fixed soon. thank you.

al42and commented 5 months ago

As of ze_tracer 1b7929b8139b09b03127c92211a9be2be9fb900e, the issue still reproduces with ICPX 2024.1.2, but not with the open-source IntelLLVM 2838f40382bedddbda0a5f20ebeeba86310044da. So, looks like some nasty behavior in the IntelLLVM/UR?

al42and commented 4 months ago

Works fine with ICPX 2024.2.0 and latest open-source IntelLLVM build. :+1: