intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

[SYCL][E2E][CUDA] Intermittent failure of get_last_event.cpp when run on Windows #14324

Open mmoadeli opened 2 months ago

mmoadeli commented 2 months ago

Describe the bug

Below is a simplified reproducer of the failure seen in e2e/InOrderEventsExt/get_last_event.cpp when built and run on Windows with CUDA (GTX 1650 / sm_75). The failure is intermittent and does not occur on all GPU devices. For instance, the error could not be reproduced on a CUDA GPU with sm_90.

#include <sycl/detail/core.hpp>
#include <sycl/detail/host_task_impl.hpp>
#include <sycl/properties/all_properties.hpp>
#include <sycl/usm.hpp>

#include <cassert>
#include <cstdio>
#include <iostream>

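// Runs CheckFunc and verifies that the event it returns is the one reported
// by Q.ext_oneapi_get_last_event(). Returns 1 on mismatch, 0 otherwise.
template <typename F>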
int Check(const sycl::queue &Q, const char *CheckName, const F &CheckFunc) {
  sycl::event E = CheckFunc();
  if (E != Q.ext_oneapi_get_last_event()) {
    std::cout << "Failed " << CheckName << std::endl;
    return 1;
  }
  return 0;
}

int main() {
  int Failed = 0;
  try {
    sycl::queue Q{{sycl::property::queue::in_order{}}};

    Failed += Check(Q, "host_task", [&]() {
      return Q.submit([&](sycl::handler &CGH) { CGH.host_task([]() {}); });
    }); 

    auto ExternalEvent = Q.single_task([]() {});

    if (!Q.get_device().has(sycl::aspect::usm_shared_allocations))
      return Failed;
    constexpr size_t N = 64;
    // Data1 is never used, yet removing this allocation makes the test pass.
    int *Data1 = sycl::malloc_shared<int>(N, Q);

    Q.wait_and_throw();
    sycl::free(Data1, Q);

  } catch (const std::exception& err) {
    std::puts(err.what());
    assert(false && "Wrong ...");
  }
  return Failed;
}

Note that the test does not fail when SYCL_PI_TRACE=-1 is set. Moreover, removing the line int *Data1 = sycl::malloc_shared<int>(N, Q); from the code above makes the test pass, even though Data1 is not used anywhere.

To reproduce

Build and run e2e/InOrderEventsExt/get_last_event.cpp on Windows with CUDA (GTX 1650). It may also fail on other CUDA GPU devices.
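
Since the failure is intermittent, a single passing run proves little. Below is a minimal sketch of a repeat-run helper, assuming the reproducer body has been wrapped in a callable that returns 0 on success and non-zero on failure (like the Failed counter above). The names CountFailingRuns and RunOnce are hypothetical and not part of the SYCL E2E suite.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper (not part of the SYCL E2E suite): calls RunOnce
// Iterations times and returns how many calls reported a failure.
// RunOnce is assumed to return 0 on success and non-zero on failure,
// mirroring the Failed counter returned from main() in the reproducer.
inline std::size_t CountFailingRuns(const std::function<int()> &RunOnce,
                                    std::size_t Iterations) {
  std::size_t FailingRuns = 0;
  for (std::size_t I = 0; I < Iterations; ++I) {
    if (RunOnce() != 0)
      ++FailingRuns;
  }
  return FailingRuns;
}
```

For example, the body of main() could be moved into a function and passed as RunOnce with a large iteration count. This only counts failures reported through the return value; a crash or a failed assert would still abort the process on the first occurrence.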

Environment

Additional context

No response

mmoadeli commented 1 month ago

A reproducer using CUDA has been created and filed as a bug report.