oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0
720 stars 114 forks source link

`oneapi::dpl::inclusive_scan` with an initial element requires the input's value_type to be convertible to the output's value_type with the SYCL backend #1686

Open mmichel11 opened 2 months ago

mmichel11 commented 2 months ago

Describe the Bug: In std::inclusive_scan, the value_type of the input iterator does not necessarily need to be convertible to the value_type of the output iterator. Only if no init element is defined, then the result of binary_op must also be convertible to the input iterator's value_type according to https://en.cppreference.com/w/cpp/algorithm/inclusive_scan

if binary_op(first, first) is not convertible to the value type of decltype(first), the program is ill-formed.

However, if an init element is provided the following returns must be convertible to the init type:

binary_op(init, first) binary_op(init, init) binary_op(first, *first)

I have a reproducer with a binary_op that satisfies these conditions, but fails to compile with the oneDPL SYCL backend.

To Reproduce: The following CPP code be used to reproduce this issue (compile error) with any oneDPL version:

// icpx -fsycl -I/path/to/oneDPL/include scan_bug.cpp
// The commented code that runs on the host compiles and runs as expected.
#include <oneapi/dpl/numeric>
#include <oneapi/dpl/execution>

#include <iostream>
#include <vector>
#include <type_traits>

struct my_int
{
  int v;
  my_int operator+(int x) const
  {
    return my_int{x + v};
  }
  explicit my_int(int x)
    : v(x)
  {
  }
  my_int() = default;
};

struct op
{
  template <typename T1, typename T2>
  my_int operator()(T1 fst, T2 snd) const
  {
    return my_int(fst + snd);
  }
};

int main()
{
  std::vector<int> in = {1, 1, 1};
  std::vector<my_int> out(3);

  op my_op;
  my_int init{0};
  auto first = in.begin();
  using init_t = decltype(init);

  // Conditions of my_op that must be satisfied: https://en.cppreference.com/w/cpp/algorithm/inclusive_scan 
  // With init overloads, the result of invoking the binary operation needs only to be convertible to the init type 
  static_assert(std::is_convertible_v<std::invoke_result_t<op, decltype(init), decltype(*first)>, init_t>);
  static_assert(std::is_convertible_v<std::invoke_result_t<op, decltype(init), decltype(init)>, init_t>);
  static_assert(std::is_convertible_v<std::invoke_result_t<op, decltype(*first), decltype(*first)>, init_t>);

#if 1
  sycl::queue q(sycl::default_selector_v);
  std::inclusive_scan(dpl::execution::make_device_policy(q), first, first + 3, out.begin(), my_op, init);
#else
  std::inclusive_scan(first, first + 3, out.begin(), my_op, init);
#endif
  std::cout << out[0].v << " " << out[1].v << " " << out[2].v << std::endl;
}

Expected Behavior: The above code is expected to compile. This can be observed by altering the reproducer to run the serial host version of std::inclusive_scan. The root of the issue is that the same SLM accessor is used internally to load the input and store the result of the binary operation which requires the two types to be convertible. The type of this accessor is the init type (my_int in the example) whereas the input data is a buffer of integers. This results in the following compilation error:

oneDPL/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/unseq_backend_sycl.h:735:41: error: no viable overloaded '=' 735 | local_acc[__local_id] = data_acc(adjusted_global_id, acc);

Additional Context: I believe this issue exists in the reduce-then-scan prototype targeted for 2022.7.0. We should either resolve this in the upcoming scan implementation or document this issue as a known limitation.

The affected scan pattern is used in many oneDPL algorithms, so the impact of this issue is likely broader than inclusive_scan.

timmiesmith commented 3 weeks ago

@mmichel11 is this issue resolved with the merge of #1680?

mmichel11 commented 3 weeks ago

@timmiesmith This issue is not fixed with the current state of the reduce-then-scan PRs, so I think we will need to documentation note. A fix will also need to be made to the original scan-then-propagate algorithm since it is used as a fallback.