mattkretz / wg21-papers

my papers to WG21 — the C++ committee

Add concat & split #47

Closed mattkretz closed 7 years ago

mattkretz commented 7 years ago

If we remove the "do-everything-at-once" datapar_cast, we need functions for concatenation and splitting of datapar and mask objects. Some use cases:

common case

using floatv = native_datapar<float>;
using doublev = native_datapar<double>;
constexpr size_t N = floatv::size() / doublev::size();

floatv f(floatv x) {
  using doublefixed = fixed_size_datapar<double, doublev::size()>;
  array<doublefixed, N> xds = split<N>(static_datapar_cast<double>(x));  // (1.)
  for (doublefixed &tmp : xds) {
    doublev xd = to_native(tmp);  // (2.)
    modify(xd);
    tmp = to_fixed_size(xd);  // (3.)
  }
  return to_native(static_datapar_cast<float>(concat(xds)));  // (4.)
}

Consider line (1.): Should I rather cast & split or split & cast? How does the user know? What should the parameter to split really be? Here I chose the number of resulting datapar objects. The alternatives I can think of:

In the first two choices the return type likely should use fixed_size. (As shown in the code above.) Alternatively abi_for_size. With the last two choices the Abi type is requested by the user and the verbose casting in lines (2.) and (3.) is avoided.

Consider line (4.): Same question, concat & cast or cast & concat? Does concat need any parameter? If split takes a datapar or Abi type then concat should probably do so, too.
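To make the count-based choice concrete, here is a minimal sketch of "split into N equal parts" and its inverse. It is not the proposed API: toy_fixed, split_by_count and concat_parts are hypothetical names, and a plain std::array stands in for fixed_size_datapar.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for fixed_size_datapar<T, N>; not the proposed type.
template <class T, std::size_t N>
struct toy_fixed {
  std::array<T, N> v{};
  static constexpr std::size_t size() { return N; }
  T operator[](std::size_t i) const { return v[i]; }
  T &operator[](std::size_t i) { return v[i]; }
};

// Count-based split (assumed semantics): Parts is the number of resulting
// objects, each holding N / Parts consecutive elements of the input.
template <std::size_t Parts, class T, std::size_t N>
std::array<toy_fixed<T, N / Parts>, Parts> split_by_count(const toy_fixed<T, N> &x) {
  static_assert(N % Parts == 0, "size must be divisible by the number of parts");
  std::array<toy_fixed<T, N / Parts>, Parts> out{};
  for (std::size_t i = 0; i < N; ++i) {
    out[i / (N / Parts)][i % (N / Parts)] = x[i];
  }
  return out;
}

// Inverse operation: glue an array of equally sized parts back together.
// No template argument is needed because everything is deduced.
template <class T, std::size_t M, std::size_t Parts>
toy_fixed<T, M * Parts> concat_parts(const std::array<toy_fixed<T, M>, Parts> &parts) {
  toy_fixed<T, M * Parts> out{};
  for (std::size_t p = 0; p < Parts; ++p) {
    for (std::size_t i = 0; i < M; ++i) {
      out[p * M + i] = parts[p][i];
    }
  }
  return out;
}
```

With this shape the result type is always a fixed-size type, which is why the to_native / to_fixed_size round trip in lines (2.) and (3.) is still needed in the example above.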

special cases

Consider an algorithm that processes 12 data streams in parallel:

using float12 = fixed_size_datapar<float, 12>;
float12 f(const float12 &x) {
#ifdef OPTIMIZE_FOR_AVX
  auto [lo, mid, hi] = split<3>(x);  // (1.)
  auto [r256, r128] =
    handtuned_impl(static_cast<__m256>(to_native(concat(lo, mid))),
                   static_cast<__m128>(to_native(hi)));
  return concat(datapar<float, datapar_abi::avx>(r256),  // (2.)
                datapar<float, datapar_abi::sse>(r128));
#else
  ...
#endif
}

In line (1.) there's no way to directly go from x to a "float8, float4" pair. (True for datapar_cast as well.) Should there be? I.e. should splits to more than one destination type (std::tuple) be supported? Line (2.) shows a concat call with two arguments of different type. I guess we want to require the value_type of all concat arguments to be equal. But do we want to allow different Abi types?
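One possible answer to the last question can be sketched as follows: require equal value_type, accept different Abi tags, and always return a fixed-size result. Everything here is a toy model. toy_datapar, the toy_abi_* tags and concat2 are hypothetical names, not part of the paper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical ABI tags, standing in for e.g. datapar_abi::avx / ::sse.
struct toy_abi_wide {};
struct toy_abi_narrow {};
struct toy_abi_fixed {};

// Hypothetical stand-in for datapar<T, Abi>; the size is spelled out
// explicitly because the toy tags carry no width information.
template <class T, std::size_t N, class Abi>
struct toy_datapar {
  using value_type = T;
  using abi_type = Abi;
  std::array<T, N> v{};
  static constexpr std::size_t size() { return N; }
};

// Two-argument concat (assumed rule): value types must match, ABI tags may
// differ, and the result always uses the fixed-size tag.
template <class T, std::size_t N1, class Abi1, class U, std::size_t N2, class Abi2>
toy_datapar<T, N1 + N2, toy_abi_fixed> concat2(const toy_datapar<T, N1, Abi1> &a,
                                               const toy_datapar<U, N2, Abi2> &b) {
  static_assert(std::is_same<T, U>::value,
                "all concat arguments must have the same value_type");
  toy_datapar<T, N1 + N2, toy_abi_fixed> out{};
  for (std::size_t i = 0; i < N1; ++i) out.v[i] = a.v[i];
  for (std::size_t i = 0; i < N2; ++i) out.v[N1 + i] = b.v[i];
  return out;
}
```

Under this rule the call in line (2.) would compile, while concatenating a float object with a double object would be rejected at compile time.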

mattkretz commented 7 years ago

@jensmaurer: please take a look. Let me know what you had in mind. For those use cases above, I find the cast code too verbose.

jensmaurer commented 7 years ago

As I said elsewhere, I'm ok with moving forward with datapar_cast for now, since I didn't have time to play around with split and/or concat. Sorry about that.

Seeing the examples above, I still like the explicit names "split" and "concat". For split, we might want to allow a few more conversions in the input. Examples:

auto [lo, hi] = split<2, double>(floatv());  // assumes "double" is twice the size of "float"; returns array
// or maybe:
auto [low, hi] = split<2, doublev>(floatv());  // allows choice of target ABI right there
auto [r256, r128] = split<avxfloatv, ssefloatv>(float12());
return concat(r256, r128);

mattkretz commented 7 years ago

Hmm. After all our discussion I'd leave the need for a do-it-all cast to TS feedback, because it's a pain to implement and maintain. And if we don't know whether we want to have it at all, I think we'd better leave it out of the TS. With static_datapar_cast and split & concat + the to_fixed_size, to_native, to_compatible casts, you can probably solve all problems. The question comes down to how nasty the user code becomes and how bad the code-gen turns out.

On concat, I think we agree that the signature in the paper is safe and flexible: [image: concat signature from the paper]

On split I can also see the need for more overloads. With what I have in D0214R4 your example becomes:

auto [lo, hi] = split<doublev>(static_datapar_cast<double>(floatv()));  // assumes "double" is twice the size of "float"; returns array
auto [r256, r128] = split<8, 4>(float12());
return concat(r256, r128);

Edit: added static_datapar_cast<double> in the example
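For illustration, the size-list form split<8, 4> can be modelled roughly like this. This is a toy sketch and not the D0214R4 wording: toy_fixed, toy_sum, take_offset, extract_chunk and split_sizes are hypothetical names, and std::array stands in for the datapar storage.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical stand-in for fixed_size_datapar<T, N>; not the proposed type.
template <class T, std::size_t N>
struct toy_fixed {
  std::array<T, N> v{};
  static constexpr std::size_t size() { return N; }
};

// Compile-time sum of the requested chunk sizes.
constexpr std::size_t toy_sum() { return 0; }
template <class... Rest>
constexpr std::size_t toy_sum(std::size_t first, Rest... rest) {
  return first + toy_sum(rest...);
}

// Returns the current offset and then advances it by n.
inline std::size_t take_offset(std::size_t &offset, std::size_t n) {
  const std::size_t old = offset;
  offset += n;
  return old;
}

// Copies Count consecutive elements of x, starting at offset.
template <std::size_t Count, class T, std::size_t N>
toy_fixed<T, Count> extract_chunk(const toy_fixed<T, N> &x, std::size_t offset) {
  toy_fixed<T, Count> out{};
  for (std::size_t i = 0; i < Count; ++i) out.v[i] = x.v[offset + i];
  return out;
}

// Size-list split (assumed semantics): one result per entry in Sizes, each of
// a different size, returned as a tuple. The sizes must add up to N.
template <std::size_t... Sizes, class T, std::size_t N>
std::tuple<toy_fixed<T, Sizes>...> split_sizes(const toy_fixed<T, N> &x) {
  static_assert(toy_sum(Sizes...) == N, "chunk sizes must add up to the input size");
  std::size_t offset = 0;
  // Braced initialization evaluates its elements left to right, so the
  // chunks are taken in order.
  return std::tuple<toy_fixed<T, Sizes>...>{
      extract_chunk<Sizes>(x, take_offset(offset, Sizes))...};
}
```

In C++17 the result can be unpacked with structured bindings, e.g. auto [r256, r128] = split_sizes<8, 4>(x); which mirrors the example above. Because the sizes differ, the return type has to be a tuple rather than an array.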

jensmaurer commented 7 years ago

Ok, works for me.