[Question] Possibility of some math API like i/l/llrint Vectorization

riscvarchive / riscv-v-spec

Working draft of the proposed RISC-V V vector extension

https://jira.riscv.org/browse/RVG-122

Creative Commons Attribution 4.0 International

[Question] Possibility of some math API like i/l/llrint Vectorization #922

Closed Incarnation-p-lee closed 1 year ago

Incarnation-p-lee commented 1 year ago

Consider the code below, which invokes __builtin_lrint (i.e. lrint from math.h):

void
test_lrint_scalar (long *out, double *in)
{
  *out = __builtin_lrint (*in);
}

void
test_lrint_vec (long *out, double *in, int count)
{
  for (int i = 0; i < count; i++)
    out[i] = __builtin_lrint (in[i]);
}

For test_lrint_scalar, the compiler can generate the following asm with -ffast-math:

fld     fa5,0(a1)
fcvt.l.d a5,fa5,dyn
sd      a5,0(a0)

For test_lrint_vec, however, I suppose it is possible to use vfcvt.x.f.v v,v for vectorization. AFAIK, the FP-to-INT conversion semantics are almost the same for the scalar and vector instructions, so we could generate something like this:

vle64.v v1,0(a1)
vfcvt.x.f.v     v1,v1
vse64.v v1,0(a0)

nick-knight commented 1 year ago

Yes, you can use the vfcvt family to vectorize the lrint family. You may need the widening and narrowing flavors. And keep in mind that long is 32 bits in the ILP32 ABI.

The rint family is trickier. (A vector analogue of Zfa might help.)

Incarnation-p-lee commented 1 year ago

Thanks @nick-knight for the confirmation. Yes, the long return type has different sizes under ilp32 and lp64, whereas the int and long long return types do not have this issue.
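To make the point about widening and narrowing flavors concrete, here is a minimal sketch that picks a conversion flavor from the FP element width and the integer result width. The names (cvt_kind, classify_cvt, irint_bits and so on) are hypothetical helpers made up for illustration; they are not part of GCC or the V spec. It assumes the usual RVV rule that a single widening or narrowing convert changes the element width by exactly a factor of two.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names for illustration only. */
enum cvt_kind {
  CVT_SAME_WIDTH, /* vfcvt.x.f.v:  SEW   -> SEW   */
  CVT_WIDENING,   /* vfwcvt.x.f.v: SEW   -> 2*SEW */
  CVT_NARROWING,  /* vfncvt.x.f.w: 2*SEW -> SEW   */
  CVT_MULTI_STEP  /* ratio other than 1x or 2x: more than one instruction */
};

/* Classify an FP -> INT conversion by element widths in bits. */
static enum cvt_kind
classify_cvt (unsigned fp_bits, unsigned int_bits)
{
  assert (fp_bits != 0 && int_bits != 0);
  if (int_bits == fp_bits)
    return CVT_SAME_WIDTH;
  if (int_bits == 2 * fp_bits)
    return CVT_WIDENING;
  if (fp_bits == 2 * int_bits)
    return CVT_NARROWING;
  return CVT_MULTI_STEP;
}

/* Result element width for each member of the family on the current ABI.
   Only the long variant differs between ilp32 and lp64.  */
static unsigned irint_bits (void)  { return (unsigned) (sizeof (int) * CHAR_BIT); }
static unsigned lrint_bits (void)  { return (unsigned) (sizeof (long) * CHAR_BIT); }
static unsigned llrint_bits (void) { return (unsigned) (sizeof (long long) * CHAR_BIT); }
```

For example, classify_cvt (64, lrint_bits ()) gives CVT_SAME_WIDTH on lp64 and CVT_NARROWING on ilp32, while classify_cvt (16, 64) gives CVT_MULTI_STEP.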

Regarding widening and narrowing, take lrintf16 as an example, i.e. FP16 to INT64. I suppose there are at least two options. Do you have any suggestions? Thanks again for the help.

option 1:
FP16 => FP32
FP32 => INT64

option 2:
FP16 => INT32
INT32 => INT64
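As a sanity check on the two options for finite inputs, the sketch below works only from the IEEE 754 binary16 format parameters. It checks that FP16 => FP32 loses nothing (option 1) and that every finite FP16 value fits in INT32 (option 2), so the intermediate step should not change the result in either case. The helper names are hypothetical, and the sketch says nothing about Infs or NaNs. In RVV terms the steps would presumably be vfwcvt.f.f.v followed by vfwcvt.x.f.v for option 1, and vfwcvt.x.f.v followed by vsext.vf2 for option 2.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* IEEE 754 binary16 parameters, written in the style of <float.h>. */
enum {
  FP16_MANT_DIG = 11,           /* significand bits, implicit bit included */
  FP16_MAX_EXP = 16,            /* every finite value is below 2^16 */
  FP16_MIN_SUBNORMAL_EXP = -24  /* smallest subnormal is 2^-24 */
};

static_assert (FP16_MAX_EXP > FP16_MANT_DIG, "shift count must be positive");

/* Largest finite binary16 value: (2^11 - 1) * 2^(16 - 11).  */
static int64_t
fp16_max_finite (void)
{
  return ((INT64_C (1) << FP16_MANT_DIG) - 1) << (FP16_MAX_EXP - FP16_MANT_DIG);
}

/* Option 1: FP16 => FP32 is exact if float covers binary16's precision
   and exponent range, including binary16 subnormals.  */
static int
option1_widening_is_exact (void)
{
  return FLT_RADIX == 2
         && FLT_MANT_DIG >= FP16_MANT_DIG
         && FLT_MAX_EXP >= FP16_MAX_EXP
         && FLT_MIN_EXP - 1 <= FP16_MIN_SUBNORMAL_EXP;
}

/* Option 2: the INT32 intermediate cannot overflow for finite inputs,
   because rounding to an integer never exceeds the largest finite value.  */
static int
option2_intermediate_fits (void)
{
  return fp16_max_finite () <= INT32_MAX
         && -fp16_max_finite () >= INT32_MIN;
}
```

Here fp16_max_finite () evaluates to 65504, far below INT32_MAX, and both predicates hold on a target whose float is IEEE 754 binary32.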

nick-knight commented 1 year ago

What behavior do you want for exceptional inputs (Infs and NaNs)? My understanding is that it's implementation-defined for the lrint family.

Incarnation-p-lee commented 1 year ago

Yes, it is. The manual for the lrint family says the return value for exceptional inputs (Inf, NaN, or a value too large for the result type) is unspecified. So it looks like both options are correct.

Incarnation-p-lee commented 1 year ago

Thanks, Nick. Closing this issue as I have no more questions.