riscv-non-isa / rvv-intrinsic-doc

https://jira.riscv.org/browse/RVG-153
BSD 3-Clause "New" or "Revised" License

Add `vcreate` for non-tuple types and `vundefine` for tuple types #288

Closed eopXD closed 10 months ago

eopXD commented 10 months ago

Although this use case [0] is SIMD-style and not ideal for RVV, users can benefit from this syntactic sugar. In addition, vundefined for tuple types helps with declarations under -Wuninitialized, since it avoids having to use a vcreate filled with several vundefined non-tuple members.

Resolves #286.

For the vcreate addition, the variants that assemble a fractional LMUL type from smaller fractional LMUL types, and those that assemble an LMUL>1 type from fractional LMUL types, are omitted because they are unlikely use cases.
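A minimal sketch of how the two additions might be used, assuming a toolchain that already ships the intrinsics from this PR. The helper names (`join_with_vcreate`, `join_with_vset`, `pair_with_vundefined`) are hypothetical and exist only for illustration; the body is guarded so it compiles to nothing on targets without the vector extension.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>

/* Non-tuple vcreate: assemble an LMUL=2 value from two LMUL=1 values. */
vint32m2_t join_with_vcreate(vint32m1_t lo, vint32m1_t hi) {
    return __riscv_vcreate_v_i32m1_i32m2(lo, hi);
}

/* The longer spelling that vcreate abbreviates: start from an undefined
 * LMUL=2 value and fill each LMUL=1 part with vset. */
vint32m2_t join_with_vset(vint32m1_t lo, vint32m1_t hi) {
    vint32m2_t v = __riscv_vundefined_i32m2();
    v = __riscv_vset_v_i32m1_i32m2(v, 0, lo);
    v = __riscv_vset_v_i32m1_i32m2(v, 1, hi);
    return v;
}

/* Tuple vundefined: gives the tuple an explicit initial value, so the
 * declaration does not trip uninitialized-variable warnings before the
 * fields are filled in one by one. */
vint32m1x2_t pair_with_vundefined(vint32m1_t f0, vint32m1_t f1) {
    vint32m1x2_t t = __riscv_vundefined_i32m1x2();
    t = __riscv_vset_v_i32m1_i32m1x2(t, 0, f0);
    t = __riscv_vset_v_i32m1_i32m1x2(t, 1, f1);
    return t;
}
#endif /* __riscv_vector */
```

Both `join_*` helpers are expected to produce the same LMUL=2 value; the vcreate form only saves the intermediate vundefined and vset calls.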

LLVM implementations of the added intrinsics are:

dzaima commented 10 months ago

Fractional LMUL inputs should be completely forbidden, just like they are with vset_v_ and vget_v_, as they would need to reference fractional parts of a single register as if each were its own register, which is impossible.

zhongjuzhe commented 10 months ago

All fractional intrinsics should be removed.

For example:

```c
vint16m1_t test_vcreate_v_i16mf4_i16m1(vint16mf4_t v0, vint16mf4_t v1,
                                       vint16mf4_t v2, vint16mf4_t v3) {
  return __riscv_vcreate_v_i16mf4_i16m1(v0, v1, v2, v3);
}
```

They are not like tuple type vint16mf4x4_t.

vint16m1_t occupies 1 register whereas vint16mf4x4_t occupies 4 registers.
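The register counts quoted above can be reproduced with a small host-side model. This is only an illustrative sketch: the function names and the "LMUL in eighths" encoding are assumptions made for this example and are not part of the intrinsics API. It relies on the rule that a fractional-LMUL value still occupies one whole architectural register, and that each tuple field gets its own register (or register group).

```c
#include <stdint.h>

/* LMUL encoded in eighths of a register (hypothetical encoding):
 * mf8 = 1, mf4 = 2, mf2 = 4, m1 = 8, m2 = 16, m4 = 32, m8 = 64. */
static unsigned regs_for_group(unsigned lmul_eighths) {
    /* A fractional-LMUL value cannot share a register with another value,
     * so it still occupies one whole register. */
    return lmul_eighths <= 8u ? 1u : lmul_eighths / 8u;
}

/* A tuple type holds nfields values, each in its own register group. */
static unsigned regs_for_tuple(unsigned lmul_eighths, unsigned nfields) {
    return nfields * regs_for_group(lmul_eighths);
}
```

With this model, `regs_for_group(8)` (vint16m1_t) gives 1 and `regs_for_tuple(2, 4)` (vint16mf4x4_t) gives 4, which is why a fractional vcreate into vint16m1_t cannot be a plain register-level assembly the way the tuple case is.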

kito-cheng commented 10 months ago

I can imagine the fractional LMUL version being implementable, but it would be complicated and perform very poorly (vmerge and/or vslide), so I am +1 on removing it.
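To make the cost concrete, here is one possible vslide-based emulation of the fractional case, written as a sketch only. It is not taken from any implementation discussed in this thread; `concat_mf4_into_m1` is a hypothetical helper, and the code assumes a toolchain with the current `__riscv_`-prefixed intrinsics. It is guarded so it compiles to nothing on targets without the vector extension.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>

/* Hypothetical helper: pack four LMUL=1/4 vectors into one LMUL=1 register.
 * Each input is widened to LMUL=1 (upper elements undefined) and then slid
 * up into its quarter of the result, which costs three vslideup operations
 * instead of zero data movement. */
vint16m1_t concat_mf4_into_m1(vint16mf4_t v0, vint16mf4_t v1,
                              vint16mf4_t v2, vint16mf4_t v3) {
    size_t q = __riscv_vsetvlmax_e16mf4(); /* elements per LMUL=1/4 vector */
    vint16m1_t r = __riscv_vlmul_ext_v_i16mf4_i16m1(v0);
    /* Elements below the offset are kept from r; elements in [offset, vl)
     * are taken from the start of the source. */
    r = __riscv_vslideup_vx_i16m1(r, __riscv_vlmul_ext_v_i16mf4_i16m1(v1), 1 * q, 2 * q);
    r = __riscv_vslideup_vx_i16m1(r, __riscv_vlmul_ext_v_i16mf4_i16m1(v2), 2 * q, 3 * q);
    r = __riscv_vslideup_vx_i16m1(r, __riscv_vlmul_ext_v_i16mf4_i16m1(v3), 3 * q, 4 * q);
    return r;
}
#endif /* __riscv_vector */
```

By contrast, assembling an LMUL>=1 result from whole registers needs at most register moves, which is the asymmetry behind dropping the fractional variants.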