ksco closed 2 months ago
yeah sorry, it fails with vlen=256 on the newly added packuswb. I've only handled and tested vlen=128, need to think about another way... :(
Well, you can keep the method for when vlen>=256 anyway, and add a method for vlen=128 that splits the operation into two 64-bit expansions...
fixed the vlen>=256 cases above, 3 more instructions are needed. the packuswb opcode is tricky because we use the narrowing instruction vnclipu.wv, which doubles the LMUL on source registers (i.e. 2 registers form a group). but if vlen>=256, there is no need for a register group....
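For readers following along, here is a rough scalar sketch of what packuswb is expected to produce, plus the register-count arithmetic behind the LMUL remark above. It is only a reference model in Python, not the code from this PR: the names packuswb and regs_for_source are hypothetical helpers, and the 256-bit total assumes the two 128-bit operands are treated as one run of 16 x 16-bit source elements.

```python
def packuswb(dst, src):
    """Scalar reference model of SSE2 PACKUSWB on 128-bit operands.

    dst and src each hold 8 signed 16-bit words. The result holds
    16 unsigned bytes: the 8 saturated words of dst first, then the
    8 saturated words of src.
    """
    if len(dst) != 8 or len(src) != 8:
        raise ValueError("expected 8 words per operand")

    def sat_u8(word):
        # Unsigned saturation: negatives clamp to 0, values above 255 clamp to 255.
        return max(0, min(255, word))

    return [sat_u8(w) for w in list(dst) + list(src)]


def regs_for_source(vlen_bits, total_bits=256):
    """Vector registers needed to hold total_bits of source data.

    Assumption: 16 source elements of 16 bits each (256 bits in total).
    """
    # Ceiling division, with a minimum of one register.
    return max(1, -(-total_bits // vlen_bits))
```

Under these assumptions regs_for_source(128) gives 2 (a two-register group for the wide source) while regs_for_source(256) gives 1, which matches the point that no register group is needed once vlen>=256.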
Strange that vlen>=256 needs more opcodes than shorter vlen... But ok.
All red on the CI! The sse tests all failed.