riscv-non-isa / rvv-intrinsic-doc


Question of using indexed load intrinsics #379

Closed Yibo-He closed 1 hour ago

Yibo-He commented 15 hours ago

I tried to use the vluxei and vloxei intrinsics and wrote a code snippet. However, the results are confusing. Is this a compiler bug or my misunderstanding?

Code:

#include <riscv_vector.h>
int printf(const char *, ...);
#define dataLen 10

uint64_t idx[dataLen];
int32_t a[dataLen];
int32_t b[dataLen];

int main(){
  uint64_t tmp_idx[dataLen] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, };
  int32_t tmp_a[dataLen] = {137, 151, 121, 19, 11, 17, 222, 31, 4, 57, };
  for (int i = 0; i < dataLen; ++i) { idx[i] = tmp_idx[i]; }
  for (int i = 0; i < dataLen; ++i) { a[i] = tmp_a[i]; }
  for (int i = 0; i < dataLen; ++i) { b[i] = 0; }

  int placeholder0 = dataLen;
  uint64_t* ptr_idx = idx;
  int32_t* ptr_a = a;
  int32_t* ptr_b = b;
  for (size_t vl; placeholder0 > 0; placeholder0 -= vl){
    vl = __riscv_vsetvl_e32m2(placeholder0);
    vint32m2_t va = __riscv_vluxei64_v_i32m2(ptr_a, __riscv_vle64_v_u64m4(ptr_idx, vl), vl);
    __riscv_vse32_v_i32m2(ptr_b, va, vl);
    ptr_idx += vl;
    ptr_a += vl;
    ptr_b += vl;
  }
  for(int i=0; i<dataLen; ++i) { printf("%d ", b[i]); } printf("\n");
  return 0;
}

The code just loads the data and stores it back. Here are the results:

$ riscv64-unknown-elf-gcc -march=rv64gcv_zvfh -mabi=lp64d -Wno-psabi -static -O0 1.c -o a.out && qemu-riscv64 a.out
137 -1761607680 9895936 38656 151 2030043136 7929856 30976 137 0 
$ clang -march=rv64gcv_zvfh -mabi=lp64d -Wno-psabi -static -O0 1.c -o a.out && qemu-riscv64 a.out
137 -1761607680 9895936 38656 151 2030043136 7929856 30976 137 0 

However, I expected the results to be the data in tmp_a. Is this a compiler bug or my misunderstanding?

Version:

$ riscv64-unknown-elf-gcc --version
riscv64-unknown-elf-gcc () 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ clang --version
clang version 19.1.0
Target: riscv64-unknown-linux-gnu
Thread model: posix
InstalledDir: ...
$ qemu-riscv64 --version
qemu-riscv64 version 9.1.0
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers

dzaima commented 14 hours ago

The indices taken by indexed loads/stores are byte offsets, not element indices; for 32-bit elements you therefore want to shift the index argument left by 2, which multiplies it by 4 (i.e. __riscv_vsll_vx_u64m4(__riscv_vle64_v_u64m4(ptr_idx, vl), 2, vl)).
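
To make the byte-offset point concrete, here is a minimal sketch that is not part of the original thread. The helper names load_i32_at_byte_offset and gather_i32 are hypothetical, and the scalar fallback exists only so the snippet also compiles on targets without the V extension. The first helper models what one element of an indexed load does with its index. The second applies the shift before the indexed load. It also keeps the base pointer fixed across iterations, on the assumption that the values in idx are element indices relative to the start of the array; advancing the base pointer as well, as the loop in the question does, would offset the gather a second time once the data no longer fits in one vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __riscv_vector
#include <riscv_vector.h>
#endif

/* Hypothetical helper: scalar model of ONE element of an indexed load.
   The index is treated as a byte offset from base, not an element index.
   nbytes is the size of the source array, used only for a bounds check. */
int32_t load_i32_at_byte_offset(const int32_t *base, size_t nbytes,
                                uint64_t byte_off) {
  int32_t v;
  assert(byte_off + sizeof v <= nbytes);
  memcpy(&v, (const unsigned char *)base + byte_off, sizeof v);
  return v;
}

/* Hypothetical helper: dst[i] = base[idx[i]], where idx holds element
   indices relative to base. */
void gather_i32(const int32_t *base, const uint64_t *idx, int32_t *dst,
                size_t n) {
#ifdef __riscv_vector
  /* base stays fixed; only idx and dst advance with each strip. */
  for (size_t vl; n > 0; n -= vl, idx += vl, dst += vl) {
    vl = __riscv_vsetvl_e32m2(n);
    vuint64m4_t vidx = __riscv_vle64_v_u64m4(idx, vl);
    /* Element index -> byte offset: multiply by sizeof(int32_t) == 4. */
    vuint64m4_t voff = __riscv_vsll_vx_u64m4(vidx, 2, vl);
    vint32m2_t va = __riscv_vluxei64_v_i32m2(base, voff, vl);
    __riscv_vse32_v_i32m2(dst, va, vl);
  }
#else
  /* Scalar fallback with the same intended result. */
  for (size_t i = 0; i < n; ++i) {
    dst[i] = base[idx[i]];
  }
#endif
}
```

With the arrays from the question, gather_i32(a, idx, b, dataLen) should leave b equal to the contents of tmp_a. On a little-endian target, load_i32_at_byte_offset(a, sizeof a, 1) reads three zero bytes of 137 followed by the low byte of 151, which gives -1761607680, the second value in the reported output.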

Yibo-He commented 1 hour ago

Oh, I see. Thank you very much!