Tracing back, the content of mem(0x1e578) should be 0x411, but it was
overwritten with 0x0002000200020002
: mem 0 0x000000000001e578
0x0002000200020002
by this insn:
101bc: 0206dc27 vse16.v v24,(a3)
I looked into the code generated for the following C statement, where arrays a, b and c are of type short:
for (short i = 0; i < 100; i++) {
a[i] = c[i] / b[i];
}
1019a: 0507f657 vsetvli a2,a5,e32,m1,ta,mu       # a5 is the loop count, starting from 100
                                                 # a2 = 32 for the first 3 runs
                                                 # a2 = 4 for the 4th run
1019e: 0209dc87 vle16.v v25,(s3)                 # load a2 elems from (s3), array c
101a2: 020a5d07 vle16.v v26,(s4)                 # load a2 elems from (s4), array b
101a6: 4b93ac57 vsext.vf2 v24,v25                # convert from short to int
101aa: 8f91     sub a5,a5,a2                     # decrement a5 by a2 each run
                                                 # was moved up from 101c0;
                                                 # this happened in tmp.vdiv.1024.c.321r.sched2
101ac: 4ba3acd7 vsext.vf2 v25,v26                # convert from short to int
101b0: 878cac57 vdiv.vv v24,v24,v25
101b4: 04f07057 vsetvli zero,zero,e16,mf2,ta,mu  # messes up the previous configuration set at 1019a
101b8: b3804c57 vncvt.x.x.w v24,v24              # convert from int to short
101bc: 0206dc27 vse16.v v24,(a3)                 # store 32 elems to (a3)
                                                 # (should be 32 for the first 3 runs, 4 for the 4th run)
101c0: 040a0a13 addi s4,s4,64                    # advance array b by 32 elems
101c4: 04068693 addi a3,a3,64                    # advance array a by 32 elems
101c8: 04098993 addi s3,s3,64                    # advance array c by 32 elems
101cc: f7f9     bnez a5,1019a <main+0x94>
Because of the compile option -mriscv-vector-bits=1024, each iteration processes 32 elements (VLEN = 1024 with SEW = 32 and LMUL = 1 gives VLMAX = 1024 / 32 = 32). The loop therefore runs 3 full iterations (96 elements) and leaves a remainder of 4, so the 4th run only needs to process the remaining 4 elements. This depends on instruction 1019a: 0507f657 vsetvli a2,a5,e32,m1,ta,mu.
On the 4th run, a2 and a5 would both be 4 (the remainder), but instruction 101b4: 04f07057 vsetvli zero,zero,e16,mf2,ta,mu
disturbs the configuration, so instruction 101bc: 0206dc27 vse16.v v24,(a3)
still stores 32 elements (instead of 4) into mem(a3).
The correct code sequence should be:
1019a: 0507f657 vsetvli a2,a5,e32,m1,ta,mu
1019e: 0209dc87 vle16.v v25,(s3)
101a2: 020a5d07 vle16.v v26,(s4)
101a6: 4b93ac57 vsext.vf2 v24,v25
101aa: 4ba3acd7 vsext.vf2 v25,v26
101ae: 878cac57 vdiv.vv v24,v24,v25
101b2: 04f07057 vsetvli zero,zero,e16,mf2,ta,mu
101b6: b3804c57 vncvt.x.x.w v24,v24
101ba: 0507f657 vsetvli a2,a5,e32,m1,ta,mu       # add this to restore the configuration
101be: 0206dc27 vse16.v v24,(a3)
101c2: 8f91     sub a5,a5,a2                     # moved down here, after the store is completed
101c4: 040a0a13 addi s4,s4,64
101c8: 04068693 addi a3,a3,64
101cc: 04098993 addi s3,s3,64
101d0: f7e9     bnez a5,1019a <main+0x94>
Thank you. I'll address this; give me some time to figure it out.
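Hi, I have tried and tested your code. I did not get the "User load segfault @ 0x000200020003f57a" error with the latest spike, and your test case passes. Also, the instruction vsetvli zero,zero,e16,mf2,ta,mu does not change vl; it only changes vtype. With rd = rs1 = x0, vsetvli keeps the current vl, which is valid here because e16,mf2 has the same SEW/LMUL ratio (32) as e32,m1, so VLMAX does not change. It should store only 4 elements in your case, so there is no need to restore the configuration.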
The fact that you didn't get a segfault does not mean there is no problem; it just means the trashed memory was not used afterwards. You need to look into the problem in spike's debug mode:
This is what I saw in the debugger (4th run) before executing 101bc: 0206dc27 vse16.v v24,(a3)
: reg 0 a3
0x000000000001f540
: mem 0 0x000000000001f540
0x0000000000000000
: mem 0 0x000000000001f548
0x0000000000000a90
: mem 0 0x000000000001f550
0x0000000000000000
: mem 0 0x000000000001f558
0x0000000000000000
: vreg 0 v24
VLEN=1024 bits; ELEN=32 bits
v24 : [31]: 0x00000002 [30]: 0x00000002 [29]: 0x00000002 [28]: 0x00000002 [27]: 0x00000002 [26]: 0x00000002 [25]: 0x00000002 [24]: 0x00000002 [23]: 0x00000002 [22]: 0x00000002 [21]: 0x00000002 [20]: 0x00000002 [19]: 0x00000002 [18]: 0x00000002 [17]: 0x00000002 [16]: 0x00000002 [15]: 0x00020002 [14]: 0x00020002 [13]: 0x00020002 [12]: 0x00020002 [11]: 0x00020002 [10]: 0x00020002 [9]: 0x00020002 [8]: 0x00020002 [7]: 0x00020002 [6]: 0x00020002 [5]: 0x00020002 [4]: 0x00020002 [3]: 0x00020002 [2]: 0x00020002 [1]: 0x00020002 [0]: 0x00020002
:
: mem 0 0x000000000001f540
0x0002000200020002
: mem 0 0x000000000001f548
0x0002000200020002   <=== this memory should not change
: mem 0 0x000000000001f550
0x0002000200020002   <=== this memory should not change
: mem 0 0x000000000001f558
0x0002000200020002   <=== this memory should not change
Instead of storing 4 elements, it stores 0x20 (32) elements: 32 × 2 bytes = 64 bytes starting at 0x1f540, so the memory from 0x1f548 through 0x1f57f is trashed.
This is what I saw after I manually corrected the assembly code:
: until pc 0 101be
: reg 0 a3
0x000000000001e540
: mem 0 0x000000000001e540
0x0000000000000000
: mem 0 0x000000000001e548
0x0000000000000a90
: mem 0 0x000000000001e550
0x0000000000000000
: mem 0 0x000000000001e558
0x0000000000000000
: vreg 0 v24
VLEN=1024 bits; ELEN=32 bits
v24 : [31]: 0x00000002 [30]: 0x00000002 [29]: 0x00000002 [28]: 0x00000002 [27]: 0x00000002 [26]: 0x00000002 [25]: 0x00000002 [24]: 0x00000002 [23]: 0x00000002 [22]: 0x00000002 [21]: 0x00000002 [20]: 0x00000002 [19]: 0x00000002 [18]: 0x00000002 [17]: 0x00000002 [16]: 0x00000002 [15]: 0x00020002 [14]: 0x00020002 [13]: 0x00020002 [12]: 0x00020002 [11]: 0x00020002 [10]: 0x00020002 [9]: 0x00020002 [8]: 0x00020002 [7]: 0x00020002 [6]: 0x00020002 [5]: 0x00020002 [4]: 0x00020002 [3]: 0x00020002 [2]: 0x00020002 [1]: 0x00020002 [0]: 0x00020002
:
: mem 0 0x000000000001e540
0x0002000200020002   <=== only 4 elems got stored
: mem 0 0x000000000001e548
0x0000000000000a90   <=== this memory is not changed
: mem 0 0x000000000001e550
0x0000000000000000   <=== this memory is not changed
: mem 0 0x000000000001e558
0x0000000000000000   <=== this memory is not changed
Actually, -mriscv-vector-bits=1024 is not the only affected setting; all other sizes are wrong too, they just don't show up the way your test case does. The key is the type conversion: you use a new configuration (vsetvli) for the type conversion, but do not restore the original one for the store instruction "vse16.v v24,(a3)".
If you don't know how to use spike's debug mode, let me know and I'll give you the full sequence of commands.
Another question: where did you get the spike source? The one I got seems buggy in its vector support; I had to fix several bugs before I could use it.
You said the vsetvli zero,zero,e16,mf2,ta,mu instruction does not change vl, but that's not how my spike behaves. Maybe this is a problem in spike.
@chihinko I can't reproduce your problem with the updated spike (ref: https://github.com/riscv-software-src/riscv-isa-sim). I think your spike has a bug that causes this unexpected behavior. Please try again with the newest spike.
# spike -d --varch=vlen:1024,elen:32 --isa=RV64IMAFDCV pk a.out
: until pc 0 101b8
bbl loader
b = 0 c = 0
b = 1 c = 2
b = 2 c = 4
b = 3 c = 6
b = 4 c = 8
b = 5 c = 10
b = 6 c = 12
b = 7 c = 14
b = 8 c = 16
b = 9 c = 18
:
core 0: 0x00000000000101b8 (0xb3804c57) vnsrl.wx v24, v24, zero
: reg 0 a2
0x0000000000000004 # the 4th run
: reg 0 a3
0x0000000000020640
: mem 0 0x0000000000020640 # mem before store
0x0000000000000000
: mem 0 0x0000000000020648 # mem before store
0x0000000000000990
core 0: 0x00000000000101bc (0x0206dc27) vse16.v v24, (a3)
: mem 0 0x0000000000020640 # mem after store
0x0002000200020002
: mem 0 0x0000000000020648 # mem after store, keep the original value
0x0000000000000990
@chihinko - would you be able to try the spike that @lhtin mentioned above and clarify if the problem persists or not for you? Thanks.
I did try the spike @lhtin pointed me to. It works for this test case, but fails with another problem, e.g.:
z  0000000000000000 ra 000000000001023c sp 0000003ffffffb10 gp 000000000001f378
tp 0000000000000000 t0 000000000001031c t1 000000000000000f t2 0000000000000000
s0 000000000001f510 s1 000000000001f1f0 a0 0000000000000001 a1 0000003ffffffb58
a2 0000000000000002 a3 0000000000000000 a4 000000000001f510 a5 000000000001f1f0
a6 000000000000001f a7 0000000000000000 s2 0000000000000000 s3 000000000001f1f0
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 0000000000010130 va/inst 000000005208acd7 sr 8000000200006620
An illegal instruction was executed!
vobjdump tmp.vmul.128.x >& tmp.vmul.128.x.dump
grep 10130 tmp.vmul.128.x.dump
10130: 5208acd7 vid.v v25
Here is the test case:
ln -s vmul.c tmp.c
riscv64-unknown-elf-gcc -O2 -mriscv-vector-bits=128 tmp.c -o tmp.vmul.128.x
spike --varch=vlen:128,elen:32 --isa=RV64IMAFDCV pk tmp.vmul.128.x
vmul.c.gz
I'm debugging this new spike; please check whether you have the same problem.
BTW, here is my spike checkout:
git remote -v
origin https://github.com/riscv-software-src/riscv-isa-sim (fetch)
origin https://github.com/riscv-software-src/riscv-isa-sim (push)
git branch
* master
Is this the right version?
I tried it and didn't hit the issue you describe:
bbl loader
b = 0 c = 0
b = 1 c = 2
b = 2 c = 4
b = 3 c = 6
b = 4 c = 8
b = 5 c = 10
b = 6 c = 12
b = 7 c = 14
b = 8 c = 16
b = 9 c = 18
a = 18 b = 3 c = 6
a = 11858 b = 77 c = 154
a = 19602 b = 99 c = 198
This is my spike result.
Oh, I see your issue now. Your spike configuration is wrong. You are using:
spike --varch=vlen:128,elen:32 --isa=RV64IMAFDCV pk tmp.vmul.128.x
You should use elen:64 instead:
spike --varch=vlen:128,elen:64 --isa=RV64IMAFDCV pk tmp.vmul.128.x
Yes, this works. But how do I know when to use elen:64 versus elen:32?
But more tests failed:
riscv64-unknown-elf-gcc -O2 -mriscv-vector-bits=64 tmp.c -o tmp.auto_rvv_example1_double_char.64.x
spike --varch=vlen:64,elen:64 --isa=RV64IMAFDCV pk tmp.auto_rvv_example1_double_char.64.x
b = 0.000000 c = 0.000000
b = 1.37?Oo0 c = 1.37?Oo0
b = 1.37?Oo0 c = 1.37?Oo0
b = 1.37?Oo0 c = 1.37?Oo0
b = 1.37?Oo0 c = 1.37?Oo0
b = 1.37?Oo0 c = 1.37?Oo0
b = 1.37?Oo0 c = 1.37?Oo0
b = 1.37?Oo0 c = 1.37?Oo0
b = 1.37?Oo0 c = 1.37?Oo0
b = 1.37?Oo0 c = 1.37?Oo0
a = 1.37?Oo0 b = 1.37?Oo0 c = 1.37?Oo0
a = 1.37?Oo0 b = 1.37?Oo0 c = 1.37?Oo0
a = 1.37?Oo0 b = 1.37?Oo0 c = 1.37?Oo0
It should be:
b = 0.000000 c = 0.000000
b = 1.000000 c = 2.000000
b = 2.000000 c = 4.000000
b = 3.000000 c = 6.000000
b = 4.000000 c = 8.000000
b = 5.000000 c = 10.000000
b = 6.000000 c = 12.000000
b = 7.000000 c = 14.000000
b = 8.000000 c = 16.000000
b = 9.000000 c = 18.000000
a = 9.000000 b = 3.000000 c = 6.000000
a = 231.000000 b = 77.000000 c = 154.000000
a = 297.000000 b = 99.000000 c = 198.000000
rvv-next doesn't support VLEN = 64. It only supports VLEN >= 128 (and is well tested for VLEN >= 128), because the framework in GCC 12 has a bug. I have fixed it in the latest GCC 13 upstream, but I will not fix it in rvv-next, because that would require changing the RVV implementation too much. I would rather reimplement it (with support for VLEN = 32 and VLEN = 64) directly in upstream GCC, and I am working on that.
OK, I'll ignore the VLEN = 64 failures for the time being. All my compiler and spike problems are now cleared up, so I'll close this issue as not a bug. Thanks for your quick response!
riscv64-unknown-elf-gcc -O2 -mriscv-vector-bits=1024 tmp.c -o tmp.1024.x
spike --varch=vlen:1024,elen:32,slen:1024 --isa=RV64IMAFDCV pk tmp.1024.x
b = 0 c = 0
b = 1 c = 2
b = 2 c = 4
b = 3 c = 6
b = 4 c = 8
b = 5 c = 10
b = 6 c = 12
b = 7 c = 14
b = 8 c = 16
b = 9 c = 18
a = 2 b = 3 c = 6
a = 2 b = 77 c = 154
a = 2 b = 99 c = 198
z  0000000000000000 ra 000000000001276c sp 0000003ffffffab0 gp 000000000001ed70
tp 0000000000000000 t0 0000000000000026 t1 ffffffffffffffff t2 0000000000000016
s0 000000000001f580 s1 000000000001e020 a0 0002000200020002 a1 000000000001e910
a2 000200020003f572 a3 0000000000000000 a4 000000000001f570 a5 0002000200020002
a6 0000000000000003 a7 0000000000000039 s2 0000000000000000 s3 000000000001e020
s4 000000000001e540 s5 ffffffffffffffff s6 0000000000019742 s7 00000000000000b0
s8 0000000000000001 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000064 t4 00000000000003e8 t5 0000000000000001 t6 0000000000002190
pc 0000000000012784 va/inst 000200020003f57a sr 8000000200006620
User load segfault @ 0x000200020003f57a
tmp.c.gz