Closed feilongjiang closed 3 years ago
Hi,
I have some minor questions about this finding. I have written some UTF16 tests and found performance improvement already with this patch (your AvoidUnalignedAccesses
). Yet I want to further explore the penalty of unaligned accesses on our board. So
Thanks, Xiaolin
Hi Xianlin, thanks for your review.
1. would you mind simply providing your JMH tests (latin1Len0064Char ones)?
The test is available at: benchmark.
2. just want to know why making this change in [the two lines](https://github.com/riscv/riscv-openjdk/pull/14/files#diff-33e9c9092974f500429bd26434bf3826a4aee7eae96c813b8226be88cbc42589R2566-R2567)
In RV64 port, we use t0/t1
as temp register, for example, the andi
with large immediate will use t0
as temp register:
void andi(Register Rd, Register Rn, int64_t increment, Register temp = t0);
In this case, the andi
in ctzc_bit
may use t0
as temp register, so we select x28/x29
to avoid accidental overwriting of t0
.
Hi Xianlin, thanks for your review.
1. would you mind simply providing your JMH tests (latin1Len0064Char ones)?
The test is available at: benchmark.
2. just want to know why making this change in [the two lines](https://github.com/riscv/riscv-openjdk/pull/14/files#diff-33e9c9092974f500429bd26434bf3826a4aee7eae96c813b8226be88cbc42589R2566-R2567)
In RV64 port, we use
t0/t1
as temp register, for example, theandi
with large immediate will uset0
as temp register:void andi(Register Rd, Register Rn, int64_t increment, Register temp = t0);
In this case, the
andi
inctzc_bit
may uset0
as temp register, so we selectx28/x29
to avoid accidental overwriting oft0
.
Thanks, I really feel appreciation for your provision of the benchmarks and totally understand the change.
After turning to use makeRndString(true, 64)
in the benchmarks and making a simple test on the C910 board, here are the results.
Benchmark Mode Cnt Score Error Units
IndexOfCharWithIndexUnaligned.latin1Len0064Char thrpt 2 7.737 ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex1 thrpt 2 7.929 ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex2 thrpt 2 8.122 ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex3 thrpt 2 8.068 ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex4 thrpt 2 7.973 ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex5 thrpt 2 7.714 ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex6 thrpt 2 8.148 ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex7 thrpt 2 8.005 ops/ms
Benchmark Mode Cnt Score Error Units
IndexOfBenchmark.latin1Len0064Char thrpt 2 7.403 ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex1 thrpt 2 6.851 ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex2 thrpt 2 7.002 ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex3 thrpt 2 6.973 ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex4 thrpt 2 7.774 ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex5 thrpt 2 7.170 ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex6 thrpt 2 7.303 ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex7 thrpt 2 7.124 ops/ms
I think the data might be safe to be shared.
It seems that the unaligned penalty is not so serious on this board, which is different from the HiFive board (maybe). Please correct me if I have missed something.
By the way, the ☺
in the benchmarks is excellent.
Thanks, Xiaolin
Hi team, This patch introduces a new string intrinsic string_indexof_char and the new option 'AvoidUnalignedAccesses'. We found that the performance degradation of unaligned accesses on current RISC-V hardware (HiFive Unleashed, for example) was significant compared to aligned accesses.
We have run JMH test on HiFive Unleashed with the length of 64 random generated Latin1 string and the
indexOf()
index various from 0 to 7, the performance improvement shows as below (throughput mode, the higher the better):The only difference between Aligned and Unaligned is whether the "AvoidUnalignedAccesses" option is enabled. jtreg tests are passed.
Thanks, Feilong