[String intrinsic] Add string intrinsic string_indexof_char

feilongjiang commented 3 years ago

Hi team, This patch introduces a new string intrinsic string_indexof_char and the new option 'AvoidUnalignedAccesses'. We found that the performance degradation of unaligned accesses on current RISC-V hardware (HiFive Unleashed, for example) was significant compared to aligned accesses.

We have run JMH test on HiFive Unleashed with the length of 64 random generated Latin1 string and the indexOf() index various from 0 to 7, the performance improvement shows as below (throughput mode, the higher the better):

Benchmark	Score(Aligned)	Score(Unaligned)
latin1Len0064Char	2.775	2.769
latin1Len0064CharWithIndex1	2.402	0.153
latin1Len0064CharWithIndex2	2.498	0.15
latin1Len0064CharWithIndex3	2.404	0.154
latin1Len0064CharWithIndex4	2.564	0.151
latin1Len0064CharWithIndex5	2.546	0.154
latin1Len0064CharWithIndex6	2.609	0.156
latin1Len0064CharWithIndex7	2.612	0.156

The only difference between Aligned and Unaligned is whether the "AvoidUnalignedAccesses" option is enabled. jtreg tests are passed.

Thanks, Feilong

zhengxiaolinX commented 3 years ago

Hi,

I have some minor questions about this finding. I have written some UTF16 tests and found performance improvement already with this patch (your AvoidUnalignedAccesses). Yet I want to further explore the penalty of unaligned accesses on our board. So

would you mind simply providing your JMH tests (latin1Len0064Char ones)?
just want to know why making this change in the two lines
+1 -- this patch looks good to me.

Thanks, Xiaolin

feilongjiang commented 3 years ago

Hi Xianlin, thanks for your review.

1. would you mind simply providing your JMH tests (latin1Len0064Char ones)?

The test is available at: benchmark.

2. just want to know why making this change in [the two lines](https://github.com/riscv/riscv-openjdk/pull/14/files#diff-33e9c9092974f500429bd26434bf3826a4aee7eae96c813b8226be88cbc42589R2566-R2567)

In RV64 port, we use t0/t1 as temp register, for example, the andi with large immediate will use t0 as temp register:

void andi(Register Rd, Register Rn, int64_t increment, Register temp = t0);

In this case, the andi in ctzc_bit may use t0 as temp register, so we select x28/x29 to avoid accidental overwriting of t0.

zhengxiaolinX commented 3 years ago

Hi Xianlin, thanks for your review.
1. would you mind simply providing your JMH tests (latin1Len0064Char ones)?
The test is available at: benchmark.
2. just want to know why making this change in [the two lines](https://github.com/riscv/riscv-openjdk/pull/14/files#diff-33e9c9092974f500429bd26434bf3826a4aee7eae96c813b8226be88cbc42589R2566-R2567)
In RV64 port, we use t0/t1 as temp register, for example, the andi with large immediate will use t0 as temp register:
void andi(Register Rd, Register Rn, int64_t increment, Register temp = t0);
In this case, the andi in ctzc_bit may use t0 as temp register, so we select x28/x29 to avoid accidental overwriting of t0.

Thanks, I really feel appreciation for your provision of the benchmarks and totally understand the change. After turning to use makeRndString(true, 64) in the benchmarks and making a simple test on the C910 board, here are the results.

Benchmark                                                   Mode  Cnt  Score   Error   Units
IndexOfCharWithIndexUnaligned.latin1Len0064Char            thrpt    2  7.737          ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex1  thrpt    2  7.929          ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex2  thrpt    2  8.122          ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex3  thrpt    2  8.068          ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex4  thrpt    2  7.973          ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex5  thrpt    2  7.714          ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex6  thrpt    2  8.148          ops/ms
IndexOfCharWithIndexUnaligned.latin1Len0064CharWithIndex7  thrpt    2  8.005          ops/ms

Benchmark                                      Mode  Cnt  Score   Error   Units
IndexOfBenchmark.latin1Len0064Char            thrpt    2  7.403          ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex1  thrpt    2  6.851          ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex2  thrpt    2  7.002          ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex3  thrpt    2  6.973          ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex4  thrpt    2  7.774          ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex5  thrpt    2  7.170          ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex6  thrpt    2  7.303          ops/ms
IndexOfBenchmark.latin1Len0064CharWithIndex7  thrpt    2  7.124          ops/ms

I think the data might be safe to be shared. It seems that the unaligned penalty is not so serious on this board, which is different from the HiFive board (maybe). Please correct me if I have missed something. By the way, the ☺ in the benchmarks is excellent.

Thanks, Xiaolin

riscv-collab / riscv-openjdk

[String intrinsic] Add string intrinsic string_indexof_char #14