Open lincot opened 1 year ago
Since you already wrote the code, make a PR? Though you might want to look into using vec.spare_capacity_mut()
+ slice::get_unchecked
, which will add some debug asserts compared to the using direct pointer writes.
@the8472 this way? AFAIK get_unchecked_mut
doesn't perform debug assertions. We already have a debug_assert!
for capacity, though. As to the PR, do we want the new push_unchecked
function duplicating char::encode_utf8
code?
I have no opinion on push_unchecked, you can just make it a private function if you don't have a reason to expose it.
@the8472 it seems that the debug assert only applies if you compile core from source.
As for push_unchecked
, I would like to see it in the library as I commonly use it after initializing strings with capacity.
I managed to make it identical to the get_unchecked_mut
approach by reusing char::encode_utf8
with a hint.
If you want to add API you may want to open an ACP first. The internal optimization only needs a PR.
Although the get_unchecked_mut
and char::encode_utf8
push_unchecked
implementations give identical results when not inlined and with String::push
, I have found that they behave very differently in other applications, the latter being slower. Also, the hint is too hacky.
Another solution is to add a new function encode_utf8_raw_unchecked
to be called by push_unchecked
and encode_utf8_raw
. It is identical to the first approach in both applications and is almost the same for char::encode_utf8
.
Another thing to note is that this version, unlike the current one, is unable to elide the capacity check in a simple ASCII push loop with a pre-allocated string. Both versions fail on non-ASCII and on long strings, though.
It seems that the current version is able to elide the check because it uses RawVec::reserve_for_push
(which has #[inline(never)]
) instead of the usual RawVec::reserve
in the case of ASCII input. Interestingly, if we replace the reserve call with a panic, it gets elided even in the case of long non-ASCII strings.
If we put the reserve call behind #[inline(never)]
, it works for a short non-ASCII string, but not for a long one. RawVec::reserve
itself performs a check and calls a cold reallocation function when needed. Locally I observed that if #[inline(never)]
is applied instead of #[cold]
or together with it, it only has a partial effect similar to putting String::reserve
behind #[inline(never)]
.
Despite the issue, my version still outperforms the original in a benchmark with pre-allocated strings.
current push
:
test bench_push_1_byte_prealloc ... bench: 10,170 ns/iter (+/- 85) = 983 MB/s
test bench_push_1_byte_prealloc_blackbox ... bench: 11,129 ns/iter (+/- 46) = 898 MB/s
test bench_push_2_bytes_prealloc ... bench: 4,742 ns/iter (+/- 5) = 4217 MB/s
test bench_push_2_bytes_prealloc_blackbox ... bench: 49,587 ns/iter (+/- 41) = 403 MB/s
test bench_push_3_bytes_prealloc ... bench: 4,744 ns/iter (+/- 5) = 6323 MB/s
test bench_push_3_bytes_prealloc_blackbox ... bench: 49,675 ns/iter (+/- 41) = 603 MB/s
test bench_push_4_bytes_prealloc ... bench: 4,745 ns/iter (+/- 5) = 8429 MB/s
test bench_push_4_bytes_prealloc_blackbox ... bench: 42,509 ns/iter (+/- 29) = 940 MB/s
new push
:
test bench_push_1_byte_prealloc ... bench: 4,753 ns/iter (+/- 51) = 2103 MB/s
test bench_push_1_byte_prealloc_blackbox ... bench: 12,419 ns/iter (+/- 83) = 805 MB/s
test bench_push_2_bytes_prealloc ... bench: 4,750 ns/iter (+/- 27) = 4210 MB/s
test bench_push_2_bytes_prealloc_blackbox ... bench: 11,863 ns/iter (+/- 86) = 1685 MB/s
test bench_push_3_bytes_prealloc ... bench: 4,748 ns/iter (+/- 26) = 6318 MB/s
test bench_push_3_bytes_prealloc_blackbox ... bench: 16,597 ns/iter (+/- 86) = 1807 MB/s
test bench_push_4_bytes_prealloc ... bench: 4,751 ns/iter (+/- 27) = 8419 MB/s
test bench_push_4_bytes_prealloc_blackbox ... bench: 21,330 ns/iter (+/- 79) = 1875 MB/s
(you can see a small regression in bench_push_1_byte_prealloc_blackbox
, it doesn't happen if the code isn't compiled for the native CPU or if std is compiled from source)
however, both versions work much worse than push_unchecked
:
test bench_push_1_byte_prealloc ... bench: 118 ns/iter (+/- 1) = 84745 MB/s
test bench_push_1_byte_prealloc_blackbox ... bench: 10,041 ns/iter (+/- 18) = 995 MB/s
test bench_push_2_bytes_prealloc ... bench: 328 ns/iter (+/- 0) = 60975 MB/s
test bench_push_2_bytes_prealloc_blackbox ... bench: 8,292 ns/iter (+/- 26) = 2411 MB/s
test bench_push_3_bytes_prealloc ... bench: 491 ns/iter (+/- 2) = 61099 MB/s
test bench_push_3_bytes_prealloc_blackbox ... bench: 11,849 ns/iter (+/- 35) = 2531 MB/s
test bench_push_4_bytes_prealloc ... bench: 689 ns/iter (+/- 4) = 58055 MB/s
test bench_push_4_bytes_prealloc_blackbox ... bench: 16,566 ns/iter (+/- 33) = 2414 MB/s
An alternative implementation of
String::push
, which reserves space if needed and then performspush_unchecked
, gives a significant performance improvement.current
push
:new
push
:bench.rs
```rust #![no_std] #![feature(test)] extern crate alloc; extern crate test; use alloc::string::String; use test::{black_box, Bencher}; // #[inline] // fn push(s: &mut String, ch: char) { // s.push(ch); // } #[inline] fn push(s: &mut String, ch: char) { if s.capacity() - s.len() < ch.len_utf8() { s.reserve(ch.len_utf8()); } unsafe { s.push_unchecked(ch) }; } trait PushUncheckedIt's worth noting that the current
String::push
implementation encodes characters into a temporary zero-initialized buffer. The zeroing alone is an unnecessary instruction, but this method is used in several places, notablyString::insert
and the defaultWrite::write_char
.