Closed tgross35 closed 12 months ago
Not yet approved of course, but this is small enough that I got started on work.
Tracking issue: https://github.com/rust-lang/rust/issues/114441
Implementation PR: https://github.com/rust-lang/rust/pull/114443
That same performance regression will also inevitably happen to CStr::to_bytes (if the implementation switch is actually done), and it already has a note in the docs. Why not do the exact same thing to CStr::len (which is equivalent to c_str.to_bytes().len() as you pointed out) and use a performant impl for now?
I agree, we should not hamstring this on purpose. A similar doc note would be appropriate, so we don't make promises we might not want to keep, but we would add this to the performance considerations of making that switch.
That is all fair. I have updated the proposal to not call strlen at this time, will do my implementation PR soon (changed)
We discussed this in last week's libs-api meeting. Since the plan is still for &CStr to become a thin pointer, such that determining the length will be an O(N) operation at some point in the future, we don't think .len() is an appropriate name for this method, as it hides the fact that some actual 'work' is done by this function.
The .to_bytes() method is called to_bytes and not as_bytes because it's not a 'free' conversion. Similarly, a length method should make clear it is not an O(1) operation, perhaps by adding a verb into the method name. (For example, count_bytes might be a good method name.)
In that case there should be a doc alias for len and strlen.
Thanks @m-ou-se. I updated this proposal and the implementation to use count_bytes instead, with the doc alias that @programmerjake suggested.
I am wondering if there may be a better name yet since count_bytes() will not return the same value as to_bytes().len() - but I don't know what would be better and don't feel super strongly about it. Maybe calc_len()?
Nevermind, I forgot we have two different to_bytes functions.
Isn't count_bytes supposed to return the same value as strlen?!
v.to_bytes().len() == strlen(v.as_ptr())
Perhaps you were thinking of to_bytes_with_nul()?
Yes I am, please ignore me 🙂
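For reference, the distinction between the two to_bytes methods can be checked with a small sketch against the stable CStr API. The helper name lengths is hypothetical and only exists for this illustration:

```rust
use std::ffi::CStr;

/// Hypothetical helper: returns the byte length without the trailing nul
/// and the byte length including it, in that order.
pub fn lengths(s: &CStr) -> (usize, usize) {
    (s.to_bytes().len(), s.to_bytes_with_nul().len())
}
```

The first value is the one that matches strlen; the second is always one larger because it counts the terminator.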
I would be happy to accept a PR for unstable count_bytes. Thank you!
Proposal
Problem statement
Currently, the best way to evaluate the length of a CStr is cs.to_bytes().len(). This is not ergonomic, and may not be the best option when CStr becomes a thin pointer.
Motivating examples or use cases
For both a fat and thin CStr, this will provide a more intuitive method to get the length. When CStr becomes thin, CStr::count_bytes will be able to call strlen directly without going through the less direct to_bytes > to_bytes_with_nul > slice::from_raw_parts > len.
Solution sketch
Add a method CStr::count_bytes that returns its length. Currently, we can use self.inner.len() - 1 to get the length of the string, which is a constant time operation. Once CStr becomes thin, this will become an O(n) call to strlen - we just need to make it clear in documentation that this will have a performance regression at some point (as we do for the to_bytes() methods).
The API is simple:
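A minimal sketch of the fat-pointer case, written as a free function because an inherent method cannot be added to CStr from outside std. The name count_bytes_fat is hypothetical, and to_bytes_with_nul() stands in for the private self.inner field mentioned above:

```rust
use std::ffi::CStr;

/// Hypothetical stand-in for the proposed `CStr::count_bytes` while
/// `&CStr` is still a fat pointer (pointer plus stored length).
pub fn count_bytes_fat(s: &CStr) -> usize {
    // The slice including the trailing nul plays the role of `self.inner`;
    // reading its length does not scan the string, and subtracting one
    // drops the nul terminator from the count.
    s.to_bytes_with_nul().len() - 1
}
```

For a string such as b"hello\0" this returns 5, the same value as to_bytes().len().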
Once thin, this will become:
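A hedged sketch of the thin-pointer case, assuming only a raw pointer to the first byte is available. The function name count_bytes_thin is hypothetical, and a hand-written loop is used in place of libc's strlen so the example needs no external crate:

```rust
use std::ffi::c_char;

/// Hypothetical sketch of the length computation once `&CStr` is thin:
/// with no stored length, the string has to be scanned for its nul.
///
/// # Safety
/// `ptr` must point to a valid, nul-terminated sequence of bytes.
pub unsafe fn count_bytes_thin(ptr: *const c_char) -> usize {
    let mut len = 0;
    // SAFETY: the caller guarantees a nul terminator is reachable, so
    // every offset read here stays inside the string's allocation.
    while unsafe { *ptr.add(len) } != 0 {
        len += 1;
    }
    len
}
```

The loop visits every byte up to the terminator, which is why the cost grows with the length of the string rather than staying constant.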
Alternatives
The status quo: accessing length via .to_bytes().len(). This ACP intends to improve upon this option.
constness
Currently CStr::from_ptr uses a version of strlen that calls libc's implementation normally, or a naive Rust implementation when const. This works via const_eval_select, which to my knowledge hasn't been okayed for internal use. So, we will not be able to make CStr::count_bytes const stable until either that happens, or we decide our naive strlen is always OK (making CStr::new const is also blocked by this, or by the switch to a thin pointer).
My current implementation gates the constness under const_cstr_from_ptr, which has the same const-stable blocker.
cc @oli-obk because of your comment on const_eval_select here https://github.com/rust-lang/rust/pull/107624#discussion_r1101911091 (I believe this is the RFC: https://github.com/rust-lang/rfcs/pull/3352)
Naming
The name count_bytes is up for some bikeshedding.
Links and related work
CStr::len method when stabilizing CStr::is_empty: https://github.com/rust-lang/rust/issues/102444#issuecomment-1531498196
CStr methods based on whether or not we have const_eval_select, here: https://github.com/rust-lang/rust/pull/107624#issuecomment-1586686716
What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.