bugnano opened this issue 2 months ago
As far as I understand the codebase, I don't think `AsRef` is sensible to implement. The problem with `AsRef` is that the trait implementation would have to look something like
```rust
impl<'a> AsRef<Utf32Str<'a>> for Utf32String {
    fn as_ref(&self) -> &Utf32Str<'a> {
        // `slice(..)` returns a `Utf32Str` by value; the temporary is
        // dropped at the end of this function, so the reference dangles.
        &self.slice(..)
    }
}
```
but this does not work, since `slice(..)` returns the `Utf32Str` by value and the temporary is immediately dropped, so there is nothing to return a reference to.
More generally though, it seems to me that you are treating `Utf32String` and `Utf32Str` as literal string types, and (despite the suggestive names) this is not really correct in my opinion. This is described in the documentation (though it should certainly be clarified more): a `Utf32String` internally keeps only the first `char` of each grapheme cluster. This means that the conversion `String -> Utf32String` is lossy, so it only really makes sense to think of a `Utf32String` as a sort of "match object".
Here is an example straight from UAX #29:
```rust
use nucleo_matcher::Utf32String; // also re-exported by the `nucleo` crate
use unicode_segmentation::UnicodeSegmentation;

// The Devanagari "kshi" example from UAX #29: four code points, क + ् + ष + ि.
let s = "क्षि";
let utf32_s = Utf32String::from(s);
// Grapheme-aware view: each cluster together with its code points.
for g in s.graphemes(true) {
    println!("{g} {:?}", g.chars());
}
// Utf32String view: only the first char of each cluster is kept.
for ch in utf32_s.slice(..).chars() {
    println!("{ch}");
}
```
In my opinion, the fact that `Utf32String` implements `Display` is semantically wrong: at least to me, it is better to think of it as an "internal `String`-like matching object". Maybe @pascalkuthe has a good reason for this?

Unfortunately, rendering of grapheme clusters is not handled correctly by many terminals anyway, so actually doing things properly in your own code does not guarantee that things will go as expected...
In your code examples, I would just use your own types and your own `Display` implementation, and never use `Utf32String` inside your own code. In other words, having a `Vec<Utf32String>` in the first place is (in my opinion) quite weird. I think in most cases the only time you should ever be touching a `Utf32String` is in the mutable closure passed to the `Injector::push` method.
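A minimal sketch of what I mean, assuming the high-level `nucleo::Nucleo`/`Injector` API (the `FileEntry` type and its `path` field are just placeholders):

```rust
use std::sync::Arc;
use nucleo::{Config, Nucleo};

// Placeholder item type: whatever your application actually stores.
struct FileEntry {
    path: String,
}

// One match column, default config, no-op notify callback.
let nucleo: Nucleo<FileEntry> = Nucleo::new(Config::DEFAULT, Arc::new(|| {}), None, 1);
let injector = nucleo.injector();

injector.push(
    FileEntry { path: "src/main.rs".into() },
    |entry, columns| {
        // The only place a Utf32String needs to be produced: fill the
        // match column from the field you want to match on.
        columns[0] = entry.path.as_str().into();
    },
);
```

Everywhere else (rendering, sorting, storage) you keep working with `FileEntry` and your own `Display`.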
I hope this is helpful!
It's somewhat helpful, but I'm not satisfied with it.
The conversion `String -> Utf32String` is technically lossy, but for my use case I don't really care.

What I was asking for is for `Utf32String -> Utf32Str` to be consistent with the relationship that exists between `String` and `str`, `OsString` and `OsStr`, and so on. That means, for example, that if a function expects `&str` as its parameter, I can pass it a `&String`, because `String` implements `AsRef<str>`.

Now, if a function expects a `Utf32Str`, I expect to be able to pass it a reference to a `Utf32String`, which is not the case today.
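To make the parallel concrete, roughly this kind of code (the function names are just placeholders):

```rust
use nucleo_matcher::{Utf32Str, Utf32String};

// The std relationship: an owned String can be handed to APIs that
// want a borrowed str.
fn takes_str(_s: &str) {}
let owned = String::from("foo");
takes_str(&owned);

// With nucleo's types, today the conversion has to be spelled out:
fn takes_utf32(_s: Utf32Str) {}
let owned = Utf32String::from("foo");
takes_utf32(owned.slice(..)); // I would like this to work without `.slice(..)`
```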
For maximising performance of matching, I have a `Vec<Utf32String>` that I want to match against. It's fairly big (hundreds of thousands of records). For matching I do something like the following, so that I can then sort the results by score:
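(A simplified sketch; the real code is larger, `file_names` and the query are stand-ins for my actual data, and I'm using the plain `Matcher::fuzzy_match` API here for brevity.)

```rust
use nucleo_matcher::{Config, Matcher, Utf32String};

// file_names: Vec<Utf32String> -- the big list of candidates.
let mut matcher = Matcher::new(Config::DEFAULT);
let needle = Utf32String::from("query");

// Score every candidate, cloning the Utf32String into the scoring vector.
let mut scored: Vec<(u16, Utf32String)> = file_names
    .iter()
    .filter_map(|file_name| {
        matcher
            .fuzzy_match(file_name.slice(..), needle.slice(..))
            .map(|score| (score, file_name.clone()))
    })
    .collect();

// Sort best matches first.
scored.sort_by_key(|(score, _)| std::cmp::Reverse(*score));
```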
And finally, I construct a new `Vec<Utf32String>` with the filtered result. Notice that I do `file_name.clone()` twice. Now, to avoid at least one clone, I was thinking of refactoring the scoring vector like so:
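Again simplified, continuing the sketch above (both the final collection as it is today and the refactored scoring vector I had in mind):

```rust
// Today: the filtered result has to be a Vec<Utf32String> again,
// so each file name is cloned a second time.
let filtered: Vec<Utf32String> = scored
    .iter()
    .map(|(_, file_name)| file_name.clone())
    .collect();

// The refactoring idea: keep borrowed Utf32Str slices in the scoring
// vector instead of cloned Utf32Strings.
let mut scored: Vec<(u16, Utf32Str<'_>)> = file_names
    .iter()
    .filter_map(|file_name| {
        matcher
            .fuzzy_match(file_name.slice(..), needle.slice(..))
            .map(|score| (score, file_name.slice(..)))
    })
    .collect();
```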
So far so good, but then I found no way of converting a `Utf32Str` into a `Utf32String` for the final `Vec<Utf32String>`.
It would be nice to have, first of all, an `AsRef<Utf32Str>` implementation for `Utf32String`, so that I don't have to call `.slice(..)`. Next, it would be awesome to have a `From<Utf32Str>` implementation for `Utf32String`, so that I can do something like `Utf32String::from(file_name)` when building the final `Vec<Utf32String>`.
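With that `From` impl, the final step of the refactored version above would look roughly like this (hypothetical: it does not compile with nucleo today, which is exactly what I'm asking for):

```rust
// Hypothetical: relies on the requested From<Utf32Str<'_>> for Utf32String
// impl, which nucleo does not currently provide.
let filtered: Vec<Utf32String> = scored
    .into_iter()
    .map(|(_, file_name)| Utf32String::from(file_name))
    .collect();
```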