sffc opened 2 years ago
I think the contains overloads work well. Consider changing get_u32 to get_for_u32 or get_from_u32. If a class/trait only ever deals with u32 and not char, then get(u32) should be fine.
I think get_from_u32 might be good, yeah.
Though I guess I'm skeptical we should have these in the first place; it's easy enough to cast the char using as u32.
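To illustrate the casting point, here is a minimal Rust sketch. It assumes a hypothetical u32-only lookup named is_digit32 (not an ICU4X API). Because every char is a valid code point, a caller holding a char can reach the u32 entry point with a lossless as u32 cast, with no dedicated char overload.

```rust
/// Hypothetical u32-only lookup standing in for a code-point-keyed API.
/// Returns true for the ASCII digits U+0030..=U+0039.
fn is_digit32(code_point: u32) -> bool {
    (0x30..=0x39).contains(&code_point)
}

fn main() {
    let c = '7';
    // char -> u32 is a lossless widening cast: every char is a code point.
    assert_eq!(c as u32, 0x37);
    // A caller holding a char casts at the call site instead of
    // needing a separate char overload.
    assert!(is_digit32(c as u32));
    assert!(!is_digit32('x' as u32));
}
```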
Concretely, regarding the classes and functions in question: CodePointTrie::get is the only non-suffixed function to take a u32 argument. In CodePointTrie, get_u32 returns a u32. In all other places, we consistently take a char.
I'm not sure what my preference is. I'm okay leaving things the way they are, and considering CodePointTrie a special case since it is a low-level collection type. If we start renaming things, what about:
- get32 (more concise, and doesn't as strongly suggest that we are getting a u32)
- geti ("get by integer")
- getu ("get by unsigned integer")
Note: The data structures are designed to map from code points to values. In Rust, supporting all code points requires u32, because char forbids surrogate code points.
Therefore, one could argue that the primary input should be a u32. Lookup via char would use a cast, or an "override".
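A small standard-library check of the note above: char::from_u32 rejects surrogates, so an API that only accepts char cannot reach those code points at all. The helper non_char_code_points is hypothetical and exists only for this illustration.

```rust
/// Counts the values in the code point range 0..=0x10FFFF that Rust's
/// char cannot represent (the surrogates U+D800..=U+DFFF).
fn non_char_code_points() -> usize {
    (0..=0x10FFFFu32)
        .filter(|&cp| char::from_u32(cp).is_none())
        .count()
}

fn main() {
    // Any char widens losslessly to u32.
    assert_eq!('\u{10FFFF}' as u32, 0x10FFFF);
    // The reverse fails for surrogates: they are code points but not chars.
    assert_eq!(char::from_u32(0xD800), None);
    assert_eq!(char::from_u32(0x61), Some('a'));
    // These code points are unreachable through a char-only lookup.
    println!("{} code points have no char", non_char_code_points());
}
```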
get_u32 taking a u32 instead of returning a u32 seems misleading. getu would be better.
Discussion:
- u32_get()
- get
- get_with_u32, get_from_u32
- get32
- get32, because it doesn't tell my brain that I am getting a u32 return value
Proposal:
- get(char)
- get32(u32)
- contains(char)
- contains32(u32)
- next(char)
- next16(u16), a code unit (Char16Trie only)
- next32(u32)
OK: @sffc @Manishearth @robertbastian @nordzilla
Still needs docs work
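A minimal sketch of the proposed scheme, using hypothetical toy types ToySet and ToyMap rather than the real ICU4X collections: unsuffixed methods take a char, 32-suffixed methods take a u32, and each char variant delegates through a cast. It also shows that get and contains have different return types (a value vs. a bool).

```rust
/// Hypothetical code point set: sorted, inclusive ranges.
struct ToySet {
    ranges: Vec<(u32, u32)>,
}

impl ToySet {
    /// Accepts any u32, including surrogates.
    fn contains32(&self, code_point: u32) -> bool {
        self.ranges.iter().any(|&(lo, hi)| lo <= code_point && code_point <= hi)
    }
    /// char overload: a plain cast, since every char is a code point.
    fn contains(&self, c: char) -> bool {
        self.contains32(c as u32)
    }
}

/// Hypothetical code point map: ranges with an associated value.
struct ToyMap {
    ranges: Vec<(u32, u32, u8)>,
    fallback: u8,
}

impl ToyMap {
    /// Returns the value for any u32, or the fallback if unmapped.
    fn get32(&self, code_point: u32) -> u8 {
        self.ranges
            .iter()
            .find(|&&(lo, hi, _)| lo <= code_point && code_point <= hi)
            .map(|&(_, _, v)| v)
            .unwrap_or(self.fallback)
    }
    /// char overload of get32.
    fn get(&self, c: char) -> u8 {
        self.get32(c as u32)
    }
}

fn main() {
    let set = ToySet { ranges: vec![(0x41, 0x5A), (0xD800, 0xDBFF)] };
    assert!(set.contains('Q'));
    assert!(!set.contains('q'));
    // A lone high surrogate is only reachable through contains32.
    assert!(set.contains32(0xD800));

    let map = ToyMap { ranges: vec![(0x30, 0x39, 1)], fallback: 0 };
    assert_eq!(map.get('5'), 1);
    assert_eq!(map.get32(0xDC00), 0);
}
```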
Given that we have decided to use try_from_utf8 for unvalidated string constructors, I'd like to reopen this discussion. I think a more consistent name for the 32-suffixed methods would now be contains_utf32. Is this worth changing?
According to the discussion above, the problem was with get. If you say get_utf32, the thinking was, it looks like you are getting a UTF-32 code unit, when in reality you are passing one in as a parameter. (I don't know how I personally feel.)
- get32_u32
- … try_from_utf8 is that one works on code points and the other works on strings.
- next16, whose documentation isn't great: https://unicode-org.github.io/icu4x/rustdoc/icu/collections/char16trie/struct.Char16TrieIterator.html#method.next16
- get32_ule, which returns a reference, which we actually want.
- get and contains are different signatures. contains always returns a bool.
- next and other methods. I would like it if we didn't need to make this decision on each API. If we get rid of get32_u32, we can adopt my proposed naming scheme.
- get32_ule? get_ule_utf32?
No conclusion yet.
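One way to see why next16 sits alongside next and next32: a UTF-16-based trie consumes code units, and a supplementary character is two of them. The sketch below uses a hypothetical ToyCursor that matches a single key; it is not ICU4X's Char16TrieIterator, and treating a non-char u32 in next32 as an immediate mismatch is an assumption of this sketch.

```rust
/// Hypothetical cursor that matches input against one UTF-16 key.
struct ToyCursor {
    key: Vec<u16>, // the key as UTF-16 code units
    pos: usize,    // code units matched so far
    failed: bool,  // set once any input fails to match
}

impl ToyCursor {
    fn new(key: &str) -> Self {
        ToyCursor { key: key.encode_utf16().collect(), pos: 0, failed: false }
    }

    /// Consumes exactly one UTF-16 code unit.
    fn next16(&mut self, unit: u16) -> bool {
        if !self.failed && self.key.get(self.pos) == Some(&unit) {
            self.pos += 1;
        } else {
            self.failed = true;
        }
        !self.failed
    }

    /// Consumes one code point given as a char: one or two code units.
    fn next(&mut self, c: char) -> bool {
        let mut buf = [0u16; 2];
        for &unit in c.encode_utf16(&mut buf).iter() {
            self.next16(unit);
        }
        !self.failed
    }

    /// Consumes one code point given as a u32. Assumption of this sketch:
    /// values that are not valid chars (e.g. surrogates) never match.
    fn next32(&mut self, code_point: u32) -> bool {
        match char::from_u32(code_point) {
            Some(c) => self.next(c),
            None => {
                self.failed = true;
                false
            }
        }
    }

    /// True if the whole key was consumed with no mismatch.
    fn matched(&self) -> bool {
        !self.failed && self.pos == self.key.len()
    }
}

fn main() {
    let mut cursor = ToyCursor::new("a\u{1F600}");
    assert!(cursor.next('a'));
    // next32 goes through next, which feeds U+1F600 as two code units.
    assert!(cursor.next32(0x1F600));
    assert!(cursor.matched());
    assert_eq!(cursor.pos, 3);

    // next16 can stop halfway through a surrogate pair.
    let mut half = ToyCursor::new("\u{1F600}");
    assert!(half.next16(0xD83D));
    assert!(!half.matched());
    assert!(half.next16(0xDE00));
    assert!(half.matched());
}
```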
utf32 is a string encoding. u32 is one possible type for a code point.
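To make the distinction concrete in Rust: a UTF-32 encoding of a string is a sequence of u32 values, while a single code point is just one u32. The helper to_utf32 is hypothetical; everything else is the standard library.

```rust
/// Hypothetical helper: encodes a string as UTF-32, one u32 per code point.
fn to_utf32(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    let s = "h\u{E9}\u{1F600}";
    // utf32 names a string encoding: a whole sequence of u32 code units.
    assert_eq!(to_utf32(s), vec![0x68, 0xE9, 0x1F600]);
    // u32 is only a type; a single code point fits in one u32 value.
    let code_point: u32 = 'h' as u32;
    assert_eq!(code_point, 0x68);
    // The same string in the other encodings, for comparison.
    assert_eq!(s.len(), 7); // UTF-8 bytes: 1 + 2 + 4
    assert_eq!(s.encode_utf16().count(), 4); // UTF-16 units: 1 + 1 + 2
}
```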
utf32 is clearer, but it's just an improvement, and if we don't have consensus, it's less work.
Conclusion:
- get32-type naming in ICU4X 2.0
LGTM: @sffc @robertbastian
LGTM
We inconsistently name methods in the various properties and collections classes that deal with char vs u32. Examples: contains(char), contains_32(u32), get(char), and get_u32(u32), but sometimes it is get(u32). And the get_u32 name sounds like it is returning a u32, similar to get_ule, when in fact it is an overload of the get method.
Feedback from @markusicu.
Thoughts?