rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking Issue for `ascii::Char` (ACP 179) #110998

Open scottmcm opened 1 year ago

scottmcm commented 1 year ago

Feature gate: #![feature(ascii_char)] #![feature(ascii_char_variants)]

This is a tracking issue for the ascii::Char type from https://github.com/rust-lang/libs-team/issues/179

https://doc.rust-lang.org/nightly/std/ascii/enum.Char.html

Public API

// core::ascii

#[repr(u8)]
enum Char {
    Null = 0,
    …
    Tilde = 127,
}

impl Debug for Char { … }
impl Display for Char { … }
impl Default for Char { ... }

impl Step for Char { ... } // so `Range<Char>` is an Iterator

impl Char {
    const fn from_u8(x: u8) -> Option<Self>;
    const unsafe fn from_u8_unchecked(x: u8) -> Self;
    const fn digit(d: u8) -> Option<Self>;
    const unsafe fn digit_unchecked(d: u8) -> Self;
    const fn as_u8(self) -> u8;
    const fn as_char(self) -> char;
    const fn as_str(&self) -> &str;
}

impl [Char] {
    const fn as_str(&self) -> &str;
    const fn as_bytes(&self) -> &[u8];
}

impl From<Char> for u8 { … }
impl From<Char> for char { … }
impl From<&[Char]> for &str { … }

// core::array

impl<const N: usize> [u8; N] {
    const fn as_ascii(&self) -> Option<&[ascii::Char; N]>;
    const unsafe fn as_ascii_unchecked(&self) -> &[ascii::Char; N];
}

// core::char

impl char {
    const fn as_ascii(&self) -> Option<ascii::Char>;
}

// core::num

impl u8 {
    const fn as_ascii(&self) -> Option<ascii::Char>;
}

// core::slice

impl [u8] {
    const fn as_ascii(&self) -> Option<&[ascii::Char]>;
    const unsafe fn as_ascii_unchecked(&self) -> &[ascii::Char];
}

// core::str

impl str {
    const fn as_ascii(&self) -> Option<&[ascii::Char]>;
}
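
For illustration, a minimal sketch (not from the issue itself) of the zero-cost path this API is meant to enable, assuming nightly with #![feature(ascii_char)] and the names listed above:

fn bytes_to_str(bytes: &[u8]) -> Option<&str> {
    let chars: &[core::ascii::Char] = bytes.as_ascii()?; // validates ASCII-ness once
    Some(chars.as_str())                                 // O(1), no UTF-8 re-check, no unsafe
}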

Steps / History

Unresolved Questions

mina86 commented 8 months ago

Actually, BTreeMap and HashMap are examples where no_std code would benefit from the naming you’ve described. It’s not uncommon to see things like

#[cfg(feature = "std")]
use std::collections::HashMap as Map;
#[cfg(not(feature = "std"))]
use alloc::collections::BTreeMap as Map;

Someone who needs a full name always has the option of importing the containing module rather than the symbol itself. Sure, hash::Map or btree::Map is a bit longer to type, but it gives a ‘native’ way of using either the short or the full name, as the user prefers. (As an aside, it also creates a natural place where all the API related to B-trees and hash tables can live.)

At the end of the day I’m not that invested in the name, but AChar is definitely the worst option.

(By the way, for things like Iter, Rust should have nested types but that’s a completely different can of worms).

sffc commented 8 months ago

I don't see it mentioned yet in this thread so I'll just say that we use AsciiByte in the icu4x code base. It is a range-limited byte (u8), not a range-limited char.

On this line of reasoning, AsciiU8 could be another choice, similar to NonZeroU8.

robertbastian commented 8 months ago

U7

mina86 commented 8 months ago

I agree, that would be ideal.

What we need is support for casting integer literals to custom types. A good example of what I mean:

// Compiles:
let num: core::num::NonZeroU8 = 42;
// Doesn’t compile:
// let num: core::num::NonZeroU8 = 0;

This would then need to be extended to handle byte literals too, so that the following would also work:

let num: core::num::NonZeroU8 = b'*';

With all that, it would be trivial to define U7 which would work as expected without a need to introduce any new APIs:

let chr: core::num::U7 = b'*';
let text: &[core::num::U7] = core::slice::from_ref(&chr);
let string: &str = text.into();
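
For contrast, constructing a NonZero from a literal today requires an explicit, fallible call (a sketch using the stable NonZeroU8 API):

use core::num::NonZeroU8;

// Today the conversion must be spelled out; a zero literal surfaces as None at
// runtime rather than as a compile error.
let num: NonZeroU8 = NonZeroU8::new(42).expect("non-zero literal");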

The question is: what’s a plausible timeline for such a feature (if it’s even plausible to be implemented) compared to how badly we want an ASCII character type.

zopsicle commented 8 months ago

The name U7 suggests a general purpose 7-bit integer type and would therefore imply a Display impl that formats the number in decimal, as opposed to an ASCII character.

rben01 commented 8 months ago

I agree, that would be ideal.

What we need is support for casting integer literals to custom types. A good example of what I mean:

// Compiles:
let num: core::num::NonZeroU8 = 42;
// Doesn’t compile:
// let num: core::num::NonZeroU8 = 0;

This would then need to be extended to handle byte literals too, so that the following would also work:

let num: core::num::NonZeroU8 = b'*';

With all that, it would be trivial to define U7 which would work as expected without a need to introduce any new APIs:

let chr: core::num::U7 = b'*';
let text: &[core::num::U7] = core::slice::from_ref(&chr);
let string: &str = text.into();

The question is: what’s a plausible timeline for such a feature (if it’s even plausible to be implemented) compared to how badly we want an ASCII character type.

Extending the interpretation of {integer} to custom types sounds dangerous, as it doesn't adhere to Rust’s principle of making most everything explicit. For instance, what do I hover over in my IDE to see the method that is used to turn 42 into a NonZeroU8?

If we did want a simpler form of “literal → NonZero*”, I think it would be better handled by more type-suffixes, e.g., 42_nz_u8, 100_nz_isize, etc. and keeping it limited to core types. Byte literals could be handled similarly, as nz_u8_b'*' and u7_b'*'. (Not sure where I'd fall on allowing non-core versions of these, either.)

mina86 commented 8 months ago

as it doesn't adhere to Rust’s principle of making most everything explicit.

Rust is happy to not adhere to that principle (I really question whether it even is a principle). Heck, integer literals are an example of implicit type coercion. It’s really not an issue. What I’m proposing is an extension of existing behaviour that people are already familiar with.

For instance, what do I hover over in my IDE to see the method that is used to turn 42 into a NonZeroU8?

One option is hovering over the literal.

programmerjake commented 8 months ago

The name U7 suggests a general purpose 7-bit integer type and would therefore imply a Display impl that formats the number in decimal, as opposed to an ASCII character.

plus, I'm hoping rust eventually gets generic integer types, so u7 (or at least uint<7>) would be a real 7-bit integer type that behaves like an integer, not a character. We'd still want a character type for ASCII; otherwise we could just use the pattern type u8 is ..0x80 once pattern types are added.

rben01 commented 7 months ago

I don't see it mentioned yet in this thread so I'll just say that we use AsciiByte in the icu4x code base. It is a range-limited byte (u8), not a range-limited char.

On this line of reasoning, AsciiU8 could be another choice, similar to NonZeroU8.

+1 for AsciiU8

kupiakos commented 7 months ago

I don't see it mentioned yet in this thread so I'll just say that we use AsciiByte in the icu4x code base. It is a range-limited byte (u8), not a range-limited char.

On this line of reasoning, AsciiU8 could be another choice, similar to NonZeroU8.

While that interpretation makes sense, I'm not a fan. Putting U8 in the name implies that it's a predominantly numeric type. Rather, it is a displayable character limited to the ASCII range; it being u8-sized is solely a consequence of it being limited to ASCII. That's a property that makes sense to put front-and-center for zerocopy de/serialization, but IMO is more confusing than helpful in the stdlib, since the goal is to treat it as a character and not an unsigned number.

We shouldn't impl Add<Rhs = Self> or other arithmetic on Self for this type for the same reasons as char. It follows many of the same restrictions as char. It's displayed like a char. A sequence of these forms a str. Keeping the "char" name makes tons of sense to me.

+1 for AsciiChar (with better default diagnostics than ascii::Char)

AsciiByte isn't too bad, but again, highlighting that it's a character seems very valuable to me.

sffc commented 7 months ago

While that interpretation makes sense, I'm not a fan. Putting U8 in the name implies that it's a predominantly numeric type. Rather, it is a displayable character limited to the ASCII range; it being u8-sized is solely a consequence of it being limited to ASCII. That's a property that makes sense to put front-and-center for zerocopy de/serialization, but IMO is more confusing than helpful in the stdlib, since the goal is to treat it as a character and not an unsigned number. ... It follows many of the same restrictions as char. It's displayed like a char. A sequence of these forms a str. Keeping the "char" name makes tons of sense to me.

This line of reasoning makes sense to me, though it's worth pointing out that slices of this new type are valid as both &[u8] and &str.

About 1/4 of the ASCII range is not displayable. A valid interpretation of the type is that it utilizes more human-readable rendering when possible, but it is still just a byte.

One aspect in favor of char is that char is very nichey and so is the Ascii Char.

We shouldn't impl Add<Rhs = Self> or other arithmetic on Self for this type for the same reasons as char.

Doing arithmetic on these is useful (case conversion, etc), but it is easy enough to convert back and forth to u8 for this. I agree it may be more error-prone to allow these by default on this type.
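
As a concrete sketch of that round trip (constructor and accessor names follow the API listed in the issue description; u8::to_ascii_uppercase is the stable method):

use core::ascii;

// Case conversion without arithmetic on the Char type itself.
fn to_upper(c: ascii::Char) -> ascii::Char {
    // to_ascii_uppercase on an ASCII byte is still ASCII, so from_u8 cannot
    // return None here.
    ascii::Char::from_u8(c.as_u8().to_ascii_uppercase()).unwrap()
}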

Finomnis commented 7 months ago

A valid interpretation of the type is that it utilizes more human-readable rendering when possible, but it is still just a byte.

I disagree. One basic principle of this type, to my understanding, is that a &[ascii::Char] can be zero-copy cast into a &str. For that to be possible, it cannot hold just any u8 value; it has to be a valid ASCII value. Otherwise it would collide with UTF-8 values.

mina86 commented 7 months ago

We shouldn't impl Add or other arithmetic on Self for this type for the same reasons as char. It follows many of the same restrictions as char.

I disagree. As I expressed in my comment on the PR implementing Add, ASCII characters have no holes in the domain (char forbids surrogates) and all 7-bit numbers are valid ASCII characters (char doesn’t fit in 20 bits but doesn’t cover all 21 bits).

Doing arithmetic on these is useful (case conversion, etc), but it is easy enough to convert back and forth to u8 for this. I agree it may be more error-prone to allow these by default on this type.

It’s easy to convert to u8. It’s hard to convert from u8. This would be true if Rust had core::num::U7.

rben01 commented 7 months ago

Arithmetic should not exist on AsciiChar (or whatever it ends up being called). Any arithmetic common and useful enough to be worth thinking about, like addition with 'a' - 'A' for case conversion, should just be exposed through a method like to_ascii_uppercase (or just to_uppercase since the domain is already limited to ASCII).

I disagree. As I expressed in https://github.com/rust-lang/rust/pull/120219#issuecomment-1908338791, ASCII characters have no holes in the domain (char forbids surrogates) and all 7-bit numbers are valid ASCII characters (char doesn’t fit in 20 bits but doesn’t cover all 21 bits).

There is a difference between char, NonZero and AsciiChar in regards to Add. AsciiChar can be thought of as u7 where addition has an obvious wrapping behaviour of keeping only the seven least significant bits of the result. There is no such clear rule for char or NonZero. char is not quite 21-bit and both have forbidden representations less than their max value.

I certainly would not expect wrapping. If adding 1 to char 127 got me back to '\0' I'd be very very surprised. If anything I'd expect a panic, as with all other default arithmetic operations. (Now, should there be Wrapping<AsciiChar>? I'd still say no, but at least its behavior wouldn't be surprising.)

mina86 commented 7 months ago

I certainly would not expect wrapping. If adding 1 to char 127 got me back to '\0' I'd be very very surprised. If anything I'd expect a panic, as with all other default arithmetic operations.

And you would get the same behaviour as with any other default arithmetic operation. AsciiChar(127u8) + 1 would produce the same effect as AsciiChar(255u8 + 1). Why would wrapping in AsciiChar be surprising when wrapping with u8 isn’t?

Finomnis commented 7 months ago

I certainly would not expect wrapping. If adding 1 to char 127 got me back to '\0' I'd be very very surprised. If anything I'd expect a panic, as with all other default arithmetic operations.

And you would get the same behaviour as with any other default arithmetic operation. AsciiChar(127u8) + 1 would produce the same effect as AsciiChar(255u8 + 1). Why would wrapping in AsciiChar be surprising when wrapping with u8 isn’t?

u8 does not wrap, it panics on wrap (on debug builds)

mina86 commented 7 months ago

u8 does not wrap, it panics on [overflow] (on debug builds)

Yes, and that’s what Add in my PR does as well.

sffc commented 7 months ago

If AsciiChar(127u8) + 1 panics in debug, like other overflowing integer types do, what does it do on release? It can't just roll over to 128 because that would be UB since 128 is not a valid AsciiChar, and rolling to \0 is not free (still requires branching, unlike unsigned int overflow which is handled in hardware). So it seems cleaner to not impl Add for AsciiChar.

clarfonthey commented 7 months ago

What exactly is the intent behind some kind of Add operation that isn't covered by Step and a to_digit method?

mina86 commented 7 months ago

If AsciiChar(127u8) + 1 panics in debug, like other overflowing integer types do, what does it do on release? It can't just roll over to 128 because that would be UB since 128 is not a valid AsciiChar, and rolling to \0 is not free (still requires branching, unlike unsigned int overflow which is handled in hardware).

It wraps to zero, which doesn’t require branching; it’s just an and operation. Do you also argue that Index shouldn’t do bounds checking because that introduces a branch?

What exactly is the intent behind some kind of Add operation that isn't covered by Step and a to_digit method?

This is a disingenuous question. You might just as well ask what the intent is behind some kind of AsciiChar which isn’t covered by char::is_ascii, str::is_ascii and str::from_utf8? If you really think that Step, a trait designed for iterators, is appropriate for arithmetic, then I really don’t know what to tell you.

kupiakos commented 7 months ago

About 1/4 of the ASCII range is not displayable. A valid interpretation of the type is that it utilizes more human-readable rendering when possible, but it is still just a byte.

This doesn't seem particularly relevant - this is just a statistical difference between Unicode and ASCII. U+202E, anyone?

what does it do on release

Overflow from 128 -> 0, with the associated code cost to do that.

still requires branching, unlike unsigned int overflow which is handled in hardware

The overflow is branchless: it's transmute((x as u8 + 128 + y as u8) & 0x7f). It's ~3x as costly as a u8 add, but it's not terrible.

I disagree. As I expressed in my comment on the PR implementing Add, ASCII characters have no holes in the domain (char forbids surrogates) and all 7-bit numbers are valid ASCII characters (char doesn’t fit in 20 bits but doesn’t cover all 21 bits).

This makes a solid case for the differences with char/NonZero*, though I'm still skeptical it's the right choice for Rust long-term. It's the only numeric add where release-profile overflow isn't just handled by the hardware. I suspect there are common cases where it would be lighter-weight to do the arithmetic as u8 and then convert to AsciiChar than to perform multiple AsciiChar wrapping operations. (Relatedly, I'm broadly against adding custom-bit-width integers to Rust. Eager masking/wraparound is, IMO, a niche need that libraries should provide for.)

I think we should discourage users from writing code like x + a'0' or c + (a'a' - a'A') and push them towards helper methods that can be more trivially optimized and tend to be much more readable.

It’s easy to convert to u8. It’s hard to convert from u8.

I challenge "hard". It's fallible, sure, but we could provide a fn from_u7(x: u8) -> Self { unsafe { transmute(x & 0x7f) } } for a safe, infallible "I don't really care if that top bit is set, just truncate to convert like an as cast".
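
A transmute-free version of the same idea also works in safe code (a sketch; from_u8 and the 0x7f mask follow the API listed in the issue description):

use core::ascii;

// Truncate to 7 bits, then construct infallibly: x & 0x7f is always <= 127.
fn from_u7(x: u8) -> ascii::Char {
    ascii::Char::from_u8(x & 0x7f).unwrap()
}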

scottmcm commented 7 months ago

This is a disingenuous question. You might just as well ask what the intent is behind some kind of AsciiChar which isn’t covered by char::is_ascii, str::is_ascii and str::from_utf8?

This is easily answered with reference to the ACP https://github.com/rust-lang/libs-team/issues/179:

The purpose of the type is to be a type-system proof of ASCII-ness so that slice-to-str and vec-to-String conversions trivially do not require checking, and thus can be O(1) in safe code, reducing the need for "I only put ASCII in the buffer so I'm calling from_utf8_unchecked" unsafe code.

Everything else is extra, and plausibly wouldn't even be part of the first wave of stabilization.

The exemplar use case is fast & safe base64, where AsciiChar: Add isn't helpful, as it'll most likely use indexing into a constant [AsciiChar; 64]. Hex formatting would do the same, though perhaps via [[AsciiChar; 2]; 256] instead.

The numeric formatting in core currently uses addition for generating decimal digits, but I don't think it'd be willing to use an Add that adds extra bitmasking cost to those paths. And if the masking optimizes away in that code, then so would any checks in AsciiChar::new(b'0' + x).unwrap().
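
A sketch of that base64 shape, assuming the const-ness of the array as_ascii method listed in the issue description (the table is the standard base64 alphabet; the surrounding code is illustrative only):

use core::ascii;

// The alphabet is proven ASCII once, at compile time; encoding then needs no
// arithmetic on Char and no validation when the output is collected into a String.
const ALPHABET: &[ascii::Char; 64] =
    match b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".as_ascii() {
        Some(table) => table,
        None => panic!("alphabet is not ASCII"),
    };

fn encode_6_bits(x: u8) -> ascii::Char {
    ALPHABET[(x & 0x3f) as usize] // plain indexing; no Add needed
}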

mina86 commented 7 months ago

It's a valid question. The Step implementation is O(1), it's just less ergonomic than + usize. The intent and real-world use cases for AsciiChar have been addressed and described. Implementing controversial features requires answers to "Why should we have this when there are workable alternatives?"

It’s not, because it presupposes that if something can be done with Step or to_digit, then the alternatives are to be dismissed. Getting the nth letter and an Ascii85 implementation would benefit from Add, but of course it can all be done with AsciiChar::new(...).unwrap(). If you don’t think the ergonomics are better or worth it, then there’s nothing I can say to convince you otherwise.

clarfonthey commented 7 months ago

It's a valid question. The Step implementation is O(1), it's just less ergonomic than + usize. The intent and real-world use cases for AsciiChar have been addressed and described. Implementing controversial features requires answers to "Why should we have this when there are workable alternatives?"

It’s not, because it presupposes that if something can be done with Step or to_digit, then the alternatives are to be dismissed. Getting the nth letter and an Ascii85 implementation would benefit from Add, but of course it can all be done with AsciiChar::new(...).unwrap(). If you don’t think the ergonomics are better or worth it, then there’s nothing I can say to convince you otherwise.

The point here is that Add fundamentally is a nonsensical operation for ASCII characters, even though the type internally is just a byte whose value can be added. char doesn't support Add for similar reasons.

The reason why I mentioned Step is because things like getting the nth letter are covered by it; ('a'..='z').nth(x) works for char just fine, and it would work just as well for AsciiChar too. I would even argue that this form is clearer than using addition, and less error-prone.

I don't see how implementing addition gives you any positive ergonomics, and there's even the additional confusion of what you're adding: u8? other ASCII characters? It just introduces so many problems that feel like they could be easily solved by other methods, which is why I'm asking what problems you'd like to solve that aren't covered by those methods.
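
For reference, a sketch of what the Step-based version would look like for the ASCII type (assuming the Step impl listed in the issue description, plus the from_u8 constructor):

use core::ascii::Char;

// nth lowercase letter via range iteration rather than addition.
fn nth_lowercase(n: usize) -> Option<Char> {
    (Char::from_u8(b'a')?..=Char::from_u8(b'z')?).nth(n)
}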

joseluis commented 7 months ago

I just want to point out that AsciiChar should derive Default returning the Null variant, matching char's implementation, which returns '\x00'. I've not seen it mentioned anywhere, and right now the type lacks it.

clarfonthey commented 7 months ago

I also hate to be a broken record but Step is still not mentioned anywhere in the issue description, and it should be.

scottmcm commented 7 months ago

@joseluis That makes sense to me. Want to send a PR and r? @scottmcm me on it?

@clarfonthey Updated.

programmerjake commented 7 months ago

I think we should add [Char; N]::as_bytes(), which returns &[u8; N]

scottmcm commented 7 months ago

I had a weird idea that's probably a vestige from doing C++ a bunch:

Add a pub struct Ascii<T: private::Innards + ?Sized>(T::ImplType);. Then use that to have Ascii<str>, Ascii<char>, Ascii<[char; 10]>, etc that end up storing [u8], u8, [u8; 10] respectively.

Then it has the customized display, without needing to debate things like whether to specialize Debug for [ascii::Char] to show a string.

(That's pretty unprecedented for Rust, though, so I assume libs-api wouldn't be a fan. But I figured I might as well write it down.)

BurntSushi commented 7 months ago

@scottmcm My interest is piqued. It reminds me a bit of the type shenanigans used in rkyv. I really like getting a nice Debug impl. I think that's a huge deal for UX. (It was quite literally one of the original motivating reasons for me creating bstr.)

programmerjake commented 7 months ago

If we are going to have Ascii<T>, I would write it Ascii<u8> instead of Ascii<char>, since the latter makes me think it's just a 32-bit value range-limited to 0..=0x7F, and similarly for arrays. For unknown-length, idk if we use [u8] or str... one benefit of Ascii is that an owned string type easily falls out: Ascii<Vec<u8>>/Ascii<String>

scottmcm commented 7 months ago

With NonZero<T> in FCP (https://github.com/rust-lang/rust/issues/120257#issuecomment-1950827642), I was inspired by it and jake's previous comment to come back to this and propose something that I think might be reasonable.

We're about to stabilize

pub struct NonZero<T>(/* private fields */)
where T: ZeroablePrimitive;

where ZeroablePrimitive is a forever-internal implementation detail, but there's always a get: NonZero<T> -> T.

So here we could do something similar:

pub struct Ascii<T: ?Sized>(/* private fields */)
where T: SupersetOfAscii;

Then for T: Sized we'd have the same get: Ascii<T> -> T, but we'd also have for everything an as_str: &Ascii<T> -> &str, as well as allowing it to deref (perhaps indirectly) from &Ascii<T> to &T.

So internally that might be

pub struct Ascii<T: ?Sized + SupersetOfAscii>(<T as SupersetOfAscii>::StorageType);

Then we could implement that trait for various types -- u8 and u32 where the internal StorageType is a private thing with #[rustc_layout_scalar_valid_range_end(0x7F)], but also implement it for [u8; N] and [u8] and such to cover more general things.

(That would allow Ascii<u32>: AsRef<char> + AsRef<str> for example, since you can get references to either from it. Might not be worth bothering having that, though, since I've never seen anything that cares about AsRef<char>.)

Thoughts, @BurntSushi ? Would you like to see that as a PR or an ACP (or nothing)?

programmerjake commented 7 months ago

(edit: nevermind, it only works for ASCII, I was thinking of general Unicode) you can't simultaneously have AsRef<str> + AsRef<char> since they're encoded differently, unless you have duplicated storage (which I think we can agree is not what we want)

scottmcm commented 7 months ago

@programmerjake For a single ascii character stored as u32 you can -- you just need to return the &str to the correct byte.

(You certainly can't for Ascii<u8>, but for Ascii<u32> it works.)

programmerjake commented 7 months ago

If we have Ascii<T: ?Sized>: Deref<Target = T>, I think we'll need both Ascii<str> and Ascii<[u8]> as well as the corresponding owned types. We should have Ascii<[u8]>: AsRef<Ascii<str>> and visa versa, and other corresponding conversions for owned/borrowed types.

scottmcm commented 7 months ago

On Deref: yeah, I haven't quite thought this one through all the way. I added the "(perhaps indirectly)" parenthetical to try to add some space for it -- like maybe we don't always have Ascii<T: ?Sized>: Deref<Target = T> because we deref the ascii one to something else that then derefs to the original thing.

But thinking about it more as I type, maybe it's just a bad idea. We don't have &String -> &Vec<u8> as a deref coercion -- not even indirectly -- so maybe trying to have it here would be wrong too.

Maybe I should propose Ascii<T: ?Sized>: AsRef<T> as the general thing instead, since that we can definitely do, and we'll be more limited in which things we allow to Deref at all.

Kimundi commented 7 months ago

Heh, that actually reminds me of the API I came up with for a toy hex formatting crate of me: https://docs.rs/easy-hex/1.0.0/easy_hex/struct.Hex.html

Basically, the whole API revolves around being able to cast T to/from Hex<T> just to get different trait impl semantics. In my case it's just a repr(transparent) wrapper, so I don't change the representation, but the idea still seems similar.

That said, I feel like a basic AsciiChar type is still the best course of action; otherwise it seems like the whole API discussion here has to be started from scratch :D

For the "[AsciiChar] does not implement Debug right" problem, could we maybe provide a struct AsciiSlice([AsciiChar]); and just make it easy to convert to/from the basic slice type? I could imagine that becoming useful for more things than just the debug formatting impl.
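
A sketch of that AsciiSlice idea (type and method names are made up, not an existing API), using the same repr(transparent) cast trick as the Hex wrapper above:

#![feature(ascii_char)]
use std::ascii::Char;
use std::fmt;

#[repr(transparent)]
pub struct AsciiSlice([Char]);

impl AsciiSlice {
    pub fn new(chars: &[Char]) -> &AsciiSlice {
        // Sound because AsciiSlice is repr(transparent) over [Char].
        unsafe { &*(chars as *const [Char] as *const AsciiSlice) }
    }
}

impl fmt::Debug for AsciiSlice {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Debug-print as a string rather than as a list of enum variants.
        fmt::Debug::fmt(self.0.as_str(), f)
    }
}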

jongiddy commented 5 months ago

Will it be possible to backwards-compatibly redefine [u8]::escape_ascii to return an Iterator<Item=AsciiChar> instead of the current Iterator<Item=u8>?

Currently the returned iterator provides a to_string() method that collects the characters into a string, but if any other iterators are chained, we lose the info that the bytes are safe to collect into a String.

programmerjake commented 5 months ago

Will it be possible to backwards-compatibly redefine [u8]::escape_ascii to return an Iterator<Item=AsciiChar> instead of the current Iterator<Item=u8>?

yes, but likely only if rust implements something like edition-dependent name resolution where the same name resolves to two different escape_ascii functions depending on the edition.

I previously proposed something like that: https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/Effect.20of.20fma.20disabled/near/274199236

clarfonthey commented 5 months ago

This is a random thought I had, but seeing the progress on this makes me wonder if we could make char itself implemented as a lang item struct instead of a primitive in the language without breaking anything.

Figuring out what would have to be done to avoid any breakages there could be useful here, since the same considerations would also apply to ASCII chars.

scottmcm commented 5 months ago

@clarfonthey Practically I think no, because char has exhaustive matching for its particular disjoint range, which isn't something exposable today.

Maybe eventually, with fancy enough matching-transparent pattern types, but I'm not sure it'd ever really be worth it. (For something like str, sure, but char has so much special stuff that I'm skeptical.)

clarfonthey commented 3 months ago

Getting back to this feature a bit. Genuine question: instead of the common named variants, now that the precedent has been set for CStr having dedicated c"..." literals, would it be reasonable to add a'x' and a"..." literals? This could also solve the issue with ascii_char_variants being unstable, preferring that you use the literals instead of the variants.

Not sure if implementing this as a POC in unstable would require an RFC or not.

leb-kuchen commented 3 months ago

Is it possible to hide the variants in the sidebar of the documentation, or, even better, the entire section?

NathanielHardesty commented 1 week ago

Re: ASCII Literals

As an end user of Rust who wants good ASCII support, I believe the best way to improve this feature is to add literals such as a'A' and a"ASCII" as others have suggested. Doing fancy manipulations on strings is far less important (to me anyways) than just being able to write strings so that they can be stored in structs or passed into functions.

Currently, the best solution to arriving at the ASCII subset is this:
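
For example, something along these lines (a sketch rather than the exact snippet; variant names follow the nightly docs for ascii_char_variants):

#![feature(ascii_char, ascii_char_variants)]
use std::ascii::Char;

const HI: [Char; 2] = [Char::CapitalH, Char::CapitalI];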

Being forced to write out all of the Char variants just to arrive at a [Char] is cumbersome, and it seems as though these variants are an unstable feature unto themselves which is best avoided.

On the other hand, byte string literals can be used in this manner:
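
For example (a sketch of the kind of usage meant; the check only happens at runtime):

#![feature(ascii_char)]
use std::ascii::Char;

fn takes_ascii(_s: &[Char]) {}

fn main() {
    // Compiles even if the literal were not ASCII; any violation only shows up
    // as a panic at runtime.
    takes_ascii(b"ASCII only, hopefully".as_ascii().unwrap());
}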

This forces me to use runtime checks, or simply cross my fingers, rather than compile-time checks, to ensure that the ASCII subset is not being violated. I'm perfectly fine with doing runtime checks when I'm parsing what could very well be arbitrary data, but source code is not arbitrary, and violations should be catchable at compile time.

ChayimFriedman2 commented 1 week ago

@NathanielHardesty If the functions for constructing &[AsciiChar] from &[u8] are const, you can create a macro for it (it's still worse than compiler support, but not by much).
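
A sketch of such a macro (the macro name is made up, and it assumes str::as_ascii is usable in const, as listed in the issue description):

#![feature(ascii_char)]

macro_rules! ascii_str {
    ($s:expr) => {{
        const CHARS: &[std::ascii::Char] = match $s.as_ascii() {
            Some(chars) => chars,
            // Reaching this arm fails const evaluation, and thus the build.
            None => panic!("literal is not ASCII"),
        };
        CHARS
    }};
}

// let greeting: &[std::ascii::Char] = ascii_str!("hello");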

Marcondiro commented 4 days ago

Hello, do you plan to add a char::to_digit(self, radix: u32) equivalent? This method was proposed in https://github.com/rust-lang/rust/issues/95969 / https://github.com/rust-lang/rust/pull/96110 for u8 but was rejected. Thanks!

NobodyXu commented 3 days ago

@Marcondiro you will need to open an issue in rust-lang/libs-team, which is called an ACP.

The libs-api team would then provide you with feedback there.

scottmcm commented 3 days ago

Interestingly, it looks like char::from(a).to_digit(10) actually optimizes well: https://rust.godbolt.org/z/K5nebha3a

I had been about to say "just send a PR since it's something char has", but then I looked at the to_digit signature, and now I want an ACP to discuss u32 vs some other width for the argument and the return :/