rust-lang / libs-team

The home of the library team
Apache License 2.0

ACP: macro for UTF-16 literals #370

Closed: jmillikin closed this issue 4 weeks ago

jmillikin commented 2 months ago

Proposal

Problem statement

UTF-16 is a common text encoding on platforms that adopted Unicode before the invention of UTF-8. C compilers for these platforms have special syntax for UTF-16 string literals. It would be nice if Rust had similar functionality available via macro syntax.

Rust currently only supports UTF-8 string literals, and converting them to UTF-16 requires either allocation or the use of a third-party macro crate. This is unergonomic and reduces Rust's competitiveness vs C/C++ in no_std contexts (e.g. DLLs).
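For comparison, the allocating route looks roughly like this with the standard library. It runs at run time and needs a heap (`std` or `alloc`), which is why it is unavailable in many `no_std` contexts. `to_utf16` is only an illustrative helper name, not an existing API.

```rust
/// Converts UTF-8 text to UTF-16 at run time.
/// The result is a heap-allocated Vec, so this needs `std` (or `alloc`).
fn to_utf16(s: &str) -> Vec<u16> {
    // `str::encode_utf16` yields UTF-16 code units one at a time.
    s.encode_utf16().collect()
}
```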

Motivating examples or use cases

UTF-16 (or a superset) is the native encoding of Windows and Java, so code working with Win32 or JNI often involves string constants that need to eventually be UTF-16. Some platform interop crates include helper macros for UTF-16 literals; for example, windows-sys provides windows_sys::core::w!().

Solution sketch

Define a core::str::utf16!() macro that receives a string literal and produces a [u16; N] containing the UTF-16 representation of its input.

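use core::str::utf16; // the macro proposed in this ACP

// equivalent (two alternative definitions; only one would appear in real code)
const HELLO: [u16; 6] = utf16!("Hello!");
const HELLO: [u16; 6] = [0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21];

// equivalent (the array reference coerces to a slice)
const HELLO_REF: &[u16] = &utf16!("Hello!");
const HELLO_REF: &[u16] = &[0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21];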

I'm uncertain about whether a NUL-terminating variant is justified:

Alternatives

Continue using macros and/or const functions defined in third-party libraries.
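As a rough illustration of what the third-party route involves, here is a minimal sketch of a compile-time encoder built from `const fn` plus `macro_rules!`, assuming a recent stable toolchain (loops, const generics, and `assert!` in `const fn`). The names `utf16_len`, `encode_utf16_array`, and `utf16_lit!` are hypothetical and do not correspond to any particular crate's API. It works in two passes because the array length `N` must be known before the array can be built.

```rust
/// Counts the UTF-16 code units needed for `s` (hypothetical helper).
/// Relies on `s` being valid UTF-8, which `&str` guarantees.
const fn utf16_len(s: &str) -> usize {
    let bytes = s.as_bytes();
    let mut i = 0;
    let mut len = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b < 0x80 {
            i += 1; // 1-byte sequence (ASCII): one code unit
            len += 1;
        } else if b < 0xE0 {
            i += 2; // 2-byte sequence: one code unit
            len += 1;
        } else if b < 0xF0 {
            i += 3; // 3-byte sequence: one code unit
            len += 1;
        } else {
            i += 4; // 4-byte sequence: outside the BMP, a surrogate pair
            len += 2;
        }
    }
    len
}

/// Encodes `s` as UTF-16 into a fixed-size array (hypothetical helper).
/// Panics (a compile error when evaluated in a const) if `N` is wrong.
const fn encode_utf16_array<const N: usize>(s: &str) -> [u16; N] {
    let bytes = s.as_bytes();
    let mut out = [0u16; N];
    let (mut i, mut j) = (0, 0);
    while i < bytes.len() {
        // Decode one UTF-8 sequence into a Unicode scalar value.
        let b = bytes[i] as u32;
        let (cp, width) = if b < 0x80 {
            (b, 1)
        } else if b < 0xE0 {
            (((b & 0x1F) << 6) | (bytes[i + 1] as u32 & 0x3F), 2)
        } else if b < 0xF0 {
            (((b & 0x0F) << 12)
                | ((bytes[i + 1] as u32 & 0x3F) << 6)
                | (bytes[i + 2] as u32 & 0x3F), 3)
        } else {
            (((b & 0x07) << 18)
                | ((bytes[i + 1] as u32 & 0x3F) << 12)
                | ((bytes[i + 2] as u32 & 0x3F) << 6)
                | (bytes[i + 3] as u32 & 0x3F), 4)
        };
        i += width;
        // Re-encode as a single code unit or a surrogate pair.
        if cp < 0x1_0000 {
            out[j] = cp as u16;
            j += 1;
        } else {
            let c = cp - 0x1_0000;
            out[j] = 0xD800 | (c >> 10) as u16;
            out[j + 1] = 0xDC00 | (c & 0x3FF) as u16;
            j += 2;
        }
    }
    assert!(j == N, "N must equal the UTF-16 length of s");
    out
}

/// Hypothetical stand-in for the proposed `utf16!`: yields a `[u16; N]`.
macro_rules! utf16_lit {
    ($s:literal) => {{
        const S: &str = $s;
        const N: usize = utf16_len(S);
        const OUT: [u16; N] = encode_utf16_array::<N>(S);
        OUT
    }};
}
```

With this sketch, `const HELLO: [u16; 6] = utf16_lit!("Hello!");` matches the hand-written array from the solution sketch, and a NUL-terminated result can be had by ending the literal with `\0` (so `utf16_lit!("Hi\0")` gives `[0x48, 0x69, 0]`).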

Links and related work

A quick survey of crates that provide similar functionality in their public API (often their only purpose):

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

ChrisDenton commented 2 months ago

We do have a couple of macros that are used internally. Note though that they are only intended for internal use.

tgross35 commented 1 month ago

I feel like if we add a macro to create a string, we should also have one for CStr / c"lit" to be consistent. (There currently isn't any way to convert a regular Rust string literal to a CStr literal without proc macros)

ChrisDenton commented 1 month ago

I think the point of a macro is to work around the lack of a string literal type. It doesn't make sense to have both, even if there's a practical reason they might not be exactly equivalent.

tgross35 commented 4 weeks ago

@jmillikin for posterity, was there a discussion for rejecting this?

CryZe commented 4 weeks ago

@tgross35 They closed all their RFCs and ACPs from what I've seen, so it's not related to this specific ACP.

jmillikin commented 4 weeks ago

I'm just garbage-collecting stale issues / PRs. This suggestion wasn't rejected, and if someone else has time to drive it forward then feel free to re-file.

pitaj commented 4 weeks ago

Why did you consider this stale? It's only been open for two months. Were you expecting an earlier response?

jmillikin commented 4 weeks ago

It's a pretty small ACP, so I figured if there was any interest at all from the libs-team folks then they would have said so.

I think two months is a reasonable timeout.

pitaj commented 4 weeks ago

There are a lot of ACPs and the queue is long. There are years-old ones still getting accepted. Example https://github.com/rust-lang/libs-team/issues/163

kennytm commented 3 weeks ago

As of today this repository has 95 open ACPs. Resolving all of them in two months' time (i.e. 8–9 weeks) would mean processing 10–12 proposals per weekly meeting on average. That does not seem like a reasonable workload at all; clearly there is a bottleneck here.
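Spelled out, assuming one libs-api meeting per week and no new ACPs arriving in the meantime:

$$\frac{95}{9} \approx 10.6 \qquad \text{and} \qquad \frac{95}{8} \approx 11.9 \ \text{proposals per meeting}$$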

Maybe the ISSUE_TEMPLATE could at least spell out the current throughput, so contributors have a rough idea of the longest they might wait for a first response. :thinking:

scottmcm commented 3 weeks ago

Personally, I think it's fine to leave this to https://docs.rs/windows-core/latest/windows_core/macro.w.html and friends. Getting it from a windows crate for interop with windows apis and a JNI crate for interop with java seems quite reasonable, especially if there's no type needed. If different people end up using different macros for [u16]s, that seems completely fine, since they can still talk to each other.

(And often these don't even want UTF-16, they want some kind of YOLO-16 because of historical reasons where it's entirely normal to have unpaired surrogates or whatnot. Which is all the more reason to not have it in the standard library, IMHO.)
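To make "unpaired surrogates" concrete: a `[u16]` buffer handed over by an OS or VM API is not guaranteed to be well-formed UTF-16. The standard library can check this (`char::decode_utf16`, `String::from_utf16`) or repair it (`String::from_utf16_lossy`, which substitutes U+FFFD). The two helper names below are illustrative only.

```rust
/// Returns true if every surrogate in `units` is correctly paired,
/// i.e. the buffer is well-formed UTF-16 (hypothetical helper).
fn is_well_formed_utf16(units: &[u16]) -> bool {
    // `decode_utf16` yields Err for each unpaired surrogate it meets.
    char::decode_utf16(units.iter().copied()).all(|r| r.is_ok())
}

/// Decodes `units`, replacing each unpaired surrogate with U+FFFD
/// (hypothetical helper wrapping `String::from_utf16_lossy`).
fn decode_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```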

ChrisDenton commented 3 weeks ago

> (And often these don't even want UTF-16, they want some kind of YOLO-16 because of historical reasons where it's entirely normal to have unpaired surrogates or whatnot. Which is all the more reason to not have it in the standard library, IMHO.)

At least on Windows, it's entirely abnormal to have unpaired surrogates, to the point that many applications will, at best, be unable to e.g. open a file with such a corrupted file name. But if std did have a type (rather than just str -> [u16]), it'd be fine for it to have weaker guarantees than str.