rbwhitaker / CSharpPlayersGuideEarlyAccess

A place to track issues with the C# Player's Guide for patches and future editions

UTF8 strings #658

Closed: rbwhitaker closed this issue 1 year ago

rbwhitaker commented 2 years ago

C# now supports UTF-8 string literals. If you tack a u8 suffix onto the end of a string literal, after the closing quote mark (as in "Hello"u8), the compiler encodes the text as UTF-8 and gives you the bytes as a ReadOnlySpan<byte>, which is easy to turn into a byte array. This is a new C# 11 feature.
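A minimal sketch of what the suffix does, assuming a C# 11 compiler targeting .NET 7 or later (the variable names are only illustrative). It compares a u8 literal against the bytes that Encoding.UTF8.GetBytes produces at run time for the same text:

```csharp
using System;
using System.Text;

// The u8 suffix makes the compiler encode the literal as UTF-8.
// Its natural type is ReadOnlySpan<byte>, not string.
ReadOnlySpan<byte> hello = "Hello"u8;

// The same text encoded at run time with the Encoding API, for comparison.
byte[] runtime = Encoding.UTF8.GetBytes("Hello");

// ToArray copies the span's bytes into a new byte[] when an actual array is needed.
byte[] copy = hello.ToArray();

// SequenceEqual (an extension method in System.MemoryExtensions) compares the bytes one by one.
bool same = hello.SequenceEqual(runtime);

Console.WriteLine($"{copy.Length} bytes, matches Encoding.UTF8: {same}"); // prints "5 bytes, matches Encoding.UTF8: True"
```

The literal version does the encoding at compile time, so there is no run-time call to an encoder; the result is the same bytes either way.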

It is a nice feature, but most people won't use it very often, except in web development scenarios where you might have to convert text to UTF-8 from UTF-16, the natural representation of strings in C#. Even there, I'm not convinced it solves all the problems cleanly, because in most cases you still have to combine several strings to get the final result. I've seen some comments about an addition operator for these literals, so maybe this is partly solved.
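A rough sketch of the combining problem, assuming C# 11. As far as I can tell from the language proposal, + is allowed between two u8 literals and is folded at compile time, but not between a u8 literal and a run-time value. Text that is only known at run time still has to be encoded explicitly and stitched together by hand; `name` here is a hypothetical stand-in for user input:

```csharp
using System;
using System.Text;

// Two u8 literals joined with +; this only works when both operands are u8 literals.
ReadOnlySpan<byte> greeting = "Hello, "u8 + "world"u8;

// Hypothetical stand-in for a value that only exists at run time (an ordinary UTF-16 string).
string name = "Ada";

// The run-time text must be encoded explicitly, then combined with the literal's bytes manually.
byte[] prefix = "Hello, "u8.ToArray();
byte[] suffix = Encoding.UTF8.GetBytes(name);
byte[] message = new byte[prefix.Length + suffix.Length];
prefix.CopyTo(message, 0);
suffix.CopyTo(message, prefix.Length);

Console.WriteLine(Encoding.UTF8.GetString(greeting)); // prints "Hello, world"
Console.WriteLine(Encoding.UTF8.GetString(message));  // prints "Hello, Ada"
```

The copy-into-one-array approach is just the simplest way to show the extra work; a real web app would more likely write the pieces straight to a stream or buffer.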

It is currently unclear how much the book should discuss this. Covering it early, when strings are first discussed, seems too early. Readers don't know about ReadOnlySpan<T> at that point; in fact, I don't think the book touches on it at all, though it definitely could. They don't even know anything about generics yet.

If I had to guess where I might cover this, it would be in the Catch-All chapter at the end, though I'm actually trying to shrink that chapter down to nothing in the future.

rbwhitaker commented 1 year ago

I'm leaning more and more toward skipping this entirely...

rbwhitaker commented 1 year ago

Perhaps a good compromise here is a blog post. It's hard to justify putting this in the book, given how the book currently handles Span<T>. It does add some weight to the idea that I should maybe have an actual chapter about Span<T>, because UTF-8 encoded string literals are a cool convenience feature. But I don't think I want to bite that off in this edition, and a blog post could be a good alternative: I would still actually cover it, without eating up precious printed pages.

rbwhitaker commented 1 year ago

I've added a blog post for this. The more I think about it, the more important it seems to point out that these are not UTF-8 strings, at least not in the way C# thinks of strings. They are byte arrays (or, more accurately, ReadOnlySpan<byte>) that the compiler generates by taking text and encoding it in UTF-8 for you. They're quite useful, and I definitely think an ASP.NET book ought to cover them, but they are probably not well-suited to The C# Player's Guide. (Maybe the ASP.NET Player's Guide... some day!)
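To illustrate the distinction, here is a small sketch, assuming C# 11 and .NET 7 or later (names are illustrative). The same text has one length as a string, counted in UTF-16 chars, and another as a u8 literal, counted in UTF-8 bytes, and getting a string back out requires an explicit decode:

```csharp
using System;
using System.Text;

// An ordinary C# string: UTF-16 code units. U+00E9 (é) is a single char.
string text = "caf\u00E9";

// The u8 version: the compiler encodes the same text as UTF-8 bytes.
// é becomes two bytes (0xC3 0xA9), so the byte count differs from text.Length.
ReadOnlySpan<byte> bytes = "caf\u00E9"u8;

// Turning the bytes back into a string is an explicit decoding step.
string roundTrip = Encoding.UTF8.GetString(bytes);

// string wrong = "caf\u00E9"u8; // does not compile: the literal is ReadOnlySpan<byte>, not string

Console.WriteLine($"{text.Length} chars, {bytes.Length} bytes, {bytes[3]:X2} {bytes[4]:X2}"); // prints "4 chars, 5 bytes, C3 A9"
```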