Feedback and ideas for making custom enum-based status_code_domains an easier process

jwtowner commented 4 years ago

Hey Niall,

I'd like to give you some feedback on how we've been using status_code and error in a few large projects over the past couple of years. We've had a lot of success with the library, but there have been a few rough edges and a major pain point, but more on that below as well as my solution to it.

Now, just for some background, I've implemented our own independent version of the status-code library directly from P1028, and did so for a couple of reasons. First, for better integration into our own proprietary low-level C++ extensions library and second because we're currently using C++17 as a baseline and that naturally allows for a much more concise implementation. Furthermore, doing so allowed me to vet your design and give you all of that feedback a while ago as well, so there's also that, hehe. Anyway, you might be happy to know that we're using this in Microsoft Solitaire Collection, which of course comes pre-installed on Windows 10 these days and has wide adoption in both the Apple and Google Stores. You can probably imagine the size of the install base. So if anyone is still doubting that the status-code library as proposed isn't in wide use, you can always give them that, hehe.

So jumping right into things, we started off by using type-erased error objects for propagating errors in an exceptionless manner across threads with our own multi-platform concurrency library that has futures & promises, senders & receivers and executors. We opted to use std::expected to package return value and errors together, since at the time it seemed fairly mature in its design. All of std::expecteds copy constructors and copy-assignment operators automatically get excluded from the overload set when using a move-only Error object, and then all that needed to be done was to specialize std::bad_expected_access for both status_code<erased<ErasedType>> and error so that it calls .clone() to store a copy of the underlying Error. If we were to do it all over again, we might consider using result<T> but we will probably hold off on changing anything until LEWG makes further decisions on P1028 and P0709. So if they don't like neither P0709 nor result<T>, in case you haven't thought of it already--although I bet you probably have--you could as a last resort try pitching std::expected with move-only status_codes using the above technique to allow P1028 to go through, perhaps just making std::result<T> be a template alias for std::expected<T, std::error>. But I digress.

Moving on, I think the core part of the status-code library is a little complex and you certainly have already heard complaints along those lines. But after implementing it myself, I think that complexity is fully justified given what the library does for us. It really enables a rich and powerful way of working with various different types of error codes in a more-or-less orthogonal and extensible manner. It address all of the issues with . For all of the builtin system-level error codes, the library does a great job. And for users of the library, things are more or less straightforward and easy to comprehend when working directly with a status_code or error object, especially once they understand how semantic equivalence testing works and that you can have equivalence classes of status codes in the same way that you can with exceptions.

The big problem, and this was the major point of pain for us I was talking about earlier, is that's really really difficult to teach everyone how to effectively write concrete status_code_domains for custom enum-based status codes. The reasons for this I imagine are kind of related to the same reasons its often hard to get everyone to write exception-safe code all of the time, which is why we originally chose not to use exception handling in the first place. The intricacies of error handling sadly is not an area of interest for most people, and when you're on a deadline and needing to finish off implementing end-user facing features for the next milestone, things like setting up and utilizing status_code_domains properly or writing exception-safe code all end up low on the priority list.

Having to write one to two hundred lines of code to create a status_code_domain each time you want to allow an enum to be used with the status-code system just doesn't scale well. On the other hand, when you look at exceptions in C++, the one nice thing about exceptions is that it's a really simple process to create a new exception class. You merely derive from std::exception or another child class of std::exception that already exists and you're off to the races. If you need to give the exception some state and a custom error message, you add some member variables and override what(). Easy. It would be really nice if creating custom status_code_domains for enums, or indeed any trivial or move-relocatable class type, would be just as simple. I believe status code domains need to be invisible most of the time, and yet something that you can get at when needed and have it all just work.

Now initially, I myself played this off as not a big issue. Mostly because at the time I couldn't think of a better way of doing things. But as the complaints continued to roll in and as the situation worsened, I realized something needed to be done about it or we'd be in trouble. I won't go into the precise details, but even even for the C++ experts, having to set up a new status_code_domain was becoming an exercise in writing boilerplate, when we'd rather be working on more important things. When you have multiple dozens of custom enum-based status codes all throughout your code base for your various application level components and services, it's a real chore to maintain all of that.

What we needed was a robust solution from the generative programming department. So ultimately, after several iterations of trying out mixin and CRTP class designs and various other techniques, the solution I arrived at to turn say an imaginary Qux::ServiceError enum into a status code enabled one looks like this:

namespace Qux {

enum class ServiceError
{
  RequestDenied,
  Disconnected,
  NetworkFailure
};

} // namespace Qux

template <> struct status_code_lifter<Qux::ServiceError> {};

That's it! And surprisingly, it works on every popular C++17 compiler today.

Now, the purpose of the above status_code_lifter template specialization is to "lift" a user-defined concrete type, in this case an enum, into the higher-order abstract type system of the status-code library, imbuing it with all of the attributes and operations one might expect of something that satisfies the StatusCode concept . Or at least, that's my mental model for how I'm thinking about things, haha. Perhaps there's a better frame of reference and name for status_code_lifter?

But essentially, by specializing status_code_lifter for a given type T, a unique status_code_domain is generated for it with default behavior that covers 95% of our use cases. For us, we found that the vast majority of our enum status codes are failure-only errors and there are no success codes--success is transmitted via std::expected when it is returned with a value. Additionally, the vast majority of our enums aren't equivalent with other status_codes, not even generic_code and errc. Each enumerator value is unique. Thus, for our default _do_equivalent member function, generating it is trivial.

As for the implementing _do_message, the vast majority of our error messages are for internal logging. We usually don't need user-friendly descriptive strings, we just need something unique and descriptive enough that allows us to identify what error occurred and track down the problem. Basically, it'd be nice to use the enumerator names themselves as strings, which is normally impossible to do without compile-time meta reflection.

Well, it turns out there's a library named Magic Enum developed by Daniil Goncharov that as far as I am aware of pioneered this technique, and it does compile-time reflection of enums in C++17 by parsing the names from the __PRETTYFUNCTION__ or __FUNCSIG_\ string literals inside of a template function instantiation, with the template arguments being those we are interested in parsing the names thereof. In fact, using the same technique as Magic Enum, it's fully possible to implement a mostly compliant albeit limited subset of the Reflection TS for type and enumerator name meta reflection, which is I what I ended up doing. One could probably also do the same to implement a subset of P0993 value-based meta reflection. There are some limitations and caveats with the technique, but they can all be worked around.

Now after reflecting the enumerator names, each gets transformed into a space-delimited lowercase string with word boundaries defined by underscores and/or uppercase-to-lowercase transitions. In the previous example, this would result in "request denied", "disconnected" and "network failure". As for compile times, it's actually not bad at all, there is a minor cost but it scales linearly with use. I imagine this will improve when meta reflection becomes a core feature of the language (here's hoping for C++23!) And there's no additional runtime cost or executable size bloat, only the final strings end up in the executable image.

As for the domain name, that is taken from the compile-time reflected fully-qualified type name of the enum, and the domain id is generated by hashing this string at compile-time.

If users need to override the default behavior normally granted by status_code_lifter, individual static member functions or constants can be specified on an as-needed basis for the domain id and name, or for equivalence testing, or for what values are considered failure or success codes or for the message strings. For instance, if custom localized message strings are suddenly needed because it was decided we need to show a dialog box to the end user, no problem:

template <> struct status_code_lifter<Qux::ServiceError>
{
  static string_ref message(Qux::ServiceError se) noexcept;
};

string_ref status_code_lifter<Qux::ServiceError>::message(Qux::ServiceError se) noexcept
{
  // fetch current locale, perform lookup in string table, etc.
}

Now as for the status_code_domain class itself where all of this behavior is generated, I've created a basic_status_code_domain<T> template class, where T is the same type as in status_code_lifter<T>. The basic_status_code_domain<T> template class will detect any of the optional members the user has provided in their status_code_lifter specialization and will use them when detected or go with the default behavior. And not only does it work for enums, but it could be extended to work for any trivial or move-relocatable class type for which there is a specialization of the C++20 std::formatter<T> template class that would provide the message strings for a default _do_message() in the absence of a user-provided message() function.

Finally, the method by which these lifted enum-based status codes get implicitly converted into an actual status_code or error object is with a fallback overload of make_status_code that is added to the overload set when status_code_lifter is specialized for the first argument's type, and it will use that without the user needing to manually provide their own make_status_code overload. However, if custom construction of objects are needed for whatever reason, the user is still free to provide such an overload in those particular cases.

Now, it might be crazy to be doing all of this, especially in C++17, but it has basically solved all of our complexity and maintenance issues when creating custom enum-based status_code_domains.

Furthermore, it got me thinking, with metareflection and metaclasses on the horizon for C++23 and C++26, the idea could be taken further. Consider if we had user-defined attribute metareflection. Then it might be conceivable to allow a user to write something that looks like as follows, achieving a similar result as the prior example, but with attributes to override defaults and provide additional meta information to the basic_status_code_domain template:

namespace Qux {

enum class [[status_code]] RequestResultCode
{
  RequestReceived [[status_success]],
  RequestDenied,
  UserCanceled, [[status_equivalent(generic_code, errc::operation_canceled)]]
  NetworkFailure [[status_message("network is down")]]
};

} // namespace Qux

Another idea is that perhaps domain identity could be directly discerned from the reflected meta object identity of the concrete status_code_domain type itself, and then there would be no more need to generate random or hashed 64-bit domain ids, thus solving the issue of domain id conflicts.

Anyway, I apologize for the long-winded post, but overall what do you think? Is this something worth pursuing? I'm not sure if someone else has already come up with a similar solution or set of ideas. Is this something you might want to roll into P1028? Or perhaps a separate proposal? I honestly don't know myself. But I do strongly feel that streamlining the creation of custom enum-based status_code_domains would greatly aid in the wide adoption of status_codes as an acceptable alternative to exceptions.

ned14 commented 4 years ago

Firstly, thanks very much for telling us your implementation experience with P1028. I have posted a link to this github issue to the LEWG reflector, so everybody else can gain from your report.

Secondly, you literally reimplemented what I was going to add to this library when Reflection shipped in at least one major compiler. So very good to know you were thinking the same thing.

Thirdly, there is another technique available here: lists of lists to create constexpr tables. So, for example, you could simply declare the mapping between enum values, the generic codes they are semantically equivalent to, and their message strings. This isn't as automatic as a Reflection based design with a single line to generate a status code domain for an enum, but it is a lot better than the tedious boilerplate you just mentioned. Also, it is available to C++ 14 and doesn't have the hefty compile time impact which magic enums currently have. I have added an issue to track that to #23 . I am happy to return to a Reflection based API when Reflection ships in a compiler.

Finally, on the unique id generation for each domain, it is super important statistically speaking to ensure there is a full 64 bit of entropy in each id. Generating that from the name of an enum supplies insufficient entropy. LEWG discussed this at length in Prague, and we decided on a GUID for a constructor. This is compile time parsed and hashed into a 64 bit unique id, and the lists of lists API would require the user to give it a GUID off random.org.

So while I don't think we can safely give you all you ask, I do think we can significantly improve on the current situation. For the record, I too also find writing out custom status code domains tedious and a waste of time better spent on other work, so a declarative API would also suit me personally too.

Thanks once again for the extensive feedback, it was very useful.

ned14 commented 4 years ago

@jwtowner I have been asked by WG21 if you would like to submit a P-paper containing your implementation experience above. I can guide you through the process, if you are willing.

jwtowner commented 4 years ago

Yep, I'd be more than willing to submit a P-paper. Would you like me to send you a private email to get things moving? And also thank you for designing such a great library in the first place!

ned14 commented 4 years ago

Sure, send me an email. You can find mine from any of my WG21 papers.

ned14 commented 2 years ago

Sorry to reawaken a stale thread: @jwtowner Can I get your explicit permission to quote your initial comment above in the next revision of P1028R4?

jwtowner commented 2 years ago

Yes, not a problem. I apologize I haven't been able to contribute these past few years, been pretty hectic what with Covid and everything. But you certainly have my permission to quote the above.

ned14 commented 2 years ago

Cool, thanks. I have been similar, hence no recent new WG21 papers. But work recently allowed me two work days per month for standards papers, so new revisions will start slowly dripping out. Thanks for the explicit permission!

ned14 / status-code

Feedback and ideas for making custom enum-based status_code_domains an easier process #22