Integration with the error message index?

david-christiansen commented 1 year ago

errors.haskell.org presently documents error messages and warnings from GHC, starting with version 9.6.1 once it's released. The design, however, is intended to accommodate any Haskell tooling whatsoever. Would Cabal be interested in integrating with the site?

What it entails:

Each error or warning is assigned a unique code of the form [C-12345], where "C" is a namespace (we can talk about what it would be) and "12345" is a number that's unique in that namespace
These codes are kept consistent over time, even if the text is rephrased.

As I see it, there'd be the following benefits:

Users have an easier time contributing documentation for how to work around various situations that may arise, so more docs may exist
Error messages gain a notion of identity that allows them to be rephrased without breaking all docs
Interactive tools that invoke Cabal and GHC can link to one consistent, predictable location if an error message occurs, and new users don't have to figure out which tool is the source of the error

The costs would likely be:

A one-time cost to put in the codes and update all tests that refer to the string forms of errors
An small but real ongoing burden to think about the relationship between messages in older and newer versions to keep the semantics of the codes consistent over time
Some user-written documentation will be in a location other than the manual

What do you think? Is this interesting?

mouse07410 commented 1 year ago

I think it's worth it.

Mikolaj commented 1 year ago

Agreed, it's a great idea and we'd love for somebody to integrate this with cabal.

Let's discuss how and where it could be integrated. E.g., cabal now consists of 4 packages: Cabal, Cabal-syntax, cabal-install, cabal-install-solver. I'd assume the Cabal package errors may need to be somehow linked to GHC errors, especially if GHC may emit them when it calls Cabal functions. Solver errors are particularly large and, if represented well, useful. Perhaps another split than per-package makes more sense, or no split at all.

For GHC and GHC tools (ghc-pkg, anything else?), I think right now cabal just let's GHC display its error messages, not intervening in any way, and saving the errors and warnings in a log file together with cabal's own messages. If GHC additionally outputs errors.haskell.org links, I don't think anything would need to change, except a few places where the GHC errors are recorded in test output and possibly many more places where warnings are.

Another angle would be adding all cabal errors to the index. That would require a design and an overhaul of at least the die or die' machinery, optimistically assuming we don't ever error instead.

Any other angles?

gbaz commented 1 year ago

For GHC I think there's nothing that needs to be done -- this is just an independent improvement to GHC's messages.

It's more that we could attempt to refactor the cabal system along the same lines as ghc -- cleaning up the current ad-hoc mechanism and moving to representing cabal errors as a sum type with a mapping from constructors to error codes, such that those could also be documented on the site. I.e. I think the "another angle" is all this proposal is really about. I think even absent the error index, the cabal error reporting mechanism really could use a full overhaul regardless. But also, this is a fairly large project, and would need an interested person to take it up and push it forward -- perhaps as a future GSoC project.

ulysses4ever commented 1 year ago

Stack done it swiftly! Here's their:

announcement
PR with the patch
ticket with interesting discussion

Stack was in a markedly better position because they have structured errors (we have a recent ticket for it and no volunteers on the horizon: #8618). They already assigned all the codes to the corresponding data constructors for their errors (see their docs). Then, they updated the error messages by sticking those codes, and the messages are handily localized in the Show instances for those type constructors.

What remains there is to populate errors.haskell.org. But this can be done continuously in the coming future.

ulysses4ever commented 1 year ago

I added more details on how Stack did it in the previous comment. This may serve as a possible plan for us.

ulysses4ever commented 1 year ago

Nothing blocking this ticket, so I remove the "blocked: ghc" label.

BasLaa commented 1 year ago

This would very useful perhaps especially as Haskell has less documentation of errors on stackoverflow than other more popular languages, so having a centralised collection of them would be great.

I read through the ticket on how Stack did this, and it seems like for them at least it was relatively straightforward, but their errors were already represented in a way that made it more suitable to implement which cabal doesn't really do (I think?). I would definitely be interested in learning more about this in the upcoming weeks and perhaps doing it as part of a google summer of code project though.

Mikolaj commented 1 year ago

Amazing! @gbaz: I forgot the details of our next GSCO that you initiated. Is there a webpage about that already?

ulysses4ever commented 1 year ago

@Mikolaj https://summer.haskell.org/ideas.html#cabal-errors

Mikolaj commented 1 year ago

Yay. So cool.

david-christiansen commented 1 year ago

This is great to see!

BasLaa commented 1 year ago

I have been looking into how Stack made the transition to have structured errors. It seems that they already had datatypes for some of their exceptions, which they expanded by grouping similar errors as constructors of these datatypes, with a nice show instance for every exception. After that they added unique error codes to every show instance. Their errors are now integrated with the error index, although actually documenting is not going super quickly yet: only two errors are documented right now. So perhaps as part of this project at the end it would be good to also actually get the documenting of cabal errors started.

For cabal, making this transition would take more work as I believe that all errors are done using die' and exceptions don't have specific types to distinguish them, so there is no underlying structure for errors yet. I would guess that the biggest part of this project is moving away from die' to creating types for exceptions that can be re-used and assigned a specific error code.

While reading through the die' function, I also came across this note: "When we die', we must raise an IOError. This a backwards compatibility consideration, because that's what we've raised previously, and if we change to any other exception type, exception handlers which match on IOError will no longer work. One case where it is known we rely on IOError being catchable is 'readPkgConfigDb' in cabal-install; there may be other user code that also assumes this."

This might be an issue if we would want to move to different datatypes for exceptions. Another consideration might be how to handle Verbosity with different exception types, as this is now simply passed as a parameter to die'. Overall, it seems like quite an overhaul would be required, especially compared to what Stack had to change. Once it is done and working though, it would give a nicer representation of errors that can easily be extended in the future when new errors arrive. What do others think?

Mikolaj commented 1 year ago

This might be an issue if we would want to move to different datatypes for exceptions.

How do other tools do it? GHC, stack? And what is the requirement for the error messages index? Does it require a particular stderr output with numbers in square brackets or is there an expectation that a particular exception type value can be caught and the payload inspected?

BasLaa commented 1 year ago

From what I gather, only a stderr output with the particular code is strictly required for the integration with the error index. Could you perhaps confirm this @david-christiansen ? However, bringing more structure to errors in cabal would help with recognising similar errors and assigning codes to them, whilst also being generally beneficial. GHC also moved to structured error data types with constructors from semi-structured strings especially so that other tools like HLS that work with GHC diagnostics could take advantage of the structure instead of parsing the output of the error, and something similar for cabal could be useful.

Stack now implements exceptions as data constructors of sum types, which are created in any module that needs to export exceptions, so generally one type per module with several constructors and arguments to each constructor, and they import these in other modules as needed. They then derive a Show instance for each exception, and a Control.Exception.Exception instance. They use functions from Control.Exception.Exception to throw their errors, mostly throwIO, which returns `IO a'. They also have a 'PrettyException' for different cases, but I think their plain exceptions are a good start to take inspiration from.

I think in cabal it should also be possible to move from die, which returns IO a as well, to the functions given by Control.Exception.Exception. The first requirement would be to document the errors in cabal and see if we could fit them in sum datatypes per module as well, and then see how we could implement the showing of the exception together with verbosity

david-christiansen commented 1 year ago

For the error message index, the requirements and recommendations for tool developers are documented here: https://github.com/haskellfoundation/error-message-index/blob/main/tool-developers.md

In summary, the requirements are:

Documentable errors and warnings should have a code that starts with a namespace, so that (say) Cabal errors and GHC errors can be distinguished (HF administers namespaces), and this code should be emitted in square brackets
Once a code is assigned, it is never used for an error or warning with a different meaning in the future (they may be retired, or the messages rephrased, but if 123 means "file not found" today, it should never mean "parse error" in the future)

There are no requirements on the API that the tool provides to other developers - the error message index is strictly a user-facing tool.

@BasLaa is mostly correct - there is no requirement that it be on stderr, it's fine to emit things to stdout or a GUI window or whatever makes sense for a given application.

Mikolaj commented 1 year ago

In that case, while using bespoke exceptions is certainly natural, we don't have to do this or to focus on this. We can have a sum type of errors or several such sum types, per package or per component in our repo. We can assign starting codes to each of these and number independently. Eventually, we can either throw a sum type exception, or a dumb exception with a message generated by inspecting the sum type. What I'm saying, using exceptions is an implementation detail that should not hinder coming up with an elegant design.

BasLaa commented 1 year ago

Yeah I definitely agree, and like you said, the starting point would be documenting all the existing errors and mapping them onto several sum types.

I read that Haskell sadly wasn't accepted as an organisation for Google summer of code this year, but they let me know that they are planning to organise a summer of Haskell separately, so if that works out well I could write a proposal for this project as part of that instead.

BasLaa commented 1 year ago

Since Haskell.org is organising a separate summer of haskell, I have started to write a proposal for the project of integrating cabal errors with the haskell error index. The deadline for sending in proposals is April 20th, but I will try to get a draft proposal done for some feedback hopefully in around a week. As I have exams this week I don't have as much time to work on it in the next couple of days.

Mikolaj commented 1 year ago

Amazing. Please keep us posted. The IRC/Matrix channel #hackage may also be a good place to coordinate. Good luck with your exams. :)

BasLaa commented 1 year ago

i have written a draft proposal on google docs here. I believe anyone should be able to leave comments. I am curious to hear your opinions, especially what you think of the implementation.

BasLaa commented 1 year ago

In my proposal I proposed creating instances of the Exception typeclass for each sum type representing some errors and then using the throwIO functions for throwing errors, in a similar way to how Stack implemented their errors. This requires a Show instance for the type as well, but the show method for an error would be dependent on the verbosity, so this would need to be added to the data constructor of the error. However, @david-christiansen noted that verbosity is more a feature of the context of the error than the error itself, so it would not really make sense to have this be part of the constructor.

Perhaps an alternative would be to have every error type implement a function errorToString :: E -> Verbosity -> String, and then use the userError and ioError functions, that are currently used as well, to throw the errors so that an instance of Exception isn't needed.

ulysses4ever commented 1 year ago

Has anyone looked into how GHC did it? Its code base is much closer to what cabal has than the stack code base, i think. (In that the starting point was completely unstructured errors.)

BasLaa commented 1 year ago

I believe this article outlines how they created a new structure for their errors: https://well-typed.com/blog/2021/08/the-new-ghc-diagnostic-infrastructure/. I'm not sure if I'm understanding it correctly, but I believe they are not working with printing dependent on some verbosity parameter here.

ulysses4ever commented 1 year ago

@BasLaa correct, thanks for confirming. I was more curious weather they went through the Exception typeclass and throwIO or something else.

Also, while it's true that they don't seem to print depending on verbosity, they have a dynamic parameter Severity (see the blog post) that form a part of a "message" that also holds the actual error/warning-encoding constructor. You could imagine a message type that records the current verbosity level...

Also, GHC does different vebosity levels (e.g. -v0). I wonder how it fits with their new framework.

BasLaa commented 1 year ago

I wanted to give an update on this: My proposal for summer of Haskell ended up being accepted, however in the end I decided to accept a different opportunity for over the summer, so I will not be working on this project over the summer. However, I would still be interested in being involved in working on this after the summer in my spare time when I have more time, if no one else has done this project at this point. Hope you all understand!

ulysses4ever commented 1 year ago

@BasLaa no worries and have a great summer!

Mikolaj commented 1 year ago

That's fine. Let's keep in touch and happy hacking!

SuganyaAK commented 1 year ago

Hi, I am working on this project as part of summer of Haskell.

I am new to the Haskell community and just setting up the Environment and parallely looking into the relevant tickets and the codebase. This ticket has lots of good discussion on the project.

Excited to collaborate! Thanks.

Mikolaj commented 1 year ago

@SuganyaAK: Hi! Welcome!

Please also consider joining the #hackage channel via IRC or Matrix and joining our fortnightly video (or sound only) chat. If you send me your email to mikolaj@well-typed.com, I will add you to an invitation to the chat.

BinderDavid commented 1 year ago

Hi, I am the other David from the error-message-index. I have seen that in some of the already merged PRs (e.g. https://github.com/haskell/cabal/pull/9018/files?show-viewed-files=true&file-filters%5B%5D=) there are already errors which use the [Cabal-XXXX] format. Are these error codes already emitted in a released version of cabal? If so, I can add the Cabal- prefix to the list in our README-md

ulysses4ever commented 1 year ago

Hi David! Not in a release, no. It will likely appear in Cabal 3.12. Which will be released maybe right after the upcoming 3.10.2, but that one won't have it either.

philderbeast commented 1 week ago

Is anyone watching this issue able to review https://github.com/haskellfoundation/error-message-index/pull/536?

geekosaur commented 1 week ago

Review, yes. Review that counts, no: I don't have write access in that repo.

philderbeast commented 1 week ago

Are these error codes already emitted in a released version of cabal?

@BinderDavid I checked Cabal-7070 and it was only cabal-install-3.12.1.0 that included it. Earlier releases in the 3.10.* series didn't.

Thanks for the review @geekosaur.

geekosaur commented 1 week ago

Yes, they are released as of 3.12. Updating the error index with them was left as future work, though (hence this PR).

BinderDavid commented 1 week ago

Yup, I am still watching the error-message repo. I am just winding down from a week of ICFP and a stressful few weeks prior :) I unfortunately didn't have as much time as I wanted to review PRs in that period. But I will have more time now. If there are more PRs planned to add Cabal error messages to the index I would also be grateful for any help reviewing those.

ulysses4ever commented 1 week ago

Happy to review cabal submissions on the error index repo but don't have the permissions to formally approve.

BinderDavid commented 1 week ago

Happy to review cabal submissions on the error index repo but don't have the permissions to formally approve.

Thanks! I gave you necessary permissions :)

ulysses4ever commented 1 week ago

Thanks!

ulysses4ever commented 1 week ago

Should this issue be closed as completed btw?

geekosaur commented 1 week ago

I wouldn't think so: this is only one of the many Cabal and cabal-install errors.

ulysses4ever commented 1 week ago

Well, the issue doesn't request to create the error-index part for Cabal, it only asks to "integrate" with the error index. And following its own definition of "integration", I think, the work is done.

As for populating the error index, I'm not sure what good it will do to leave this open...

But I'm not against leaving it open either. Just wanted to understand the current purpose of it.

haskell / cabal

Integration with the error message index? #8543