Closed haxscramper closed 3 years ago
Split into separate RFC from https://github.com/nim-lang/RFCs/issues/322
I know I know, nobody understands the "semi structured" aspect of XML/HTML but it is what it makes it useful for layout -- JSON isn't suitable.
nkError
now exists, providing a starting point.
how should nkError best be structured to support this RFC?
rust's error index has hundreds of entries, a single enum for errorKind might get unmaintainable/unreadable very quickly. perhaps errorKind should be subcategorized.
How can nkError avoid becoming an xkcd standard when it doesn't replace other forms of error handling (nil PNodes, localError, globalError, nkEmpty, tyError...) but just adds one more?
here's a dump of every proc that raises a localError, and the line the localError is on, organized by file. it's just thrown together with grep and awk, so its got some holes, but it's the beginnings of an overview.
the procs returning a PNode (sem* mostly) are clear candidates for nkError
several others are "fooCheck" or "expectFoo" sorts of things that look like they could be refactored
it's less clear to me how the category of procs that return a PSym,PType,or deal with the PNode within a PType could work within a more structured error-handling system. use a similar scheme for tyError? should there be an skError?
Well the way I structured it should work well.
rust's error index has hundreds of entries, a single enum for errorKind might get unmaintainable/unreadable very quickly. perhaps errorKind should be subcategorized.
I'm not worried. There is nothing unreadable about an enum with many cases or a lookup table with many cases or a case statement with many cases -- the alternatives are usually much worse where there is some arbitrary splitup just because somebody read some book about how method bodies shouldn't have more than 3 lines.
How can nkError avoid becoming an xkcd standard when it doesn't replace other forms of error handling (nil PNodes, localError, globalError, nkEmpty, tyError...) but just adds one more?
Well ... for a start the backend ideally shouldn't report errors but in practice it needs to, and the backend doesn't produce a new AST so it's stuck with localError. That's ok. nkError
should replace returning nil PNodes and nkEmpty.
how should nkError best be structured to support this RFC?
I'm not exactly sure about concrete implementation details (to be discussed (in https://github.com/nim-lang/Nim/pull/16915/files there is no additional information about error cause in TNode
)), but my original idea was to add a branch to TNodeKind
, and put some ErrorDescription
field in it, and store all relevant information there. I'm mostly concerned with supporting diagnostics mentioned in https://github.com/nim-lang/RFCs/issues/323 [1] and https://github.com/nim-lang/RFCs/issues/325
I'm still not sure why json is not considered suitable, but that is a presentation issue, so I guess anything works. I'm fine with XML, but I know some people would still prefer json over it any time. I don't think it would create any major implementation constraints, although having ErrorDescription
as a separate type would make it easier to write couple of default error formatters (and make it possible for anyone else to use compiler API and output to required format (or use error description as-is))
proc printPrettyError(err: ErrorDescription)
proc printXmlError(err: ErrorDescription)
rust's error index has hundreds of entries, a single enum for errorKind might get unmaintainable/unreadable very quickly. perhaps errorKind should be subcategorized.
If you are talking purely about implementation (i.e. how error kind should be represented in compiler) I don't think it makes much difference implementation-wise to have
type
ErrorKind* = enum ## expand as you need.
RawTypeMismatchError
ExpressionCannotBeCalled
CustomError
WrongNumberOfArguments
AmbiguousCall
vs
type
SyntaxError = object
case kind*: SyntaxErrorKind
# ..
ErrorCategory = enum
ErrSyntax
ErrTypes
ErrSomethingElse
ErrorDescription* = enum ## expand as you need.
case category*: ErrorCategory
of ErrSyntax:
syntax: SyntaxError
of ... # (you get the idea)
I would even say I'm in favor of second approach, because there really is a lot of different scenarios where error can be created, but again - it does not make much of a difference, and it might be more annoying to work with.
How can nkError avoid becoming an xkcd standard when it doesn't replace other forms of error handling (nil PNodes, localError, globalError, nkEmpty, tyError...) but just adds one more?
From what I understand @Araq wanted to largely replace all currently implemented error handling with nkError
(where possible). In addition to that, I think using nil PNodes and/or nkEmpty for error signaling is not the best idea, as they are fundamentally incapable of storing any information about error cause.
here's a dump of every proc that raises a localError, and the line the localError is on, organized by file. it's just thrown together with grep and awk, so its got some holes, but it's the beginnings of an overview.
After doing similar filtering (using grep -R localError | grep -Eohe '".*?"' | grep -v "internal error:" | uniq
) I've found ~300 different error messages that are not internal (i.e. candidates for addition to error enum), so it should give a very rough estimate for the enum size.
it's less clear to me how the category of procs that return a
PSym,PType
, or deal with the PNode within aPType
could work within a more structured error-handling system. use a similar scheme fortyError
? should there be anskError
?
I suppose adding skError
is a possibility, but it is necessary to determine how often errors are created within PSym/PType
-generatic procs and decide based on that.
For me it's pretty straight-forward:
nil
worked well enough.nkError is the obvious way to do it, it's just another AST. The macro system can already handle ASTs
No objections here. But this RFC is about presenting error information in the structured way, and what kind of information should be presented to the user (though it is mostly addressed by the https://github.com/nim-lang/RFCs/issues/323)
Render error message ASTs as JSON/XML as you see fit for the external tooling.
The question is - how do we store "error messages"? Do we structure them using some enum, or introduce a separate type and store as much information as possible there? What exactly should be added to these messages?
You can argue that a more type-safe AST with subtyping etc would have been the better design, but the current design is not without its merits.
This is not about subtyping on the AST [1], it is mostly about storing more information about errors and presenting it to users. It might look like subtyping, but to me, it is just an upgrade from error: string
to error: ErrorDescription
[1] I think current implementation is the best approach overall, until it comes to code generation, but that is another issue with its own set of requirements that has nothing to do with compiler development.
Do we structure them using some enum,
Yes, and we have the enum in errorhandling.ErrorKind
.
error: string to error: ErrorDescription
It's an upgrade to nkError(childNodes)
. The translation step is enum value --> string format specifier. The children are translated via renderTree
. Much like:
proc toAscii(e): string =
case errorKind
of ExpressionCannotBeCalled:
result = "expression '$1' cannot be called" % wrongNode.renderTree
proc toXml(e): string =
case errorKind
of ExpressionCannotBeCalled:
result = "<uncallableExpression>" & wrongNode.renderTree.escapeXml & "</uncallableExpression>"
The question is - how do we store "error messages"? Do we structure them using some enum, or introduce a separate type and store as much information as possible there? What exactly should be added to these messages?
there is a pattern (that i'd love to see more of) of "error: this went wrong got $1", and a special case of "error: this went wrong got $1, helpful suggestion?"
storing them both under the same enum, with it's corresponding format logic, and an optional help message as already implemented:
result.add wrongNode
result.add newIntNode(nkIntLit, ord(CustomError))
result.add newStrNode(msg, wrongNode.info)
So the idea is to try and encode ~300 different errors together with their description in subnodes of TNode
, correct? Basically what I got confused about in your implementation (and it is somewhat clearer now):
node[0]
of kind nkIntLit
[1.. ]
nodes in the ASTJust as an example - generating error for type mismatch would involve something like
newError(ErrTypeMismatch, candidates)
And error renderer can kick in and employ all the sophisticated ordering and placement heuristics I mentioned in https://github.com/nim-lang/RFCs/issues/325
So the idea is to try and encode ~300 different errors together with their description in subnodes of TNode, correct?
Well yes, as I wrote the code in errorhandling.nim
. However don't phrase it as "300 different errors", that's the worst case if we fail to unify some of these error messages.
And error renderer can kick in
It doesn't "kick in" as there is no callback mechanism involved. Instead error reporting is handled later in sempass2
and currently only there.
storing them both under the same enum, with it's corresponding format logic, and an optional help message as already implemented
Yeah, it's good idea.
It doesn't "kick in" as there is no callback mechanism involved. Instead error reporting is handled later in sempass2 and currently only there.
I mean we defer all pretty-printing logic until the rendering stage, not that there is some on-error callback
there is a pattern (that i'd love to see more of) of "error: this went wrong got $1", and a special case of "error: this went wrong got $1, helpful suggestion?"
It would be necessary to encode all useful context in subnodes somehow - my main concern, is that at some point we will encounter some error for which it is possible to generate some suggestion, but this suggestion cannot be easily encoded in the AST (or it would require very hackish implementation, like writing type -> AST
(de)serialization).
For example noSideEffect
annotation. When there is a side effect in the procedure implementation it would be really nice if it showed what part of the code generates a side effect instead of the current "you have an error" response. Something re-running semcheck in the top-down mode and add all offending subnodes to the generated error?
when false:
listGcUnsafety(s, onlyWarning=false, g.config)
else:
- localError(g.config, s.info, ("'$1' can have side effects" % s.name.s) & (g.config $ mutationInfo))
+ let offenders = findSideEffectedNodes(body)
+ return newError(ErrSideEfffect, offenders))
my main concern, is that at some point we will encounter some error for which it is possible to generate some suggestion, but this suggestion cannot be easily encoded in the AST
On the second thought - maybe it is not that big of a deal in the end, and trying to engineer around possibly non-existing issue is even worse.
Lexical errors have to be represented somehow. There are not a lot of them (indentation, invalid character literals, trying to use __
in identifier names and things like that). But this mostly a rewording issue (and maybe some additional check to get a better message).
Syntax errors can be represented using nkError
- parser already has access to the AST and can generate them.
User-defined errors can be represented using ErrUser
, but pretty-print and XML conversion should ideally be configurable too, somehow. Would it be possible to just put printer proc symbols in the errorNode[1]
to make them called by xml/json/pretty
-printer?
trying to engineer around...
if later you want to implement a better errormessage for X, and it needs extra info, you'll add the logic at the erroring site to pack the nkError with the right nodes, add a new ErrorKind
, and a case to errorToString that unpacks them accordingly.
Lexical errors: ref #348
So ... is this RFC settled? I think the rest is all the required implementation effort.
I'm still not sure if a proposed implementation with subnode error encoding would be sufficient, but this can be addressed if we get to the point where this becomes an issue. Otherwise, I consider this settled, yes
if later you want to implement a better errormessage for X, and it needs extra info, you'll add the logic at the erroring site to pack the nkError with the right nodes, add a new ErrorKind, and a case to errorToString that unpacks them accordingly.
currently the error msg rendering happens where the error is generated, which allows writing error messages with runtime context information, eg:
"expected $1, got $2, extra info = $3"
if you defer rendering to a later stage, rendering such messages would require adding fields to the error node (deep copying which is wasteful, or shallow copying which can cause issues if the nodes are modified), it adds significant complexity for no benefit.
Instead, we should keep the rendering as it currently is, at the place where error is generated, which is where we have all the runtime context available for writing informative error messages.
But adding an enum field/unique id which uniquely identifies the error message is a good idea, and can be done without any complex refactoring or loss of functionality. Furthermore, it can be done gradually, with a default enum value ErrorUnknown
for error messages without an attached enum value.
PType already has tyError, PSym could have skError but I don't know if it's needed, it comes up so rarely that nil worked well enough.
we already have skError
, although it's defined as skError* = skUnknown
which is an IMO un-necessary micro-optimization that doesn't help performance. See also proc errorSym*(c: PContext, n: PNode): PSym
=
Immediate pretty-printing of all newly generated errors (using the same interface as described above), and more compact error messages (just node kind + brief message) added to nkError after error has been printed?
There really is no need to drag around the fragile and potentially expensive instantiation context and all associated entries.
And if error message is generated immediately it might be possible to avoid this hack where all possible contexts should be first encoded into subnodes and then unpacked, and instead go directly for some internal error representation that can be printed in different ways? And after all context is used to generate appropriate error message it can be added to error as brief description?
"instantiated from here" information should be part of the nkError
structure. The used nodes only need the TLineInfo field and so a cheap shallow copy suffices.
I consider this settled. Tooling can look for '\31\n' to determine precisely where one message ends. Error messages are inherently full of natural language so only a semi-structured approach can work.
XML but it is what it makes it useful for layout -- JSON isn't suitable.
JSON can be directly converted to XML easily, and vice-versa.
I do not care if JSON or XML, I like JSON more, but it is not a big problem whatever way.
JSON can be directly converted to XML easily, and vice-versa.
<b>
is part of the HTML spec and not part of JSON, it's really as simple as that. These are different languages with different purposes. Yes you can translate one tree structure into the other tree structure, I understand that, I understand syntax trees, can you imagine?
We agree, ...then XML errors should be a thing 😏
Currently only
file(line, column) message start
can be reliably retrieved from compilation error - everything else is constructed based on string interpolation.Providing
json
output for compilation errors would make it easier to build additional heuristics on top of compilation errors (for example hooking up type mismatch to documentation search to provide import suggestions), simplify IDE integration, and mitigate pointless debate for changing error formatting - if someone wants to have all-colorful arrows everywhere, they will have access to all information needed.Introduce error index
Having centralized listing of all compilation errors would be helpful for beginners, makes it easier necessary to explain particular details for errors in the manual/tutorial (or worse - leaving users to figure out cause and solution themselves).
Single source of information for all compilation (and some runtime) errors would make it easier to spot bad errors, show and explain possible fixes.
related:
elm
error message catalogrust
compiler error indexIntroduce internal type for errors instead of using strings
Having intermediate representation for compilation errors will make particular output format most a rendering issue - converting to json, or pretty-printing compilation errors.
Additionall this would make writing compiler-based tools easier, as
structuredErrorHook
would still have access to all necessary information instead of opperating only oninfo: TLineInfo; msg: string; level: Severity
.