Open mike-lischke opened 7 years ago
Due to strict null checking in TypeScript, it's not practical for this target to exactly match the other targets. Doing so causes an overwhelming burden on developers working with things like listeners and visitors for cases where the parse tree is known to not contain syntax errors.
The solution for this target is the introduction of try*
methods to match the other targets, such as the following pair in TokenStream
:
LT(k: number): Token;
tryLT(k: number): Token | undefined;
The equivalent for the getToken
method you mentioned would be tryGetToken
, but that currently doesn't exist. As a workaround you can use getTokens
and check the length of the result before getting the token at the expected index. The same applies to getChild
, where the alternative is checking getChildCount
or calling getRuleContexts
depending on which overload you were using.
:bulb: If you encounter a case where you expected a try*
method but it doesn't currently exist, file an issue (preferably one issue for each needed try*
method) so we can add it.
Hmm, I don't get the idea of the tryXX methods. If getToken would just return a null instead of throwing everything would be fine (in fact I modified the local antlr4ts code like that and it works nicely). Other targets also do strict type checking and still don't need a tryXX function. That sounds like big burden for me, actually.
And btw, it's a listener where I have this problem (constructing a symbol table). I check for certain tokens and act depending on whether they have a value or not:
export class DetailsListener implements ANTLRv4ParserListener {
public tokenVocab: string;
constructor(private symbolTable: SymbolTable, private imports: string[]) {}
exitLexerRuleSpec(ctx: LexerRuleSpecContext) {
if (ctx.TOKEN_REF() != null) {
let symbol: string = ctx.TOKEN_REF().getText();
if (ctx.FRAGMENT() != null) {
this.symbolTable.addSymbol(SymbolKind.FragmentLexerToken, symbol, ctx);
} else {
this.symbolTable.addSymbol(SymbolKind.LexerToken, symbol, ctx);
}
}
}
...
The line ctx.TOKEN_REF() != null throws the exception, which is wrong IMO.
@mike-lischke getToken
returns Token
. If you want it to return undefined
, the method signature needs to change to returning Token | undefined
. This means you can never use the result of getToken
without first checking if the result is actually a Token
, even in cases where you know you supplied a valid index. Rather than require null checks everywhere, we chose to instead change the semantics so the result is non-null, with alternate methods provided where they make sense.
Other targets also do strict type checking and still don't need a tryXX function.
TypeScript distinguishes undefined
and null
from other types in the type system. No other supported target language has a feature even resembling this. In Java for example, the Token
type is equivalent to "Token
or null", and the compiler will allow you to omit null checks without a problem. If you omit the null check from equivalent TypeScript code, it's a compile-time error.
Yes, I got the point about the null check, but this is how we do it in every other target. Why does the TS target go a different route (without real need, I'd say)? It's not only about syntax errors! It can easily happen that you cannot find a specific token if it is optional (and not given in the input). Those tryXX functions just make things more complicated, really. Why is it a problem to check for null (or undefined)? You would have to do this anyway with a tryGetToken function, just that you have yet another function for the same task. If you are 100% sure you always get a valid token then just omit the null check. Where's the problem?
I think the direction I want to see it go is to expand on the current analysis for the number of times a particular element can appear in an alternative. Currently we distinguish between items that can appear at most once, and those that can appear more than once. What we need is to further distinguish between those that are exactly once and those that are optional.
Appearances | Accessor current return type | Desired return type |
---|---|---|
T? |
T |
T \| undefined |
T |
T |
T |
T* |
T[] |
T[] |
T+ |
T[] |
T[] |
Hmm, yes, I agree. As we see you would use the same return type for T*
and T+
which are similar like T?
and T
, respectively, which makes me think (also for keeping the class API small) that a common T | undefined
is the way to go.
@mike-lischke I filed #265 for the primary issue of optional elements having accessors that return required values. If you have other specific requests please feel free to create additional issues for them.
I have to reopen this issue again, because the provided solution doesn't work. In case of syntax errors I have no way of checking if a token exists, because even testing for it throws an exception. Is this really what you consider the simpler solution? Here's an example:
visitParserRuleSpec = function (ctx: ParserRuleSpecContext): string {
if (!ctx.ruleBlock()) {
return "# Syntax Error #";
}
let diagram = "ComplexDiagram(" + this.visitRuleAltList(ctx.ruleBlock().ruleAltList()) + ").addTo()";
this.scripts.set(ctx.RULE_REF().text, diagram);
return diagram;
}
The call to !ctx.ruleBlock()
throws already. How am I supposed to check if that token exists? Wrap everything with a try/catch?
I dropped the ball on this. Issue #128 was intended to address this, but the tryXxx() approach never seemed completely satisfactory, and I could not find a typescript best practice recommended.
Let's go back to the having getToken()
etc. return a nullable, as Mike suggests, and provide a generic unNull( x : T | null ) : T
function which can throw on null to simplify consuming code where full flexibility isn't needed. It is a breaking change, but strict null checking should quickly identify any code that needs to be changed.
@BurtHarris This issue only arises when processing trees that contain syntax errors. In cases where users are working with these trees, the tryXxx()
approach seems reasonable. Switching back places the burden instead on users working with trees that don't have syntax errors, which is a bad experience overall.
@sharwell OK yes, looking back at all the places getChild()
is used, I remember the strength of that motivation. I see ParserRuleContext
already has a tryGetChild()
and tryGetRuleContext()
as expected from the list in #128, but RuleNode.tryGetChild()
is still missing.
What I don't understand is Mike's example using ctx.ruleBlock()...
, I can't find ruleBlock anywhere in antlr4ts. Is he talking about some codegen produced method?
@mike-lischke Oh, I see, your ruleBlock
example comes from the grammar in antlr4-graps, is that right?
I still don't fully understand that, but please have a look at the PR above and let me know if its an approach you can live with, or if there are more methods like getChild that need a tryGetChild in the runtime.
P.S. An alternative I considered to tryXxx() some time back was to add an optional throws: Boolean
parameter in an overload to methods like getChild, something like this:
getChild(i: number): ParseTree;
getChild(i: number, throws: Boolean): ParseTree | undefined;
I'd be interested in your feedback, do you like that any better?
ParserRuleContext.getChild()
it already has an overload that takes one optional parameter ctxType
, which actually controls the return type of the method. How to order the two optional parameters wasn't exactly clear to me. While I wrote this code, I really don't have enough experience actually using ANTLR to fully predict usefulness in scenarios involving syntax error recovery, etc.
tryGetRuleContext()
is not a nice replacement for normal rule context retrieval, because it requires the user to lookup the index of the context manually (which is otherwise done on code generation). I understand that because of the strict null check, when allowing undefined
on rule getters, an application would either have to force not null (via the !
operator) or check if the returned value is defined, which is annoying when you know it must be correct.
But on the other hand it's annoying to use tryXXX methods in all other cases. In fact, to be on the safe side you would then always use tryXXX instead of the normal getters, unless you can 100% be sure your input is always correct (which I believe is an illusion). But when you use tryXX anyway, why having the normal context retrieval functions at all?
IMO the lesser evil is that (in a strict null check environment) you have to always check for undefined results, even if you believe you always get only correct input. Every other ANTLR target has to do the same checks as well. Why is it unacceptable for a TS target user to do the null checks too? What I try to avoid is also that antlr4ts behaves differently than other targets, without a real benefit. Of course YMMV, but I like to be able to easily port examples written for a different target easily to the typescript target (and in fact I frequently write C++ implementations first and then port them to typescript).
@mike-lischke If your parse operation had syntax errors, don't run the visitor at all. For most user scenarios, most visitors are not only fine to run like that, but are actually cleaner and produce a more consistent user experience. When implementing code completion, it's more work to handle errors, but it's not a problem many users will ever need to be involved with.
unless you can 100% be sure your input is always correct
Outside of bugs in ANTLR, we are. All you need is this:
if (parser.numberOfSyntaxErrors !== 0) {
// Do not run visitor
return;
}
tryGetRuleContext()
is not a nice replacement for normal rule context retrieval, because it requires the user to lookup the index of the context manually
I'm not seeing any place where you need to know an index that you wouldn't otherwise need to know the index when working with the generated code. Do you have an example?
If your parse operation had syntax errors, don't run the visitor at all.
That's not an option. In order to have a passably useful symbol table you have to visit the produced parse tree in any case. Remember the syntax error could also be after the caret position, so everything before that is valid and needs to be considered (e.g. for code completion).
I'm not seeing any place where you need to know an index that you wouldn't otherwise need to know the index when working with the generated code. Do you have an example?
Sure, look at this generated code for an ANTLR4 parser. It calls this.getRuleContext(0, RuleBlockContext);
. The index 0 is defined by the code generator. Now look at this visitor code. Here I have to use ctx.tryGetRuleContext(0, RuleBlockContext)
(there is no function ctx.tryRuleBlock()
). In order to use tryGetRuleContext
I have to find out which index to use for the call (not to mention that if (!ctx.tryGetRuleContext(0, RuleBlockContext)) {
is less attractive than just if (!ctx.ruleBlock()) {
.
OK @sharwell, thinking over @mike-lischke's generated-code example has me convinced that sometimes it's better to lie to the type system.
The TypeScript declaration of Array<T>
is prototypical example for JavaScript: [n]
is not declared to return T | undefined
, even though in reality an out-of-bounds index will return an undefined
(no need to throw an exception unless you try to use the value returned.)
By extension I suggest the generated code wrappers for syntactically-required parts of the parse tree should lie, we don't want tryXxx
wrapper methods for them. The logical way to do so is declare them : T
, even though : T | undefined
is more accurate. They don't need to throw because unwary code will throw on it's own. Error aware code should verify truthy-ness of even apparently required syntax elements.
Carried to it's logical conclusion, I think this eliminates the justification for all tryGetXxx(n)
methods, with the Array<T>.find()
method being a logical replacement for the two argument getXxx(n, ctx)
and tryGetXxx(n, ctx)
overloads, which is something I would celebrate.
That's not an option. In order to have a passably useful symbol table you have to visit the produced parse tree in any case. Remember the syntax error could also be after the caret position, so everything before that is valid and needs to be considered (e.g. for code completion).
Code completion is overall a minority case. And we're talking a minute fraction case here - maybe dozens of developers compared to many thousands. It may be the majority case for what you are working on, but I do not find such a limited case to be substantially motivating to detract from the type safety being provided for the majority case. You will need to use the try*
methods in many cases, but most ANTLR-based applications will not need them at all.
The index 0 is defined by the code generator.
The index is always 0.
OK @sharwell, thinking over @mike-lischke's generated-code example has me convinced that sometimes it's better to lie to the type system.
I agree that Mike's code could be simplified with an alternate approach. However, as a minority case overall I do not believe it outweighs the benefits the current implementation provide for the majority case.
@BurtHarris. I fully support your thoughts here, however how can you return undefined
when you haven't declared a return value of T | undefined
? I just tried. bit TS won't let your return undefined
in that case.
@sharwell, How's not throwing an exception corrupting the type safety in antlr4ts? You also return undefined
for optional elements, without having trouble with type safety. I agree you shouldn't optimize for the special cases, but always for the general case (majority use), though I don't consider this here as special case (ignore the connection to code completion for a moment). It's s a general problem that antlr4ts uses a different approach than any other ANTLR4 target, for no good reason.
How's not throwing an exception corrupting the type safety in antlr4ts?
Either the return value is allowed to be undefined
, or it is not. There are three options when you cannot statically verify the inputs to a method producing such a value:
undefined
even though it claimed not toYou also return
undefined
for optional elements, without having trouble with type safety.
Optional elements use a different return type in the generated code than required elements use. Unlike other targets, this target differentiates between optional and required elements in the generated code.
It's s a general problem that antlr4ts uses a different approach than any other ANTLR4 target
This is the only target+language combination capable of expressing this concept. It is equivalent to the other targets within the bounds of the expressiveness of the languages.
for no good reason
I worked on this target for the primary purpose of better understanding the capabilities/expressiveness of the TypeScript type system and the ways it relates to real-world applications. As such, succinctness of code interacting with generated parsers and certain conceptual differences relative to other language targets are currently secondary concerns for this particular target. My goal is to produce optimally usable code within the bounds of adhering to the limitations of the type system.
OK, I see we obviously have different goals and we should stop endlessly arguing. I can just fork antlr4ts and change that to what I need/want, no problem. Thanks.
@mike-lischke If our goals would be considered competing with one another, that would be a pretty substantial negative mark on the type system for the language. I am still interested in addressing the usability concerns that remain. There are two specific cases where things remain potentially clumsy for error handling scenarios. In each of these cases, the alternative to handle errors is quite a bit longer than the short form, but not difficult to determine (e.g. no need to look up indexes in the generated code).
TContext
(single, non-optional). The equivalent code to get this as an optional value to handle errors is ctx.tryGetRuleContext(0, TContext)
, which is longer but trivial to determine from ctx.t()
.ctx.t(i)
, but if you need to handle error scenarios it would be ctx.tryGetRuleContext(i, TContext)
.Assuming that we cannot change ctx.t()
or ctx.t(i)
, what would you like to write instead of the long form as you are working?
@sharwell: This is really a matter of conflicting style not conflicting goals. Its unfortunate that Mike felt blocked, but I feel accountable for not working through this earlier. But in view of ts-flavor, the assumption above about not changing ctx.t()
or ctx.t(i)
tastes bad to me.
It all goes back to issue #8. We began the job with PR #266, #268, but #8 is really not complete. PR #270 didn't extend the pattern of using properties rather than methods. I think client code should be using property ctx.t
rather any method(s). The idea of using different methods for syntax-safe and syntax-unsafe situations seems really distasteful to me. Instead I think we can have different declarations of identically named properties on tool generated interfaces appropriate for to those two different situations. That way code can use conventional JavaScript shortcuts like if (ctx.t) ...
or (ctx.t || {}).foo
in either case, but is only forced to when needed. Since TypeScript 2, ctx.t!.foo
can be used to cheat too.
For discussion purposes lets call these ISafeand
IUnsafe
grammar Json;
pair
: name=JString ':' value
;
obj
: '{' pair ( ',' pair )* '}'
| '{' '}'
;
...
Isn't this close to what we want to be generating...
export interface IUnsafePair {
name: Token | undefined;
value: IUnsafeValue | undefined;
}
export interface ISafePair extends IUnsafePair {
name: Token;
value: ISafeValue;
}
// potentially non-exported implementation details...
class PairContext implements IUnsafePair, ISafePair {
name: Token;
_value?: IUnsafeValue;
get value() { return this._value!}
}
export interface IUnsafeObj {
pair: Array<IUnsafePair>;
}
export interface ISafeObj extends IUnsafeObj {
pair: Array<ISafePair>;
}
How does ErrorNode
enter into this?
@mike-lischke I understand re forking, and wanted to point out that the npm link
and bundle features may help make this less painful and more easily reversible. Personally, I'd like to see your fork. I've already used link
in the build scripts to deal with an ugly build-order issue and enable testing on a fork, but not used bundle
(yet). Your work is advanced enough it might justify looking into it.
No worries, I'm just tired to discuss this issue.
@mike-lischke @BurtHarris I filed antlr/antlr4#1972 to really truly fix this the right way. Once the new error recovery strategy is implemented, 1) the tryGet*
methods can all go away, 2) the accessors can be used in visitors for both correct and incorrect code, and 3) the methods can continue throwing exceptions when used incorrectly.
This will be a milestone achievement for ANTLR's error recovery IMO, and likely never would have been possible without all the opposing pressure on both sides in the discussion above.
The generated Javascript behaves differently in various places than other targets (including Java and Javascript). One example is
ParserRuleContext.getToken
. The generated code throws exceptions while it should simply return null. There are other places likegetChild
(and perhaps even other classes) where this happens. It is very important that the behavior of the generated JS code is the same like in any other target.