How to get parse errors programmatically

TwitchBronBron commented 5 years ago

I'm using the Lexer and Parser classes to try and validate a brightscript file. How can I get a list of parse errors?

I'm doing the following in TypeScript:

import { Lexer, Parse } from 'brs';
let tokens = Lexer.scan(text);
let statements = Parser.parse(tokens);

Problem is, it just seems to eat any errors it encounters. For example, based on this line in Parser.ts, there should be a parse error thrown whenever you declare a sub with an as clause.

Here's a very simple brightscript file that should have at least the above mentioned parse error

sub SayHello() as string 
    print "hello world"
end sub

I don't get any exceptions thrown...the statements array contains a single entry, representing the print line. If I remove the as string, then the statements array has one entry at the top level represneting the SayHello function (as expected).

What I am hoping to get is a list of parse errors, and their line numbers (which in this case would be 1 parse error, something to the effect of

{
    error: `"'Sub' functions are always void returns, and can't have 'as' clauses""`, 
    line: 1,
    column: 14,
    file: "theFile.brs"
}

Is this functionality available? If not, do you think it's do-able?

sjbarag commented 5 years ago

Hiya @TwitchBronBron ! Thanks for the detailed description – I really appreciate it!

There's currently no ability to get the list of lexer or parser errors. Not for any technical reason though – I just hadn't thought of it! The way I see it, there's two options:

1. An event emitter

The root-export brs acts like an event emitter. It'd emit an error event with a payload equivalent to what you listed above. This, coincidentally, would solve another problem I've been struggling with: how to effectively decouple output to stderr from the runtime!

Similar options include the lexer and parser individually acting like event emitters (handy if you care about errors only from one source and not the other) or exposing a brs.events property as an emitter.

2. A simple getter method

The root-export brs adds a getErrors() method that returns a list of errors. Nice and simple, though I'll need to expose a brs.clearErrors() to keep things from accumulating indefinitely.

I'm leaning pretty heavily towards something event-emitter-based. You're working on a VSCode extension, right? I'm not sure which option would work better, or if an event-emitter would even be possible within the confines of what VSCode gives you. Got any preferences?

TwitchBronBron commented 5 years ago

Before I talk about 1 or 2, I'd like to recommend a third option?

3. Use classes that each have their own state.

Here's how I'm planning on writing my language server. On startup, I have to lex and parse every file in the "project". As such, I'd like to do that as quickly as possible, so running multiple operations each in their own async context allows me to do CPU-bound operations for file1 while file2 is still loading on a node background IO thread.

So assume for a moment that your lexer and parser were implemented as classes and consider this example:

import { Lexer, Parse } from 'brs';

async function processFiles(filePaths: string[]) {

    let errorsByFile = <{ [filePath: string]: string[] }>{};

    //run each file path async, so we can maximize throughput 
    //by processing one file while another is waiting on IO, etc...
    await Promise.all(filePaths.map(async (filePath) => {
        //await on IO-bound operation (allows another file to 
        //do CPU-bound work while we're waiting)
        let text = await getFileAsTextSomehow(filePath);

        //make a new instance of a Lexer
        var lexer = new Lexer();
        let tokens = lexer.scan(text);
        //the lexer instance stores all errors about its file on its own `errors` property
        let lexErrors = lexer.errors;

        //make a new instance of a Parser
        let parser = new Parser();
        let statements = parser.parse(tokens);
        //the parser instance stores all errors about its file on its own `errors` property
        let parseErrors = parser.errors;

        errorsByFile[filePath] = [...lexErrors, ...parseErrors];
    }));

    //do something with any file that has errors
}

Option1 and option2 would work as well, but they would mean a little extra thinking to make sure to grab errors immediately after doing the lexing/parsing, and before going into another async context. Something like this might cause unintended side-effects.

await Promise.all(filePaths.map(async (filePath) => {
    let text = await getFileAsTextSomehow(filePath);

    let tokens = Lexer.scan(text);

    //await here to load a config from disc
    let appConfig = await getAppConfig();
    if (appConfig.showErrors) {
        //these will probably be some other file's errors because we awaited
        let lexErrors = brs.getErrors();
    }
}));

You can convert to classes and still support backwards compatibility by making your static methods instantiate the classes behind the scenes. Something like this:

//Lexer.ts
export function scan(text){
    var lexer = new Lexer();
    var tokens = lexer.scan(text);
    return tokens;
}

Here are my thoughts about your other items:

1. An event emitter

This would be fine as long as listeners have a way of knowing that all events have been received. Since your parser and lexer are synchronous, I would assume the event emitter would be emitting events synchronously as well, meaning something like this would work. If that's the case, then I'm ok with this approach.

let lexErrors = [];
Lexer.on('error', (err) => {
    lexErrors.push(err);
});
let tokens = Lexer.Scan(text);
//since the event emitter methods are called synchronously, 
//we know that all lex errors have been received at this point

A potential downside to this approach is performance. Calling a function on every error could add up if there are a lot of errors and/or many files. With option2 and option3, there would only be one method call at the end of the whole process (brs.getErrors() or lexer.getErrors(). Since your lexer and parser run synchronously, there isn't much benefit in getting errors immediately when they are encountered, because in most cases, the main thread is blocked until you're done parsing anyway.

2. A simple getter method

This would be acceptable also, and since the main entry points in your app are all static, this lines up with that same approach.

Since you are adding the errors as new functionality, there's no expectation that the errors array should exist across multiple lex/parse calls. So you could easily document that errors are cleared on every lex or parse call, so if you want errors, grab a copy before you run another operation.

So something like this would work for auto-cleanup.

//Lexer.ts
import * as brs from 'brs';
export function scan(text: string){
    brs.clearErrors();
    return getTokensSomehow();
}
//Parser.ts
import * as brs from 'brs';
export function parse(text: string){
    brs.clearErrors();
    return getParserResultsSomehow();
}

Consumers of your API would do something like this:

//MyApp.ts
import * as brs from 'brs';
var errors = [];
let tokens = Lexer.scan(someText);
//capture errors before another scan
errors = [...errors, ...brs.getErrors()];

let moreTokens = Lexer.scan(otherText);
//add the new errors to our list of existing errors
errors = [...errors, brs.getErrors()];

Ultimately it's your call. Any of these approaches will work fine within the VSCode extension. :D

TwitchBronBron commented 5 years ago

I just thought of another option. You probably don't want to break backwards compatibility, so you would need to expose a separate method for this:

4. Return a results object containing more information

//Lexer.ts
export function getScanResults(text: string){
    return {
        tokens: getTokensSomehow(text),
        errors: getErrorsSomehow()
    }
}

So then as a consumer of this, I would call your new method, which gets me everything I need

//MyApp.ts
let result = Lexer.getScanResults(fileContents);
let tokens = result.tokens;
let errors = result.errors;

I don't know how you would want to implement this in your library, but from the outside, it's much cleaner and you get everything you need in one place. This also allows you to add more result-like things to the return object in the future. I still like option 3 the best (class instances), but figured I would share this as another option.

sjbarag commented 5 years ago

Thanks for the ideas dude! I'd briefly thought of something like 4 but shied away from it because of the amount of changes internally that'd be required. But you're right – it's probably a good idea to have things packaged up that way!

Splitting errors to be tracked by the lexer and parser separately definitely makes sense as well. I'll see if I get time to hack around on this over the weekend and see what feels right.

TwitchBronBron commented 5 years ago

Cool. Let me know if you want to talk through anything else, or if there's something I can do to help.

sjbarag / brs