uniform-team / Uniform-Validation-Language

A logic language to simplify and improve online form validation.
Apache License 2.0

Refactor Lexer #61

Closed dgp1130 closed 7 years ago

dgp1130 commented 7 years ago

src/lexer.js is easily the weakest part of our codebase. It is written in a very imperative style with lots of state manipulation, which makes it difficult to reason about and work on. I would like to make it work in a more functional manner by creating a Stream object that exposes methods such as match(), consume(), consumeUntil(), and returnToken(), which can be chained into a definition of a lexical structure. This would look something like:

function createTokenizer(ufmCode) {
    const stream = new Stream(ufmCode);
    return function () {
        return stream.match(/[0-9]/, // When a digit is matched
            () => stream.consume(/* first digit */).consumeUntil(/[^0-9]/ /* not a digit */)
                .returnToken(new Token(stream.consumedChars, constants.TYPE.NUMBER))
        ).match(/[a-zA-Z_]/, // When a letter or underscore is matched
            () => stream.consume(/* first letter */).consumeUntil(/[^a-zA-Z_]/ /* not a letter */)
                .returnToken(new Token(stream.consumedChars, constants.TYPE.IDENTIFIER))
        );
    };
}

const tokenizer = createTokenizer("1234test4321");
tokenizer(); // new Token("1234", constants.TYPE.NUMBER);
tokenizer(); // new Token("test", constants.TYPE.IDENTIFIER);
tokenizer(); // new Token("4321", constants.TYPE.NUMBER);
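
The chain above leaves open how a token gets out of the stream once a branch has matched. The following is a minimal sketch of one way a Stream could support that chain, not the implementation from this repository: the Token class, the constants object, the remaining and token fields, and the takeToken() helper are all assumptions made for illustration.

```javascript
// Hypothetical sketch: these definitions are not taken from the Uniform codebase.
class Token {
    constructor(value, type) {
        this.value = value;
        this.type = type;
    }
}

const constants = { TYPE: { NUMBER: "number", IDENTIFIER: "identifier" } };

class Stream {
    constructor(source) {
        this.remaining = source; // Input that has not been consumed yet
        this.consumedChars = ""; // Characters consumed for the current token
        this.token = null;       // Token produced by the first branch that matched
    }

    // Runs onMatch only if no earlier branch produced a token
    // and the next character matches regex.
    match(regex, onMatch) {
        if (this.token === null && this.remaining.length > 0 && regex.test(this.remaining[0])) {
            onMatch();
        }
        return this;
    }

    // Moves one character from the input into consumedChars.
    consume() {
        this.consumedChars += this.remaining[0];
        this.remaining = this.remaining.slice(1);
        return this;
    }

    // Consumes characters until the next one matches regex or the input ends.
    consumeUntil(regex) {
        while (this.remaining.length > 0 && !regex.test(this.remaining[0])) {
            this.consume();
        }
        return this;
    }

    // Records the finished token so that later branches are skipped.
    returnToken(token) {
        this.token = token;
        return this;
    }

    // Assumed helper: hands back the recorded token (or null) and resets per-token state.
    takeToken() {
        const token = this.token;
        this.token = null;
        this.consumedChars = "";
        return token;
    }
}

function createTokenizer(ufmCode) {
    const stream = new Stream(ufmCode);
    return () => stream
        .match(/[0-9]/, () => stream.consume().consumeUntil(/[^0-9]/)
            .returnToken(new Token(stream.consumedChars, constants.TYPE.NUMBER)))
        .match(/[a-zA-Z_]/, () => stream.consume().consumeUntil(/[^a-zA-Z_]/)
            .returnToken(new Token(stream.consumedChars, constants.TYPE.IDENTIFIER)))
        .takeToken();
}
```

With the input "1234test4321", three calls to the tokenizer yield the tokens 1234, test, and 4321, and a fourth call returns null. Note that consume() copies the remaining input on every character, which keeps the sketch short at the cost of extra string allocations.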

dgp1130 commented 7 years ago

Implemented an interface similar to the one above. I'm not completely convinced that this is better than the old solution. The previous system was hard to follow, but it was clear what every line of code did.

This functional pattern is certainly more novel and more abstract. I like that the mechanics of the Stream are in a separate class which knows nothing of the Uniform language, while a subclass of it defines the lexical structure for Uniform itself. The logic of how Uniform is tokenized is clear, while the implementation of it is handled by the Stream class.

The downside is that it takes a little while to really get it. It is also very likely less performant: regular expressions can't easily match starting partway through a string in JavaScript, so the entire source code (possibly thousands of characters) is trimmed one character at a time as it is lexed, creating a new string each time. It didn't cause any issues for me, and we might find a way around it in the future, but it could result in performance issues.
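
One possible way around the string copying, which is not something discussed or adopted in this thread, is the sticky flag (y) added to regular expressions in ES2015. A sticky regular expression only matches at exactly its lastIndex, so the lexer can keep an offset into the original string instead of trimming it. The sketch below is a simplified assumption about how that could look; RULES, createOffsetTokenizer(), and the plain-object token shape are hypothetical names, not part of the Stream interface.

```javascript
// Hypothetical rule table: each pattern is sticky, so it can only match
// at the position stored in its lastIndex.
const RULES = [
    { type: "number", pattern: /[0-9]+/y },
    { type: "identifier", pattern: /[a-zA-Z_]+/y },
];

function createOffsetTokenizer(source) {
    let offset = 0; // Position of the next unread character in source

    return function nextToken() {
        for (const { type, pattern } of RULES) {
            pattern.lastIndex = offset; // Try this rule at the current offset only
            const result = pattern.exec(source);
            if (result !== null) {
                offset = pattern.lastIndex; // exec() advanced lastIndex past the match
                return { value: result[0], type };
            }
        }
        return null; // End of input, or no rule matches at this offset
    };
}
```

Here the source string is never sliced; only the offset moves. Whether this is actually faster for Uniform's typical inputs would need to be measured.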

Also, I basically looked at a 300+ line function and decided to replace it with a 65-line statement.

Sawyer, I'll let you decide if that is actually better.

sawyernovak commented 7 years ago

It appears to work, and I can follow it for now. My only issue with the really long function is scalability: it could get out of hand in the future. We can worry about that if it gets to that point.