no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

Save & Resume and Correct Position #182

Open elbakerino opened 1 year ago

elbakerino commented 1 year ago

I've chosen moo to take my first steps with my own DSLs. As that isn't my focus area, it could be that this "wish" is simply bad practice.

The parsed languages are standard SQL and custom modeling languages.

In relation to: #142, #89, #12

My SQL, for example, contains non-standard placeholders that are resolved while parsing the token. A placeholder may reference other SQL code, which is then parsed and injected into the AST at the position of the placeholder.

I'm using a custom parser to produce the AST, implementing a shallow visitor pattern over Token.

My current problem is with save/reset: it looks like it is impossible to simply "resume where it was".

When using the following logic, it goes into an endless loop:

const text = 'the-code with reference'

// ... iterating based on `lexer.next()` in the parser
const saved = lexer.save() // POS-A

// do some other lexing with the same lexer
lexer.reset('partial-code')

// resume from `POS-A`
lexer.reset(text, saved)

So, following #89, I used the slice strategy and wrapped the lexer's next call to accumulate a position from all parsed tokens, building my own position index as suggested in #142. This leads to "I don't use save at all":

export class ModelLangParser<N extends DslNodeBase> {
    protected readonly lexer: Lexer
    protected readonly visitors: DslVisitors<N> | undefined = undefined
    protected readonly saved: [string, LexerState][] = []
    protected readonly text: string
    protected position: number = 0

    constructor(
        lexer: Lexer,
        visitors: (parser: ModelLangParser<N>) => DslVisitors<N>,
        text: string,
    ) {
        this.lexer = lexer
        this.visitors = visitors(this)
        this.text = text
    }

    save() {
        // failed experiment, if it could be built correctly with `save`
        const saved: [string, LexerState] = [this.text, this.lexer.save()]
        this.saved.push(saved)
    }

    resume() {
        // slice strategy (see #89): feed only the text that has not been lexed yet
        this.lexer.reset(this.text.slice(this.position))

        /* const toResume = this.saved.pop()
        if(typeof toResume === 'undefined') {
            throw new Error('can not resume, nothing was saved')
        }
        this.lexer.reset(toResume[0], toResume[1]) */
    }

    parse(parent: N): N {
        this.position = 0

        // here is the parser... (omitted for readability)
        // do { parsing } while(typeof next !== 'undefined')

        return parent
    }

    protected lexNext() {
        const next = this.lexer.next()
        if(next) {
            // accumulate the offset into `text` from every token seen so far
            this.position += next.text.length
        }
        return next
    }
}

E.g. usage:

const visitorNamed: (parser: ModelLangParser) => DslVisitor = (parser) => (parent, token) => {
    // ... omitted registering new `Node`

    // parser.save() // not necessary with "position"

    // resolving some other code by e.g. ID, maybe starting another parser from within
    const nestedAst = SomeLogic.resolveAndParse(token.value)

    // ... omitted injecting `nestedAst` as `children` to current `Node`

    return Next.close()
}

Now resume works nicely from inside the visitors, but it destroys all "meta" info, such as the correct line and column numbers.

Is the endless loop caused by a wrong implementation on my side? Can resume be built with save/reset (together with correct line and column numbers)?
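One possible reading of the loop, sketched with a toy lexer rather than moo itself: if the state returned by `save()` holds line and column but no read offset, then `reset(text, saved)` starts reading at the beginning of `text` again, and the reference token is hit over and over. `ToyLexer`, `ToyState` and `ToyToken` are hypothetical stand-ins written for this illustration; whether moo behaves exactly like this is an assumption that should be checked against its source. Under that assumption, passing only the remaining text together with the saved state resumes at the right place and keeps line and column.

```typescript
// Hypothetical stand-in for illustration only; this is NOT moo's implementation.
interface ToyState { line: number; col: number }
interface ToyToken { text: string; offset: number; line: number; col: number }

class ToyLexer {
    private buffer = ''
    private index = 0
    private line = 1
    private col = 1

    // Assumption: reset() always starts at index 0 of the chunk it receives;
    // the optional state only restores line and column.
    reset(chunk: string, state?: ToyState): this {
        this.buffer = chunk
        this.index = 0
        this.line = state ? state.line : 1
        this.col = state ? state.col : 1
        return this
    }

    // The saved state carries no offset into the buffer.
    save(): ToyState {
        return { line: this.line, col: this.col }
    }

    // Returns the next whitespace-separated word, or undefined at the end.
    next(): ToyToken | undefined {
        while (this.index < this.buffer.length && /\s/.test(this.buffer.charAt(this.index))) {
            if (this.buffer.charAt(this.index) === '\n') {
                this.line += 1
                this.col = 1
            } else {
                this.col += 1
            }
            this.index += 1
        }
        if (this.index >= this.buffer.length) return undefined
        const start = this.index
        while (this.index < this.buffer.length && !/\s/.test(this.buffer.charAt(this.index))) {
            this.index += 1
        }
        const text = this.buffer.slice(start, this.index)
        const token: ToyToken = { text, offset: start, line: this.line, col: this.col }
        this.col += text.length
        return token
    }
}

const sql = 'select @ref from t'
const toy = new ToyLexer().reset(sql)
toy.next()                                   // 'select'
const ref = toy.next()                       // '@ref', the placeholder
const toyState = toy.save()                  // POS-A: line and column only
// Offset just behind the placeholder (valid because the chunk was the full text).
const resumeAt = ref ? ref.offset + ref.text.length : 0

toy.reset('nested code').next()              // lex something else in between

// Full text + saved state: reading starts over at 'select'.
const restarted = toy.reset(sql, toyState).next()
// Remaining text + saved state: reading continues behind '@ref'.
const resumed = toy.reset(sql.slice(resumeAt), toyState).next()
console.log(restarted?.text, resumed?.text, resumed?.col) // select from 13
```

In this sketch the offset has to be tracked outside the saved state, which matches the `position` counter in the parser class.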

If my own position index is required for such a scenario, I would additionally need to keep track of lines and columns myself to rebuild the Token the visitor receives.
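A minimal sketch of what that userland bookkeeping could look like, assuming every consumed token's `text` is fed through a small helper. `Position` and `advance` are hypothetical names introduced here, not part of moo. The resulting line and column could then be handed back to the lexer on resume, if its reset accepts such values.

```typescript
// Hypothetical helper types and names; not part of moo.
interface Position {
    offset: number // 0-based index into the full source text
    line: number   // 1-based line number
    col: number    // 1-based column number
}

// Advances a position past `consumed`, treating "\n" as the line break.
function advance(pos: Position, consumed: string): Position {
    let line = pos.line
    let col = pos.col
    for (let i = 0; i < consumed.length; i++) {
        if (consumed.charAt(i) === '\n') {
            line += 1
            col = 1
        } else {
            col += 1
        }
    }
    return { offset: pos.offset + consumed.length, line, col }
}

const source = 'select *\nfrom @ref\nwhere x'
let pos: Position = { offset: 0, line: 1, col: 1 }

// Pretend everything up to and including the placeholder has been lexed.
pos = advance(pos, 'select *\nfrom @ref')

// Text that still has to be lexed after the nested parse is done.
const rest = source.slice(pos.offset)
console.log(pos, JSON.stringify(rest))
```

Calling `advance` once per token keeps the cost proportional to the text actually consumed, instead of rescanning the source from the start on every resume.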

Yes, it can be implemented in userland, but IMHO it may be easier and cleaner to implement with some more help from "inside moo".

Knowledge Base

Maybe I've got some basic understanding wrong.

I've understood that...