no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

Question: special rule for skipping tokens - performance impact #83

Closed swiatczak closed 6 years ago

swiatczak commented 6 years ago

Hi,

I have seen that the preferred way to deal with skipping tokens is to leave such functionality out of the core library. However, I am curious what the performance impact would be of having a "special" rule that is then used to skip tokens?

for example: with this in the rules

    _skip: [
        { match: /[ \t]+/ },
        { match: /[\n]+/, lineBreaks: true }
    ]

and this in Lexer.prototype.next

    if (token.type === '_skip') {
        token = this.next()
    }
    return token

I apologize if this is a waste of time.

Kind regards, and thank you for moo.

tjvr commented 6 years ago

thank you for moo.

You're welcome! 🎉

Other JS lexers implement skipping by means of custom functions which override how tokens are processed: they might, for example, return a list of tokens, which are then melded into the output token stream. This would definitely have some performance impact!

In terms of just plain skipping I guess it wouldn't be so bad? You'd really want to use a while loop rather than the recursive call you've written above. Because of how inlining works in JS engines, implementing it yourself will be just as fast as if we implemented it in Moo… and it's more flexible, since you get full control over which kinds of tokens you want to skip.
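The loop-based approach above can be sketched as a small user-land wrapper. This is only an illustration: `skippingNext`, `skipTypes`, and the toy lexer below are hypothetical names, and the toy lexer merely stands in for a real moo lexer's `next()`.

```javascript
// Loop (rather than recurse) past any token whose type we want to discard.
// `lexer` is anything with a next() method; `skipTypes` is a Set of type names.
function skippingNext(lexer, skipTypes) {
  let token = lexer.next()
  while (token !== undefined && skipTypes.has(token.type)) {
    token = lexer.next()
  }
  return token
}

// Toy stand-in for a moo lexer, for demonstration only.
function toyLexer(tokens) {
  let i = 0
  return { next: () => tokens[i++] }
}

const lexer = toyLexer([
  { type: 'word', value: 'hello' },
  { type: '_skip', value: '  ' },
  { type: 'word', value: 'world' },
])
const skip = new Set(['_skip'])
console.log(skippingNext(lexer, skip).value) // "hello"
console.log(skippingNext(lexer, skip).value) // "world"
```

Because the wrapper lives outside the library, the caller decides per call which token types count as skippable.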

Does that help at all? I think the tl;dr is that we could implement this, but then people would start asking for more flexible skipping, and things would then get quite hairy :)

swiatczak commented 6 years ago

Thank you.

That clarifies a few things I did not understand :)

Kind regards,