no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

Handle parsing errors in moo.states() #91

Open moranje opened 5 years ago

moranje commented 5 years ago

Hi,

I'm looking for a way to get the offending token in a moo.states() lexer. The following doesn't seem to work; a regular error is still thrown:

  moo.states({
    main: {
      // throws the error instead of tokenizing it
      myError: moo.error
    },
    // This throws a moo configuration error
    myError: moo.error,
  });

What would be the correct way to get the error or offending token in a stateful lexer?

tjvr commented 5 years ago

The following works fine for me:

  moo.states({
    main: {
      // throws the error instead of tokenizing it
      myError: moo.error
    },
  });

I don't think it would make sense to allow configuring an error at the top level? Tokens must always be defined inside a state.

nathan commented 5 years ago

> The following works fine for me:

That only works for me if I have another token type in the list; otherwise it generates the regex /(?:)/my and then fails when it can't find the group that matched. If there are no tokens that match anything, instead of generating /(?:)/ (an irrefutable match), we should generate /(?!)/ (an impossible match).
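To make the difference between the two patterns concrete, here is a minimal sketch in plain JavaScript. It does not touch moo itself; `matchAt` is a hypothetical helper written only for this illustration.

```javascript
// Both patterns use the same flags as the regex quoted above (/my).
const irrefutable = /(?:)/my // empty group: matches the empty string anywhere
const impossible = /(?!)/my  // negative lookahead on "nothing": never matches

// Hypothetical helper: try a sticky match at exactly `index`.
function matchAt(re, input, index) {
  re.lastIndex = index
  const m = re.exec(input)
  return m === null ? null : m[0]
}

const input = 'hello!'
const emptyMatch = matchAt(irrefutable, input, 0) // '' (a zero-length match)
const noMatch = matchAt(impossible, input, 0)     // null (no match at all)
```

A zero-length match with no capture group behind it is what leaves the lexer unable to tell which rule matched, whereas a pattern that never matches would fall straight through to the error rule.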

> I don't think it would make sense to allow configuring an error at the top level?

I think it might. Usually lexer states are opaque to the parser and it just sees a stream of tokens, so you very rarely want a) only certain states to have error tokens or b) different states to have different names for the error token. But I don't think the syntax @moranje provided makes sense; if we go this route, we should probably have a more general notion of state inheritance and/or a special state from which other states automatically inherit; then a global error token would be as simple as a { myError: moo.error } prototype state.
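A rough sketch of what such a prototype state could mean, in plain JavaScript. This is only an illustration of the idea, not moo's API or implementation: `inheritFrom`, the `base` state name, and the `ERROR` stand-in are all made up for the example.

```javascript
// Hypothetical helper: copy the rules of one "prototype" state into every
// other state, so that a shared error token only has to be declared once.
function inheritFrom(states, protoName) {
  const proto = states[protoName] || {}
  const result = {}
  for (const name of Object.keys(states)) {
    if (name === protoName) continue // the prototype is not a real state
    // The state's own rules come first and win over inherited rules
    // that share the same name.
    result[name] = Object.assign({}, states[name])
    for (const rule of Object.keys(proto)) {
      if (!(rule in result[name])) result[name][rule] = proto[rule]
    }
  }
  return result
}

const ERROR = { error: true } // stand-in for moo.error in this sketch

const expanded = inheritFrom({
  base: { myError: ERROR },
  main: { id: /\w+/ },
  string: { chars: /[^"]+/ },
}, 'base')
// expanded.main and expanded.string now both carry a myError rule.
```

Appending the inherited rules after the state's own rules is a design choice here: it keeps the state-specific tokens at higher priority and leaves the error rule as the last resort.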

moranje commented 5 years ago

> I think it might. Usually lexer states are opaque to the parser and it just sees a stream of tokens, so you very rarely want a) only certain states to have error tokens or b) different states to have different names for the error token. But I don't think the syntax @moranje provided makes sense; if we go this route, we should probably have a more general notion of state inheritance and/or a special state from which other states automatically inherit; then a global error token would be as simple as a { myError: moo.error } prototype state.

I agree on both counts. Since a parsing error is a 'global' failure, it would make more sense to handle it in a single location rather than redoing it in every state. Preferably there would also be a way to access the offset, col and line properties of the offending token. And yes, the syntax above is nonsensical.

nathan commented 5 years ago

@moranje

> Preferably there would also be a way to access the offset, col and line properties of the offending token.

The moo.error notation already gives you that information:

const moo = require('moo')

const lexer = moo.states({
  main: {
    id: /\w+/,
    err: moo.error,
  },
})

lexer.reset('hello!')
lexer.next() // { type: 'id', value: 'hello', text: 'hello', offset: 0, lineBreaks: 0, line: 1, col: 1 }
lexer.next() // { type: 'err', value: '!', text: '!', offset: 5, lineBreaks: 0, line: 1, col: 6 }
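
Building on that, here is a minimal sketch of handling the error token in one place. `lexWithErrors` is a hypothetical helper, not part of moo; it only assumes a lexer with `reset()` and `next()` and tokens shaped like the ones shown above, and the default `'err'` matches the rule name used in this example.

```javascript
// Hypothetical helper: drain a lexer, keeping ordinary tokens and turning
// error tokens into readable messages with their position.
function lexWithErrors(lexer, input, errorType = 'err') {
  const tokens = []
  const errors = []
  lexer.reset(input)
  let tok
  // next() is assumed to return a falsy value once the input is exhausted.
  while ((tok = lexer.next())) {
    if (tok.type === errorType) {
      errors.push(
        `Unexpected input ${JSON.stringify(tok.text)} ` +
        `at line ${tok.line}, col ${tok.col} (offset ${tok.offset})`
      )
    } else {
      tokens.push(tok)
    }
  }
  return { tokens, errors }
}
```

Called as `lexWithErrors(lexer, 'hello!')` with the lexer defined above, this should give one `id` token and one error message pointing at line 1, col 6.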

moranje commented 5 years ago

> The moo.error notation already gives you that information

Thanks! Here's an update to the README to reflect that: #95.

tjvr commented 5 years ago

Nathan added support for including states in other states, and support for $all, in #93.

It still needs documentation and some tests :slightly_smiling_face:

tjvr commented 5 years ago

@moranje If you're interested in trying out the latest master and seeing how it works for you, that would be really useful feedback! :blush:

moranje commented 5 years ago

Great! I have limited time to spare at the moment, but I'm excited to try out these additions. I'll try to implement the changes sometime this week and get back to you on this. Great work!