lark-parser / Lark.js

Live port of Lark's standalone parser to JavaScript
MIT License

Parser doesn't provide next expected token type according to the grammar #14

Closed. jillyj closed this issue 2 years ago

jillyj commented 2 years ago

Grammar file: https://github.com/opencybersecurityalliance/kestrel-lang/blob/develop/src/kestrel/syntax/kestrel.lark

When I try to parse the incomplete statement proc2 = GET and catch the resulting UnexpectedToken exception, the exception does not list the token types expected for the next token.

Expected result: Based on the grammar

statement: VARIABLE "=" command
         | command

// "?" at the beginning will inline command
?command: get
        | find
        | disp
        | info
        | apply
        | join
        | sort
        | group
        | load
        | save
        | new
        | merge

get: "get"i ENTITY_TYPE ("from"i DATASRC)? "where"i STIXPATTERNBODY (starttime endtime)?

the parser should report ENTITY_TYPE as the next expected token type, so that we can act on it afterwards.
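
For context, here is a minimal sketch of how the expected token types might be read once the parser reports them. It assumes, as this report does, that the thrown UnexpectedToken error carries an expected collection. The helper name expectedTokenTypes and the stub parser are hypothetical; the stub only stands in for the generated kestrel_parser.js so that the sketch runs on its own.

```javascript
// Hypothetical helper: returns the token types the parser expected next,
// or an empty array if the text parsed without error.
// Assumption: the thrown error exposes an iterable `expected` property.
function expectedTokenTypes(parser, text) {
  try {
    parser.parse(text);
    return [];
  } catch (err) {
    if (err && err.expected != null) {
      return Array.from(err.expected).sort();
    }
    throw err; // not an error this helper knows how to inspect
  }
}

// Stub standing in for the generated parser (not the real lark-js output).
// It fails on "<variable> = GET" the way the report expects the parser to.
const stubParser = {
  parse(text) {
    if (/^\s*\w+\s*=\s*get\s*$/i.test(text)) {
      const err = new Error("stub: input ended after GET");
      err.name = "UnexpectedToken";
      err.expected = new Set(["ENTITY_TYPE"]);
      throw err;
    }
    return { data: "statement" };
  },
};

console.log(expectedTokenTypes(stubParser, "proc2 = GET")); // [ 'ENTITY_TYPE' ]
```

With the real generated parser, the same helper would be called with the object returned by get_parser() in place of stubParser.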

erezsh commented 2 years ago

Strange, it works when I try it.

Can you provide a minimal script that reproduces that behavior?

erezsh commented 2 years ago

I found a bug related to the expected values. Please try again with version 0.1.1 of lark-js.

jillyj commented 2 years ago

I tried 0.1.1 but still could not get the expected values.

jillyj commented 2 years ago

I am sure I used the parser.js generated by version 0.1.1, and I still get an empty expected value.

It looks like the exception you got is different from mine. The exception I get is UnexpectedToken. The full stack trace is below.

    at ParserState.feed_token (webpack-internal:///./src/parser/kestrel_parser.js:4435:17)
    at _Parser.parse_from_state (webpack-internal:///./src/parser/kestrel_parser.js:4537:22)
    at _Parser.parse (webpack-internal:///./src/parser/kestrel_parser.js:4512:19)
    at LALR_Parser.parse (webpack-internal:///./src/parser/kestrel_parser.js:4300:28)
    at ParsingFrontend.parse (webpack-internal:///./src/parser/kestrel_parser.js:5168:26)
    at Lark.parse (webpack-internal:///./src/parser/kestrel_parser.js:5676:26)
    at App (webpack-internal:///./src/App.js:196:23)
    at renderWithHooks (webpack-internal:///./node_modules/react-dom/cjs/react-dom.development.js:14803:18)
    at updateFunctionComponent (webpack-internal:///./node_modules/react-dom/cjs/react-dom.development.js:17034:20)
    at beginWork (webpack-internal:///./node_modules/react-dom/cjs/react-dom.development.js:18610:16)

jillyj commented 2 years ago

The steps to recreate this issue:

  1. Generate kestrel_parser.js using the command
    lark-js kestrel.lark -o kestrel_parser.js --keep_all_tokens
  2. Import kestrel_parser.js into the JS project.
  3. Get the parser with
    const parser = get_parser({keep_all_tokens: true});
  4. Try to parse the statement proc2 = GET.

jillyj commented 2 years ago

I have attached my generated kestrel_parser.js for debugging: kestrel_parser.js.zip

erezsh commented 2 years ago

Thanks for spotting this! Turns out there's a bug in the isupper() function.

Please confirm that changing it to

function isupper(a) {
  return /^[A-Z_$]*$/.test(a);
}

fixes the problem.
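
To see what the proposed isupper() accepts, the sketch below applies it to names taken from the grammar quoted earlier in this thread. The list of sample names is illustrative only, and how lark-js calls isupper() internally is not shown here.

```javascript
// The function as proposed in this thread.
function isupper(a) {
  return /^[A-Z_$]*$/.test(a);
}

// Names from the Kestrel grammar: terminals are written in upper case
// (possibly with underscores), rules in lower case.
const names = ["ENTITY_TYPE", "STIXPATTERNBODY", "DATASRC", "get", "statement"];

for (const name of names) {
  // Prints each name followed by true (terminal-style) or false (rule-style).
  console.log(name, isupper(name));
}
```

The underscore in the character class is what lets a terminal name such as ENTITY_TYPE pass the check.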

jillyj commented 2 years ago

Yes, that change fixed the problem! Looking forward to the new release. Thank you!

jillyj commented 2 years ago

@erezsh Would you please release a new version for this bug? Thanks!

erezsh commented 2 years ago

Released.