lark-parser / Lark.js

Live port of Lark's standalone parser to JavaScript
MIT License

Keywords are not in the parsed tree #15

Closed: jillyj closed this issue 2 years ago

jillyj commented 2 years ago

Grammar file: https://github.com/opencybersecurityalliance/kestrel-lang/blob/develop/src/kestrel/syntax/kestrel.lark

Based on this excerpt from the grammar:

statement: VARIABLE "=" command
         | command

// "?" at the beginning will inline command
?command: get
        | find
        | disp
        | info
        | apply
        | join
        | sort
        | group
        | load
        | save
        | new
        | merge
get: "get"i ENTITY_TYPE ("from"i DATASRC)? "where"i STIXPATTERNBODY (starttime endtime)?

Here, from and where are keywords of the get command. However, when I parse the statement "procs2 = GET process FROM procs WHERE [process:pid = 10578]", the resulting tree (shown below) does not contain the from and where keywords. Is there an option I can pass to the parser to keep those keywords in the parse tree?

{
  "data": "procs2 = GET process FROM procs WHERE [process:pid = 10578]",
  "children": [
    {
      "type": "VARIABLE",
      "start_pos": 0,
      "value": "procs2",
      "line": 1,
      "column": 1,
      "end_line": 1,
      "end_column": 7,
      "end_pos": 6
    },
    {
      "data": "get",
      "children": [
        {
          "type": "ENTITY_TYPE",
          "start_pos": 13,
          "value": "process",
          "line": 1,
          "column": 14,
          "end_line": 1,
          "end_column": 21,
          "end_pos": 20
        },
        {
          "type": "DATASRC",
          "start_pos": 26,
          "value": "procs",
          "line": 1,
          "column": 27,
          "end_line": 1,
          "end_column": 32,
          "end_pos": 31
        },
        {
          "type": "STIXPATTERNBODY",
          "start_pos": 38,
          "value": "[process:pid = 10578]",
          "line": 1,
          "column": 39,
          "end_line": 1,
          "end_column": 60,
          "end_pos": 59
        }
      ],
      "_meta": {
        "empty": false,
        "line": 1,
        "column": 10,
        "start_pos": 9,
        "container_line": 1,
        "container_column": 10,
        "end_line": 1,
        "end_column": 60,
        "end_pos": 59,
        "container_end_line": 1,
        "container_end_column": 60
      }
    }
  ],
  "_meta": {
    "empty": false,
    "line": 1,
    "column": 1,
    "start_pos": 0,
    "container_line": 1,
    "container_column": 1,
    "end_line": 1,
    "end_column": 60,
    "end_pos": 59,
    "container_end_line": 1,
    "container_end_column": 60
  },
  "type": "statement"
}

erezsh commented 2 years ago

Yes, you can disable that feature with get_parser({keep_all_tokens: true})

jillyj commented 2 years ago

I enabled the option as shown below:

const parser = get_parser({keep_all_tokens: true});

But the keywords still don't appear in the parse tree.

jillyj commented 2 years ago

Oh, I got it. I have to set the option when generating the parser with the lark-js package, like this:

lark-js kestrel.lark -o kestrel_parser.js --keep_all_tokens

instead of passing it as an argument to the get_parser function. Thanks.
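For anyone who wants to check quickly whether the keywords made it into a tree, here is a rough sketch that walks the JSON form of a tree like the one posted above and collects the token values. The helper name collectTokenValues is made up for illustration and is not part of Lark.js. The sample tree is a trimmed copy of the output above (position fields dropped), and the code assumes that a node with a children array is a subtree and anything else is a token.

```javascript
// Hypothetical helper (not part of Lark.js): gather token values from a
// JSON-serialized parse tree shaped like the one shown earlier in this issue.
function collectTokenValues(node) {
  // Assumption: subtrees carry a "children" array; tokens carry "value".
  if (Array.isArray(node.children)) {
    return node.children.flatMap(collectTokenValues);
  }
  return [node.value];
}

// Trimmed copy of the tree posted above (generated without --keep_all_tokens).
const tree = {
  type: "statement",
  children: [
    { type: "VARIABLE", value: "procs2" },
    {
      data: "get",
      children: [
        { type: "ENTITY_TYPE", value: "process" },
        { type: "DATASRC", value: "procs" },
        { type: "STIXPATTERNBODY", value: "[process:pid = 10578]" },
      ],
    },
  ],
};

const values = collectTokenValues(tree);
// True only if a "from" or "where" token is present (case-insensitive).
const hasKeywords = values.some((v) => /^(from|where)$/i.test(v));

console.log(values);      // [ 'procs2', 'process', 'procs', '[process:pid = 10578]' ]
console.log(hasKeywords); // false
```

Running the same check on a tree from a parser regenerated with --keep_all_tokens should show whether the keyword tokens are now kept.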

erezsh commented 2 years ago

Huh, I guess that's right. Sorry, my bad.