Context objects during parsing

kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.

https://nearley.js.org

MIT License


Context objects during parsing #326

Closed (dashie closed this issue 7 years ago)

dashie commented 7 years ago

I'm using nearley to parse VERY LARGE files (gigabytes), so I don't want to construct the whole tree. I just want to parse the input, evaluate my data chunk by chunk, and import it.

What is the best way to approach this?
How can I use a custom context object/callback during parsing, so that I can call an external object from it?

tjvr commented 7 years ago

I’m afraid I don’t understand what you mean by “chunk by chunk eval”.

Unless there are natural break points in your files that you can detect before parsing (e.g. a newline token separating unrelated parses), I don’t see how you would “split up” parsing.

You almost certainly want to use a tokenizer (moo) for this. You might be better off not using nearley at all, depending on what you’re trying to parse, and how complicated the syntax is.

I’m not sure what you mean about context, either :-) Perhaps you should describe your use case in more detail?


dashie commented 7 years ago

Imagine a situation like this:

INSERT INTO table (id, name, phone, date) VALUES
  (1, 'Phil', '444444444', NULL),
  (2, 'John', '333333333', NULL),
  (3, 'Anne', '777777777', NULL),
  ...
  10.000 rows
  ...
  (99999, 'Zoe', '111111111', NULL);

I don't want to parse and build a tree of my data. I want to create a processor based on a grammar that fires an event and calls a custom callback, for example on every VALUES row, with some custom data as context, such as the info I parsed earlier from the INSERT INTO row.

But there are many other situations where I need to use a grammar to create a processor that reacts to a text and does not simply construct a tree.

I've built processors like that many times using JavaCC. Can I do the same with nearley? Maybe it's my fault for not knowing the library very well.

dashie commented 7 years ago

Another example of how a context object could be used in a postprocessor: I would like to attach a custom object to the parser, like this:

const parser = new nearley.Parser(myGrammar);
parser.ctx = myCustomObject;

and to be able to access this object in the postprocessor:

function postprocessor (d,pos,err,ctx) {
    ...
    ctx.doSomething(d);
}

In doSomething, for example, I can start importing data into the db, or do other stuff.
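
A similar effect might be possible without a fourth argument, by letting the postprocessor close over the context instead of receiving it. The sketch below is plain JavaScript and does not load nearley: `makePostprocessors` and `importRow` are hypothetical names, and the last line only simulates nearley calling a postprocessor with its documented `(data, location, reject)` arguments.

```javascript
// Hypothetical factory: builds postprocessors that close over `ctx`,
// so the context does not have to be passed in as an extra argument.
function makePostprocessors(ctx) {
  return {
    // Same (data, location, reject) shape nearley passes to postprocessors.
    row: (d, location, reject) => {
      ctx.importRow(d); // side effect on the external object
      return null;      // return nothing useful, so no row data is kept
    },
  };
}

// Assumed context object; in real use this could write to a database.
const imported = [];
const myCustomObject = { importRow: (d) => imported.push(d) };
const post = makePostprocessors(myCustomObject);

// Simulate the parser finishing a `row` rule.
post.row([1, "Phil"], 0, undefined);
```

One caveat with this approach: nearley appears to run postprocessors as soon as a rule completes, so side effects may also fire for candidate parses that are later discarded.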

dashie commented 7 years ago

Here is an update with an example. Look at this branch and try

node bin/nearley-test.js -q examples/js/events-stream.js < examples/events-stream.data

you can see that:

I had to fix nearley-test.js to use readline, because if a stream read breaks in the middle of a moo token we get an error
I'm not yet able to process all the events, even though they are an "iterative" sequence, because (I believe) nearley accumulates stack calls
finally, I want to be able to call an external function on every event to process my data (have a look at the file events-stream.ne)
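
The first point (a stream chunk ending in the middle of a token) can be illustrated without readline. This is a minimal sketch, assuming no token ever spans a newline; `createLineFeeder` and `onLine` are hypothetical names, not part of nearley or moo.

```javascript
// Hypothetical helper: buffers stream chunks and hands only complete lines
// to `onLine`, so the consumer never sees a token split across two chunks.
function createLineFeeder(onLine) {
  let pending = ""; // text received so far that does not yet end in "\n"
  return {
    write(chunk) {
      pending += chunk;
      let idx;
      while ((idx = pending.indexOf("\n")) !== -1) {
        onLine(pending.slice(0, idx + 1)); // pass the line including its newline
        pending = pending.slice(idx + 1);
      }
    },
    end() {
      // Flush a final line that has no trailing newline.
      if (pending.length > 0) onLine(pending);
      pending = "";
    },
  };
}

// Chunks deliberately cut in the middle of a quoted string.
const lines = [];
const feeder = createLineFeeder((line) => lines.push(line));
feeder.write("(1, 'Ph");
feeder.write("il')\n(2, 'Jo");
feeder.write("hn')\n");
feeder.end();
```

With a nearley parser, `onLine` would presumably be something like `(line) => parser.feed(line)`.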

tjvr commented 7 years ago

I’m sorry, but I don’t think you can use Nearley to do what you want. Nearley uses the Earley algorithm, which brings many of the benefits listed in the readme; unfortunately for your use case, Earley is a top-down parser, not a bottom-up one.

If you’re interested, you might find hardmath123’s blog post introducing the algorithm interesting: https://hardmath123.github.io/earley.html
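
For input as regular as the INSERT example earlier in the thread, the "not using nearley at all" route could look roughly like the sketch below: a small hand-written processor that keeps the header as context and fires a callback per row, without building a tree. It is plain JavaScript; `createInsertProcessor`, `onRow` and both regexes are made up for illustration, and it only handles the simple literals shown in the example (no commas or escaped quotes inside strings).

```javascript
// Hypothetical line-based processor: one callback per VALUES row.
function createInsertProcessor(onRow) {
  let ctx = null; // table and column names from the current INSERT header
  const headerRe = /^INSERT INTO\s+(\w+)\s*\(([^)]*)\)\s*VALUES\s*$/i;
  const rowRe = /^\s*\((.*)\)\s*[,;]\s*$/;

  // Convert one SQL literal (number, 'string' or NULL) to a JS value.
  function parseValue(text) {
    const t = text.trim();
    if (t.toUpperCase() === "NULL") return null;
    if (t.startsWith("'") && t.endsWith("'")) return t.slice(1, -1);
    return Number(t);
  }

  return function processLine(line) {
    const header = headerRe.exec(line);
    if (header) {
      ctx = {
        table: header[1],
        columns: header[2].split(",").map((c) => c.trim()),
      };
      return;
    }
    const row = rowRe.exec(line);
    if (row && ctx) {
      // Fire the callback with the row values and the header context.
      onRow(row[1].split(",").map(parseValue), ctx);
    }
  };
}

const rows = [];
const processLine = createInsertProcessor((values, ctx) => {
  rows.push({ table: ctx.table, values });
});
[
  "INSERT INTO table (id, name, phone, date) VALUES",
  "  (1, 'Phil', '444444444', NULL),",
  "  (99999, 'Zoe', '111111111', NULL);",
].forEach(processLine);
```

Here the callback just collects rows; in the import scenario it would write each row to the database and keep nothing in memory.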


volkanunsal commented 5 years ago

@dashie We had a similar use case. This patch worked for us. I haven't tested it with your use case, though. It seems more complicated than mine.

// nearley.js:82
State.prototype.finish = function() {
  if (this.rule.postprocess) {
    // Pass context as the 4th argument
    this.data = this.rule.postprocess(this.data, this.reference, Parser.fail, this.context);
  }
};

// grammar.ne
main -> "foo" {% (d, _1, _2, c) => c.addVertex(d) %}

// nearley.js:313
var next = state.nextState({
  data: value,
  token: token,
  isToken: true,
  reference: n - 1,
  // Add context to the state
  context: this.options.context || {}
});

const g = nearley.Grammar.fromCompiled(grammar)
const context = { addVertex: (a) => { ... } }
// Pass context in the options.
const opts = { context }
var parser = new nearley.Parser(g, opts);
parser.feed("foo");