orangeduck / mpc

A Parser Combinator library for C

Does the order of definitions matter in languages? #144

Open IngwiePhoenix opened 2 years ago

IngwiePhoenix commented 2 years ago

Hello!

After trying several other parsing libraries and waiting for them to provide an HTML and CSS parser, and getting nothing out of it other than some knowledge, I have given up and want to write my own parsers using MPC.

Now, I am far from an expert in parsing and really just getting started - and, out of frustration, no less. ;)

So I wanted to start with something very simple. In CSS, a selector like .foo > .bar is not uncommon, so I wanted to implement a basic version of that in mpc to learn the fundamentals:

#include "mpc.h"

int main() {
  char* input = "a>b";
  mpc_parser_t* ident = mpc_new("ident");
  mpc_parser_t* desc = mpc_new("desc");
  mpc_parser_t* sel = mpc_new("sel");
  mpca_lang(MPCA_LANG_DEFAULT,
    "ident = /[0-9a-zA-Z]+/ ;"
    "desc = '>' ;"
    "sel = <ident> | (<ident> <desc> <ident>)+ ;",
    ident, desc, sel, NULL
  );
  mpc_result_t r;
  if (mpc_parse("input", input, sel, &r)) {
    mpc_ast_print(r.output);
    mpc_ast_delete(r.output);
  } else {
    mpc_err_print(r.error);
    mpc_err_delete(r.error);
  }
}

This, however, tells me:

./foo
input: error: Parser Undefined!

My goal was to parse either "a" or "a>b" for now, and later extend it to deeper chains with the > operator, like a>b>c>d.... What is the problem here, exactly? Did I order the rules incorrectly?

Thanks and kind regards!

(PS. my idea is to parse basic CSS to extract properties and feed it to LVGL widget properties and use HTML as a "commonly known way" to describe a UI. Think Electron, but much, much more minimal...intended for tiny tools at best.)

HalosGhost commented 2 years ago

I'm not sure if this will solve your issue completely, but you appear to be using = in the grammar text passed to mpca_lang() where you should be using : between a rule's name and its definition.

carueda commented 1 year ago

Just exploring MPC (which looks great so far!)

... and let me chime in with the fix mentioned above plus one more: swap the order of the alternatives in sel so that the longer one is tried first:

#include "mpc.h"

int main(void) {
    const char* input = "a>b";
    mpc_result_t r;
    mpc_parser_t* ident = mpc_new("ident");
    mpc_parser_t* desc = mpc_new("desc");
    mpc_parser_t* sel = mpc_new("sel");
    /* Rules are written "name : definition ;" (':' rather than '='),
       and the longer alternative of sel is listed first. */
    mpca_lang(MPCA_LANG_DEFAULT,
              "ident : /[0-9a-zA-Z]+/ ;"
              "desc  : '>' ;"
              "sel   : (<ident> <desc> <ident>)+ | <ident> ;",
              ident, desc, sel, NULL
    );
    if (mpc_parse("input", input, sel, &r)) {
        mpc_ast_print(r.output);
        mpc_ast_delete(r.output);
    } else {
        mpc_err_print(r.error);
        mpc_err_delete(r.error);
    }
    /* Release the parsers created with mpc_new. */
    mpc_cleanup(3, ident, desc, sel);
    return 0;
}

Now it runs ok:

>
  ident|regex:1:1 'a'
  desc|char:1:2 '>'
  ident|regex:1:3 'b'
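
To illustrate why swapping the alternatives changes the result, here is a small hand-written sketch in plain C that does not use mpc at all. It assumes that `|` behaves as an ordered choice (alternatives are tried left to right and the first one that succeeds wins) and that a parse may succeed without consuming the whole input. The helper names `match_ident`, `match_chain`, `sel_ident_first` and `sel_chain_first` are made up for this example. Note that `match_chain` implements `ident ('>' ident)+`, i.e. the a>b>c>d shape from the original question, rather than the exact `(<ident> <desc> <ident>)+` rule above.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the leading [0-9a-zA-Z]+ run, 0 if none. */
size_t match_ident(const char *s) {
    size_t n = 0;
    while (isalnum((unsigned char)s[n])) {
        n++;
    }
    return n;
}

/* Hypothetical helper: matches ident ('>' ident)+ at the start of s.
   Returns the number of characters consumed, or 0 on failure. */
size_t match_chain(const char *s) {
    size_t pos = match_ident(s);
    size_t links = 0;
    if (pos == 0) {
        return 0;
    }
    while (s[pos] == '>') {
        size_t n = match_ident(s + pos + 1);
        if (n == 0) {
            break;              /* a '>' with no ident after it is not consumed */
        }
        pos += 1 + n;
        links++;
    }
    return links > 0 ? pos : 0; /* at least one "> ident" link is required */
}

/* Ordered choice "ident | chain": the short alternative is tried first. */
size_t sel_ident_first(const char *s) {
    size_t n = match_ident(s);
    return n ? n : match_chain(s);
}

/* Ordered choice "chain | ident": the long alternative is tried first. */
size_t sel_chain_first(const char *s) {
    size_t n = match_chain(s);
    return n ? n : match_ident(s);
}
```

With these definitions, `sel_ident_first("a>b")` consumes only the leading `a`, because the first alternative already succeeds, while `sel_chain_first("a>b")` consumes all three characters and still falls back to a plain identifier for an input like `"a"`.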