loreanvictor / kaashi

A Turing-complete JSON-like declarative language for data/config description
MIT License

Grammar specification format #5

Open · sesajad opened this issue 4 years ago

sesajad commented 4 years ago
  1. Even a language as scientific as Haskell doesn't use a standard grammar specification.
  2. ABNF was not designed to support UTF-8 in the first place; its tools are old and support no encoding other than ASCII.
  3. PEG is more recent and presumably has a better toolset.
  4. Or we can just use mathematical notation in the whitepaper; for the implementation, we should first decide on the compiler language, then the parser tool, and then the format in which we want to enter the grammar.
  5. It's also routine to use a two-level approach (tokenizing / parsing), which works better e.g. for comments and whitespace (rough sketch below).
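
For illustration, here is a rough sketch of the tokenizing level in Python (purely an example; the token names and the `#` line-comment style are made-up placeholders, not decided Kaashi syntax). Whitespace and comments get dropped before the parser ever sees them:

```python
import re

# Hypothetical token set, for illustration only (not decided Kaashi syntax).
TOKEN_SPEC = [
    ('NUMBER',  r'\d+(?:\.\d+)?'),
    ('STRING',  r"'[^']*'"),
    ('NAME',    r'[^\W\d]\w*'),    # unicode-aware identifiers
    ('PUNCT',   r'[{}\[\]:,]'),
    ('COMMENT', r'#[^\n]*'),       # assuming '#' line comments
    ('SPACE',   r'\s+'),
]
TOKENIZER = re.compile('|'.join(f'(?P<{name}>{rx})' for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Yield (kind, text) pairs, silently dropping whitespace and comments."""
    pos = 0
    while pos < len(source):
        match = TOKENIZER.match(source, pos)
        if not match:
            raise SyntaxError(f'unexpected character {source[pos]!r} at {pos}')
        pos = match.end()
        if match.lastgroup not in ('SPACE', 'COMMENT'):
            yield match.lastgroup, match.group()

# The grammar then only ever sees meaningful tokens:
print(list(tokenize("answer: 42  # the usual")))
# [('NAME', 'answer'), ('PUNCT', ':'), ('NUMBER', '42')]
```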
loreanvictor commented 4 years ago
  1. Even a language as scientific as Haskell doesn't use a standard grammar specification.

Hmmm ....

  2. ABNF was not designed to support UTF-8 in the first place; its tools are old and support no encoding other than ASCII.

Agreed.

  3. PEG is more recent and presumably has a better toolset.

I'm curious. Could you provide a sample of how the Kaashi syntax spec would look in PEG? And some links on available tooling?

Anyway, I believe ease of writing implementations is key, so having a usable specification (ideally with tooling for generating parsers in different languages / contexts) would be pretty useful.

However, if this takes too long, we can simply fall back to loose EBNF (or mathematical notation), move ahead towards creating a reference implementation, and come back to this task later.

sesajad commented 4 years ago

Hmmm ....

Look, it first defines its own notation, which is non-trivial and not one of the standard notations.

I'm curious. Could you provide a sample of how the Kaashi syntax spec would look in PEG? And some links on available tooling?

Yes, I will.

Anyway, I believe ease of writing implementations is key, so having a usable specification (ideally with tooling for generating parsers in different languages / contexts) would be pretty useful.

I agree, and for the sake of usability, I think tokenizing with regex and then parsing with PEG / [A/E]BNF is the best choice.

loreanvictor commented 4 years ago

Look, it first defines its own notation, which is non-trivial and not one of the standard notations.

Yes; nevertheless, there is a standard-ish grammar specification.

I think tokenizing with regex and then parsing with PEG / [A/E]BNF is the best choice.

How come? Couldn't you do it all with PEG? What are the benefits of such a split?

sesajad commented 4 years ago

It's not easy to implement whitespace and comments in single-step parsing; in particular, PEG and [A/E]BNF are not good for that (although I guess it's not impossible).

A bad idea would be to use a simple preprocessor instead of tokenizing, and then PEG / [A/E]BNF on top of that (it means we'd be computing many things twice).

Current Solution: Using ANTLR, Bison, or something like them.

Alternative: use a common/minimal subset of regex for tokenizing, together with a well-documented CFG. We can standardize our parsing algorithm too. And then it's easy to implement in any language with automatic tools (rough sketch of the parsing side below).

Why? I haven't yet found any language that uses a single-step parser, ignoring JSON and SQLite!
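
To make that alternative concrete, here is a rough sketch of the parsing side (again in Python, purely as an example; the `entry ::= NAME ':' value` grammar is a made-up placeholder, not actual Kaashi syntax). It consumes a token stream like the one sketched earlier in the thread, and none of its rules have to deal with whitespace or comments:

```python
# A tiny recursive-descent parser over (kind, text) tokens, e.g. produced by a
# regex tokenizer that has already dropped whitespace and comments.

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ('EOF', '')

    def expect(self, kind, text=None):
        tok = self.peek()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f'expected {text or kind}, got {tok}')
        self.pos += 1
        return tok

    # entry ::= NAME ':' value     (hypothetical rule; no whitespace handling)
    def entry(self):
        name = self.expect('NAME')[1]
        self.expect('PUNCT', ':')
        return name, self.value()

    # value ::= NUMBER | STRING | NAME     (hypothetical, for illustration)
    def value(self):
        kind, text = self.peek()
        if kind not in ('NUMBER', 'STRING', 'NAME'):
            raise SyntaxError(f'expected a value, got {(kind, text)}')
        self.pos += 1
        return text

# With whitespace/comments already stripped by the tokenizer, the parser's
# input is just the meaningful tokens:
tokens = [('NAME', 'answer'), ('PUNCT', ':'), ('NUMBER', '42')]
print(Parser(tokens).entry())   # ('answer', '42')
```

In a single-step (scannerless) grammar, each of those rules would have to allow optional whitespace and comments between every pair of symbols, which is exactly the clutter the two-level split avoids.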

loreanvictor commented 4 years ago

man, it is sort of stupid that there is no proper format for syntax specs that works in 2020 and also has proper tooling for parser generation.

anyway, if the result of your research is that there really is no such tool, then I guess the proposed alternative is the better choice. what's more, if the syntax specification is not going to have a functional role in parsing / parser generation, then we can simply use EBNF (as it is pretty easy on the eye) and mention discrepancies in comments (or via custom/undefined tokens).