jeffreykegler / Marpa--R2

Parse any language you can describe in BNF
https://jeffreykegler.github.io/Marpa-web-site/
Other
157 stars 23 forks source link

Conditions for a grammar to fail #104

Closed jddurand closed 10 years ago

jddurand commented 10 years ago

Currently a grammar does not fail if some G1 symbols are inaccessible. But fails if at least one lexeme from L0 is not accessible from G1.

Can you make the condition be: fail is there no G1 symbol accessible, regardless of lexemes accessibility ?

Thanks.

ps: copy/paste from http://pastebin.com/RaJzDuXt

compoundStatementReenterScope ::= LCURLY_REENTERSCOPE RCURLY_SCOPE
                                | LCURLY_REENTERSCOPE blockItemList RCURLY_SCOPE

% perl -Ilib bin/c2ast --lazy --loglevel DEBUG --nocpp --start blockItemList /tmp/test.c

Inaccessible symbol: LCURLY_REENTERSCOPE
Inaccessible symbol: compoundStatementReenterScope
Inaccessible symbol: declarationList
Inaccessible symbol: externalDeclaration
Inaccessible symbol: fileScopeDeclarator
Inaccessible symbol: functionDefinition
Inaccessible symbol: functionDefinitionCheck1
Inaccessible symbol: functionDefinitionCheck1declarationList
Inaccessible symbol: functionDefinitionCheck1declarationSpecifiers
Inaccessible symbol: functionDefinitionCheck2
Inaccessible symbol: functionDefinitionCheck2declarationSpecifiers
Inaccessible symbol: translationUnit
Uncaught exception from user code:
        A lexeme in lexer L0 is not accessible from the G1 start symbol: LCURLY_REENTERSCOPE
        Marpa::R2 exception at lib/MarpaX/Languages/C/AST/Impl.pm line 65.
        Marpa::R2::exception('A lexeme in lexer L0 is not accessible from the G1 start symb...') called at /usr/local/lib/perl/5.18.2/Marpa/R2/SLG.pm line 457
        Marpa::R2::Internal::Scanless::G::hash_to_runtime('Marpa::R2::Scanless::G=ARRAY(0xa062b24)', 'Marpa::R2::Internal::MetaAST::Parse=HASH(0xa125878)', 'HASH(0x9420c6c)') called at /usr/local/lib/perl/5.18.2/Marpa/R2/SLG.pm line 66
        Marpa::R2::Scanless::G::new('Marpa::R2::Scanless::G', 'HASH(0xa0624e4)') called at lib/MarpaX/Languages/C/AST/Impl.pm line 65
        MarpaX::Languages::C::AST::Impl::new('MarpaX::Languages::C::AST::Impl', 'HASH(0xa0624e4)', 'HASH(0xa0628a4)') called at lib/MarpaX/Languages/C/AST.pm line 180
        MarpaX::Languages::C::AST::new('MarpaX::Languages::C::AST', 'lexemeCallback', 'ARRAY(0x8ea1e30)', 'logInfo', 'ARRAY(0x9fd90b0)', 'lazy', 1, 'typedef', 'ARRAY(0xa01fff4)', ...) called at bin/c2ast line 147
jeffreykegler commented 10 years ago

I'll need to think about this.

Why is this a show-stopper for cpretty?

jddurand commented 10 years ago

cpretty will play with :start within the same grammar. Any other :start but translationUnit is failing.

jeffreykegler commented 10 years ago

Just so I'm clear. I can add a way to make inaccessible's OK -- no warning and no fatal error. Do you want this for lexemes, non-lexmes or both? (Eventually I should do it for both, but I will target your requirements first.)

jddurand commented 10 years ago

Both, but please note that I am OK with having warnings, the exception on lexemes is the showstopper I am talking about.

jeffreykegler commented 10 years ago

I'll extend the lexeme pseudo-rule with a inaccessible adverb, which will take ok, warn and fatal as values.

@jddurand -- could you prepare two tests cases with an inaccessible L0 lexeme? One should cover the case where the lexeme actually does occur in the input. I'll handle this by automatically marking a lexeme forgiving if it is also inaccessible => ok. (Otherwise, it would cause the parse to fail.)

jddurand commented 10 years ago

ok will do - many thx

jeffreykegler commented 10 years ago

@jdurand -- if it's OK, I'd like to take this one on after Phase 3. The difference should be only a matter of a few days, and it'll give you more time to work up the test cases.

The planned phase 3 valued-added turns out to be connected with this issue. If inaccessible lexemes are allowed, the question comes up, what do we do when we find one in the input? The reasonable thing seems to be that, when we allow an inaccessible lexeme, we also mark it forgving. And the means some possibility of going quadratic O(n**2) unless we have the "cheap forgiveness" which I hope will be result of Phase 3.

jeffreykegler commented 10 years ago

@jdurand -- The current interphase (3/4) will be short, and I'd probably need to reflect on the test cases, so this one will probably not make it in.

jdurand commented 10 years ago

you're missing a 'd' it's @jddurand

jeffreykegler commented 10 years ago

@jdurand -- my apologies!

jeffreykegler commented 10 years ago

@jddurand -- The current interphase (3/4) will be short, and I'd probably need to reflect on the test cases, so this one will probably not make it in.

jddurand commented 10 years ago

Here is a use case of what I am thinking:

use strict;
use diagnostics;
use Marpa::R2;

my $grammar_source = do {local $/; <DATA>};

foreach (qw/realstart start1 start2/) {
    my $real_source = $grammar_source;
    $real_source =~ s/\$START/$_/;
    print STDERR "With :start ::= $_\n";
    eval {Marpa::R2::Scanless::G->new({ source => \$real_source})};
    print STDERR '... status is ' . ($@ ? 'rejected' : 'accepted') . "\n";
}

__DATA__
:start ::= $START

realstart ::= start1 | start2

start1 ::= X
start2 ::= Y

X ~ 'X'
Y ~ 'Y'

what about accepting the sub grammars if X and Y lexemes have inaccessible_ok => 1 ? Same logic could apply to G1.

rns commented 10 years ago

"...parse a subset of a larger grammar, and it's convenient to just redefine the start and tell Marpa to just ignore the rest of it." — I have a use case for that too.

I'm currently writing a generator of tree transducer from 2 grammars and inputs for their rules — trying to build SLIF/RE translator the harder way. :)

In the case of lexemes, their AST's, however trivial, have to be added manually, but with custom :start symbols that allow parsing with only a subset of the grammar, the AST's of single-lexeme inputs (and, thus, the entire tree transducers) can be generated automatically that would be a very good thing.

So, if I can help with something, anything, I'd gladly will.

jeffreykegler commented 10 years ago

@rns: On this, I decided to wait to see what you did with issue #109, since I thought merge conflicts would be likely otherwise. (I need to redo the theory article, so I'm doing that in the meantime.) I was then going to proceed with this one.

It is probably best I do this one -- only a few lines of code need to be changed, but there are conceptual issues involved, about which I am fussy.

My "design" so far -- add an "inaccessible" adverb, whose values can be "fatal" (the default), "ok" and "discard". "inaccessible" can be specified in the pseudo-rules. "fatal" means the current behavior -- a fatal error. "ok" means siliently ignore inaccessibles in the grammar, but actually encountering one in the input will still be a fatal error. "discard" means siliently ignore inaccessibles when found in the grammar, and discard them silently when found in the input.

Note that, if a lexeme is "forgiving" and "inaccessible", it will never be found in the input. That's because "forgiving" only looks for acceptable lexemes, and inaccessible lexemes are never acceptable.

rns commented 10 years ago

Ok, I'll focus on delivering #109.

Re the design: looks good, my only suggestion would be to allow changing :start symbol after the grammar is created, e.g. as an argument to set() grammar mutator: $slg->set( { :start => 'start1' } ). Is that ok? Doing text substitution on the grammar source is fine, but feels quick-hackish a bit.

With this in mind, I'd rewrite @jddurand's use case above as shown in this gist. Hope this fits @jddurand's initial suggestion.

An L0 rule can't be set as a start symbol, so my own use case would still require creating a special case grammar. But that can well be a moving target, I'll know better as I move further with tree transducer code, so that's fine.

Overall, I'm missing NAIF's capability of fully programmatic grammar construction, so perhaps put smth. like MarpaX::Languages::SLIF::AST on my already overpopulated queue.

jeffreykegler commented 10 years ago

@rns -- re the hackish feel of text substitution -- I had that same feeling. But I'd suggest getting over it. Text substitution in terms of efficiency, is as good (or not much worse) than manipulating hash/array data structures and named arguments. And Marpa is about teaching the programming world to think in terms of DSL's and language-driven programming. So it's a style of programming I think we want to promote.

The "hack-ish" feel of text mangling comes from long experience in a world where the next step with text meant parsing, which meant big trouble, while the next step with other data structures meant well-understood kinds of data manipulation. We are changing that world.

rns commented 10 years ago

Now, SLIF grammar construction/manipulation via, e.g., templating now looks more ok and simpler too. Smth. to think about, thanks.

jeffreykegler commented 10 years ago

I hope I have what is wanted in the commit just referenced. If you add

 inaccessible is ok by default

to your SLIF, inaccessible symbols will be ignored. There are now several examples in the test suite, and t/sl_gia.t includes one similar to the use case -- it changes between start symbols. (This is done using a regex to change text in a template SLIF DSL.)

There's documentation is the Scanless::DSL pod. Let me know if this answers the need.

jddurand commented 10 years ago

Jeffrey,

Tested "inaccessible is ok by default" on:

int func1(int x1, double x2, float ( f1)(int x11, double x12));

and setting the :start to declarationSpecifiers, This mean that the grammar should not emit any warning and should be able to parse "int" only. This is exactly what has happened.

So this mean it is a priori working fine.

Two questions:

In the future, this mean that I can build on-the-fly a fake newStart containing the LHS I am interested it, and set :start to this fake newStart. cpretty revival -;

Thanks, JD.

Le samedi 22 février 2014, 20:39:15 Jeffrey Kegler a écrit :

I hope I have what is wanted in the commit just referenced. If you add

 inaccessible is ok by default

to your SLIF, inaccessible symbols will be ignored. There are now several examples in the test suite, and t/sl_gia.t includes one similar to the use case -- it changes between start symbols. (This is done using a regex to change text in a template SLIF DSL.)

There's documentation is the Scanless::DSL pod. Let me know if this answers the need.


Reply to this email directly or view it on GitHub: https://github.com/jeffreykegler/Marpa--R2/issues/104#issuecomment-35823819

jeffreykegler commented 10 years ago

The

inaccessible is ok by default

statement can go anywhere in the grammar. If it creates an ambiguity, there are several ways to fix it. One is by adding a ';'. A semicolon can follow any statement, and this can be used to eliminate ambiguities.

warn is the default because of backward compatibility. This is a first example of allowing and using ambiguity in a language, something I think is an important technique.