morazanm / fsm

A DSL for the Automata Theory Classroom
15 stars 6 forks source link

[BUG] Only a subset of the non-terminals are supported in aggregate symbols #17

Closed jasonhemann closed 2 years ago

jasonhemann commented 2 years ago

Documentation describes a cfrule as follows:

A context-free grammar rule is a list of the form (A ARROW J), where A is a nonterminal symbol and J is either EMP > or a an aggregate symbol of terminals and nonterminals.

(incidentally, there's a small typo "or a an" in this and the definition of csrule.

The definition of nonterminal says it supports either upper-case letters or upper-case letters followed by a dash, followed by a number.

(incidentally, since we can write A-1 and A-01 probably better to say "a sequence of digits")

We can construct aggregate symbols for the RHSs of cfrules using single, upper-case letter nonterminals. However, this fails for the longer form versions.

> (make-cfg (list 'A 'A-1) (list 'a 'b 'c) (list (list 'A-1 ARROW 'aAc) (list 'A ARROW 'bA-1b) (list 'A ARROW 'c)) 'A)

 THE RHS OF THE FOLLOWING RULES ARE NOT VALID: 
     (A -> bA-1b)

I see too that valid grammars store the aggregate symbols as lists of individual symbols, but I cannot see a way to directly enter such lists. I believe this currently limits fsm cfgs to 26 variables.

jasonhemann commented 2 years ago

This means, for instance, that using grammar-rename-nts can construct a grammar that we can read out, but cannot read back in.

 >(let ([G (grammar-rename-nts
            '(A B C)
            (make-cfg
             '(A B C)
             '(p e)
             (list
              (list 'A ARROW 'BC)
              (list 'C ARROW 'e)
              (list 'B ARROW 'p))
             'A))])
    (make-cfg
     (grammar-getnts G)
     (grammar-getalphabet G)
     (grammar-getrules G)
     (grammar-getstart G)))

 THE RHS OF THE FOLLOWING RULES ARE NOT VALID: 
     (A-2024330 -> B-2024331C-2024332)
; Check above message for error
; Context (errortrace):
...
jasonhemann commented 2 years ago

``

I believe this currently limits fsm cfgs to 26 variables.

I see that I was incorrect here. For instance we are not limited to the latin alphabet.

> (make-cfg '(A B C Δ) '(c f g γ) '() 'Δ)
(cfg '(A B C Δ) '(c f g γ) '() 'Δ)

So while we are limited to some fixed number of non-terminals, and terminals for that matter, it's maybe not as dire as it might seem at first if the user is willing to use unicode.

morazanm commented 2 years ago

Documentation updated to reflect that FSM programmers are limited to nonterminals of length 1 despite that the FSM internal representation may generate nonterminals of the form -+. Updates made in: b115bd483458df6009acdd18147120cb91604d60