Rule with terminal's name causes unintuitive behaviour

weketoj commented 11 months ago

Defining a non-terminal with a terminal's name (e.g. (defrule integer () "a")) allows usage of that symbol in the parseq function, with the meaning defined in defrule. But, if that name is then used inside of a non-terminal definition, the original terminal's meaning is used.

Reproducible example below:

(ql:quickload 'parseq)
(use-package 'parseq)

(defrule integer () "a")
(parseq 'integer "a") ; => "a"

(defrule rule-using-integer () integer)
(parseq 'rule-using-integer "a") ; => NIL
(parseq 'rule-using-integer '(1)) ; => 1

I would imagine that we should probably signal an error or warning when trying to define a non-terminal with a terminal's name like this?

mrossini-ethz commented 11 months ago

Hello weketoj

Yes, thank you! I noticed this also just a few days ago. I wonder what would be best to do:

Redefine the (non-)terminal and issue a warning?
Not allowing this and issue an error?
Distinguish somehow between built-in terminals and non-terminals and custom ones?
Use different names for built-in terminals and non-terminals which are less likely to be used?

What is your opinion?

weketoj commented 11 months ago

For my particular use-case, I would have preferred the 1st option of allowing redefinition and issuing a warning. I was using parseq for string parsing, and wanted 'integer to represent (+ digit). Since in my particular case I didn't need parseq for the list-processing aspects of it, I didn't care if the original meaning of 'integer was superseded.

In terms of a way to allow this behaviour, (and I'm just spitballing here, as I may not understand the nuance of your code) we could move the fallback case of the cond that handles rule expansion (line 250 of defrule.lisp) up to the second case, and change the condition from t to a lookup in *rule-table*? That could allow for defined non-terminals to override terminals.

The question of the theoretical user who wants to use a terminal that they redefined could be to reserve a symbol called terminal or builtin-terminal, who's sole purpose would be to skip lookup of a non-terminal and instead go straight to the terminal's behaviour. Example below

(defrule integer () (+ digit)) 
; 'integer now matches strings of 1 or more digit
(defrule rule-using-terminal-integer () (and (terminal integer))) 
; 'rule-using-terminal-integer matches a list of a single integer, even though 'integer has been redefined

The above thought would probably be helped by extracting the terminals' behaviour into their own table (*terminal-table*?), which simplifies the logic of the symbolp branch of expand-atom into something like the following

(cond 
  ;; Is a lambda variable. Since we don't know what the value is at compile time, we have to dispatch at runtime
  ((have rule args) `(try-and-advance (runtime-dispatch ,expr ,rule ,pos) ,pos))
  ;; Is a call to a rule defined in *rule-table*
  ((gethash rule *rule-table*) `(try-and-advance (parseq-internal ',rule ,expr ,pos) ,pos))
  ;; Is a defined terminal in *terminal-table*, which holds lambdas that generate the quoted test-and-advance forms for each terminal case
  ((gethash rule *terminal-table*) ((gethash rule *terminal-table*) rule expr pos)))

A restructuring like the above would also open the door to possible user-defined terminals, if that is useful at all.

Sorry for going a bit out of scope of your original question, how do you feel about those ideas?

mrossini-ethz commented 11 months ago

That sounds intriguing. I am also thinking about maybe using conditions with restarts where the program could choose between redefining and ignoring the new rule. Or one could make a switch that enables/disables the redefining of the predefined terminals. Or to have a function to rename things. But then again, the rule names are irrelevant at the end of the day and one could always use a rule named "myinteger" instead of "integer" for example. However, if by any chance a rule had the name of a built-in terminal, the program should at least issue a warning or so. I will continue thinking about a solution.

mrossini-ethz commented 11 months ago

Hi

I used the opportunity to completely redesign the handling of terminals. They are not hardcoded anymore and the list *terminal-list* is used instead. One can now add more terminals to the list using define-terminal, define-simple-terminal or define-simple-sequence-terminal. You can remove them from the list using undefine-terminal. If you want to undefine the integer nonterminal you will have to call (undefine-terminal 'integer). Note that the new functions are not (yet) exported, so you will have to access them via parseq::undefine-terminal for now.

The changes are in the devel branch. Could you please check whether your code still works? Unit tests are still successful but maybe I am not covering all cases.

There is still no warning if you try to define a rule with the same name of a terminal. At the moment everything is as it was before. I would like to make the code more modular before deciding on a way to handle the problem.

mrossini-ethz / parseq

Rule with terminal's name causes unintuitive behaviour #2