opennars / OpenNARS-for-Applications

General reasoning component for applications based on NARS theory.
https://cis.temple.edu/~pwang/NARS-Intro.html
MIT License
91 stars 40 forks source link

Suggestion: centralise asserts for parsing errors #217

Closed seamus-brady closed 1 year ago

seamus-brady commented 2 years ago

Just an observation on interacting with the ONA system... all the assert errors call exit(1); This is appropriate for some cases where the system has encountered a null in one of the internal data structures, but should happen on parsing errors? It would be much better behaviour in the repl if the input was parsed and then continued after a parsing error. This may be more user friendly. The calls below are the ones I mean:

assert(len > 1, "Parsing error: Narsese string too short!");

assert(len < NARSESE_LEN_MAX, "Parsing error: Narsese string too long!"); //< because of '0' terminated strings

assert(narseseInplace[openingIdx] == '{' || narseseInplace[openingIdx] == '%', "Parsing error: Truth value opener not found!");

assert(narseseInplace[openingIdx-1] == ' ', "Parsing error: Space before truth value required!");

assert(*punctuation == '!' || *punctuation == '?' || *punctuation == '.', "Parsing error: Punctuation has to be belief . goal ! or question ?");

These should probably use a slightly different assert:

void Globals_assert_no_exit(bool b, char* message) { if(!b) { puts(message); puts("Parse failed."); } }

I did try coming up with a patch for this, but it is quite tricky! It ended up too complicated. So I sympathise, error handling in C is not easy! I offer my suggestion for what it is worth.

seamus-brady commented 2 years ago

Just an extra comment... There is an excellent library for a repl in C that might be useful:

https://github.com/jwerle/repl.h

It is part of the CLibs project:

https://www.clibs.org/packages/

patham9 commented 2 years ago

I agree it can be annoying for people who want to use the system interactively via the shell. I'm hence open to having a PARSER_EXIT_ON_INVALID_INPUT flag in Globals.h which when set to false allows one to let the parser stop the parsing and print an error message without exiting.

However it shouldn't be the default, since ONA is mostly embedded in applications where Narsese is passed automatically to the system's shell from various sensor preprocessing etc. Now, if an application passes a bad Narsese string format it is important to make the programmer immediately aware of it during development, exiting is a quite effective way to do so and can save the programmer a lot of time over ignored/only printed or warned about format issues in potentially long output logs.

We could have a Narsese sanitizer at some point which one can then simply chain to the system via pipe and which will capture both syntactic errors of the basic BNF but also extended format checks according to common errors in Narsese usage. To capture cases such as incorrect usage of variables, e.g. # instead of $: <<#1 --> a> ==> <#1 --> b>>. "Error: same dependent variable on both sides of implication" or compounds encoded in strange ways: <(a & b) ==> (b | c)>. "Error, inheritances or similarities missing" or multiple implications in a single sentence: <<a ==> b> ==> c>>. "Error: more than one implication in the sentence, please use conjunction"

Probably that's far in the future, but the additional flag in Globals.h which will hamper the parser to exit on error should be relatively easy to add. But not sure when I will do so as my focus is still on the reasoner which has a lot of room for improvements while the existing parser, except of the input checking, gets the job done perfectly. I will leave this issue open at least until the flag in Globals.h will be there and properly considered in Narsese.c.

seamus-brady commented 2 years ago

Thanks Patrick, that sounds ideal. I agree a separating out a parser/sanitiser would be great, but all in time :) The PARSER_EXIT_ON_INVALID_INPUT flag would be perfect until then. Thanks!

patham9 commented 1 year ago

After looking at this again, a key issue is still that the parser can only detect very limited types of invalid input at current stage. So even when quite complex and non-pretty error handling will be added for the currently detected cases, it's likely not improving the situation much. Making sure ONA receives proper Narsese in all circumstances will still be mandatory even after this change so I guess it is too much for now. The santiser will be more worthy of time as it can capture all syntatic cases and even the most common semantic ones.

PtrMan commented 1 year ago

I vote for a better parser and a sanitizer additionally to make the narsese IO even more userfriendly. There will always be "narsese-developers".

To me a sanitizer shouldnt be a external shellscript because that introduces a lot of issues (difficult to use pipes when instantiating as a OS process, etc.).