opennars / OpenNARS-for-Applications

General reasoning component for applications based on NARS theory.
https://cis.temple.edu/~pwang/NARS-Intro.html
MIT License

The high performance in the `bandrobot` test may be accidental #288

Open ARCJ137442 opened 2 hours ago

ARCJ137442 commented 2 hours ago

Background

The `bandrobot` test, one of the demos in ONA, is meant to test the multistep event inference and subgoaling of the ONA reasoner (via NAL-7 and NAL-8 temporal/procedural inference).

The scene, rendered as ASCII art, looks like this:

+++++++++++++++++++++|
---------------------|
            A        |
 o                   |
'''U'''''''''''''''''|

This is a single-player game whose main goal is to control the robot A, pick up the ball o, and drop it into the bucket U. In this game, ONA is expected to learn procedural knowledge by comparing the frequencies of beliefs (corresponding to the relative position between the robot and the ball/bucket). This comparison is logically represented by the inference rules `{ <{S1} |-> [P]>, <{S2} |-> [P]> } |- <({S1} * {S2}) --> (+ P)>` (`t_frequency_greater`) and `{ <{S1} |-> [P]>, <{S2} |-> [P]> } |- <({S1} * {S2}) --> (= P)>` (`t_frequency_equal`). Using this representation of relative position, ONA is able to learn "pick when the robot's position equals the ball's, drop when the robot's position equals the bucket's, and move left/right to make the positions equal", thereby providing evidence that ONA has an efficient procedural learning mechanism (sensorimotor intelligence).
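To make the two comparison rules concrete, here is a minimal Python sketch of the idea, not ONA's actual C implementation. The function name `compare_frequencies`, the `tolerance` parameter, and the property name `position` in the usage note are assumptions made for illustration. The truth functions `t_frequency_greater` and `t_frequency_equal` are not modeled; only the derived statement is produced.

```python
from typing import Optional


def compare_frequencies(s1: str, f1: float, s2: str, f2: float, prop: str,
                        tolerance: float = 0.0) -> Optional[str]:
    """Derive a relational statement from two beliefs about the same property.

    s1, s2 are instance names and f1, f2 are the frequencies of the beliefs
    <{s1} |-> [prop]> and <{s2} |-> [prop]>. Returns a Narsese-like string,
    or None when neither rule fires for this premise order.
    The `tolerance` parameter is hypothetical, not taken from ONA.
    """
    if abs(f1 - f2) <= tolerance:
        # Frequencies are equal (within tolerance): "(= P)" relation.
        return f"<({{{s1}}} * {{{s2}}}) --> (= {prop})>"
    if f1 > f2:
        # First frequency is greater: "(+ P)" relation.
        return f"<({{{s1}}} * {{{s2}}}) --> (+ {prop})>"
    # f1 < f2: in this sketch nothing is derived for this premise order;
    # calling with the premises swapped would produce the "(+ P)" relation.
    return None
```

For example, `compare_frequencies("robot", 0.6, "ball", 0.2, "position")` returns `<({robot} * {ball}) --> (+ position)>`, while equal frequencies give the `(= position)` form.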

Problem

As the title says, ONA can currently achieve high performance on this game through self-learning.

However, if we change the random seed of the whole reasoner, it appears that the high performance of ONA in this game may be accidental:

Pictures

The successful case on mysrand(666)

Screenshot_2024-10-10-16-24-08-371

Failing cases on mysrand(667) and mysrand(668)

Screenshot_2024-10-10-15-22-35-102_com termux-edit

Screenshot_2024-10-10-14-36-40-144

patham9 commented 2 hours ago

I agree, robust learning is not achieved for this particular example. I also have a test script that runs it with different seeds for evaluation; I can commit it soon.
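A seed sweep of this kind could be sketched as follows. This is not the test script mentioned above, just a minimal Python illustration. The names `sweep_seeds`, `success_rate`, and `toy_trial` are hypothetical, and `toy_trial` is a random stand-in that does not run ONA. A real trial function would launch the reasoner with the given seed and report whether the task was learned; how the seed reaches the reasoner is left open here.

```python
import random
from typing import Callable, Dict, Iterable


def sweep_seeds(run_trial: Callable[[int], bool],
                seeds: Iterable[int]) -> Dict[int, bool]:
    """Run one trial per seed and record whether it succeeded."""
    return {seed: bool(run_trial(seed)) for seed in seeds}


def success_rate(results: Dict[int, bool]) -> float:
    """Fraction of seeds whose trial succeeded (0.0 for an empty sweep)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def toy_trial(seed: int) -> bool:
    """Stand-in trial: NOT ONA, just a seeded coin flip for demonstration."""
    rng = random.Random(seed)
    return rng.random() < 0.5
```

Calling `sweep_seeds(toy_trial, range(666, 676))` returns a per-seed success map, and `success_rate` reduces it to a single number. Reporting a rate over many seeds, rather than the outcome for one seed, is what separates robust learning from a lucky run.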

Part of the problem is that, by design of this experiment, reward can only be obtained when the object is picked up at the right position and then dropped at the target location. This rarely happens with motor babbling, and when it does, there are many other hypotheses to weed out.
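The sparsity of reward under motor babbling can be explored with a toy simulation. The sketch below is an assumption-laden stand-in, not the `bandrobot` demo. The world width, the uniform action choice, the rule that a dropped ball lands at the robot's position, and the ball respawn after a reward are all invented for illustration.

```python
import random

ACTIONS = ("left", "right", "pick", "drop")


def babble(seed: int, steps: int = 1000, width: int = 20) -> int:
    """Count rewards earned by uniformly random actions in a toy 1-D world."""
    rng = random.Random(seed)
    robot = rng.randrange(width)
    ball = rng.randrange(width)
    bucket = rng.randrange(width)
    holding = False
    rewards = 0
    for _ in range(steps):
        action = rng.choice(ACTIONS)
        if action == "left":
            robot = max(0, robot - 1)
        elif action == "right":
            robot = min(width - 1, robot + 1)
        elif action == "pick":
            # Picking only works when the robot is exactly at the ball.
            if not holding and robot == ball:
                holding = True
        elif action == "drop" and holding:
            holding = False
            if robot == bucket:
                # Reward requires pick-at-ball followed by drop-at-bucket.
                rewards += 1
                ball = rng.randrange(width)  # assumed respawn rule
            else:
                ball = robot  # assumed: ball lands where it was dropped
    return rewards
```

Running `babble(seed)` for several seeds gives a rough sense of how seldom random actions complete the full pick-then-drop sequence in this toy setting; each such event is the only moment at which a learner receives any reward signal.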

The solution will be to take what we learned from NACE and add the corresponding curiosity model to ONA: https://github.com/patham9/NACE

Another imminent change: the current numeric representation is the initial, incomplete one that was added experimentally. In the meantime, there is a solid implementation of numeric spaces that allows the system both to condition on concrete values and to perform comparisons between numeric measurements. With this new numeric value handling, learning also seems much more robust: http://91.203.212.130/AniNAL/demo_complex_continuous_verbal.html

ARCJ137442 commented 2 hours ago

@patham9 Okay, I'll study these references later.