Correcting AS-MOSES outputs: Approach Suggestion

Eman22S commented 4 years ago

Overview

This addresses the issue raised in the report Optimizing as-moses: Reports #109. To summarize, asmoses currently does not behave as required: the programs it predicts differ from those of moses. When the problem is passed as a dataset of contin-only or boolean-only values, however small, it fails to produce the correct program. The slowdown discussed in the report is very likely related, since running the demo problems with and without atomspace-port=1 does not show a slowdown as significant as the one seen when a dataset is passed.

Observation

My observation on the behavior of asmoses is as follows:

Running a demo problem with atomspace-port=1 produces the correct output, with no discrepancy between moses and asmoses: [image: no-discrepency]

Running the same problem (a disjunction) from a CSV dataset, however, produces incorrect results: [image: dataset-or-weired]

dataset-test2.csv contains: [image: datset]

The following is the result of solving a conjunction problem using a CSV dataset: [image: dataset-and-weired result]

dataset-test1.csv contains: [image: datset-and]

Suggestions

Looking at these outputs, we might conclude that a logic error lies somewhere in the code path taken by asmoses -i dataset.csv. The same cannot be said of the demo problems, however, as they work as expected.

I have looked into some of this code and have a few speculations. Ctable population is one area I have doubts about:

The dataset is compressed, and each of its features is populated into the atomspace in a condensed, non-redundant structure. For instance, a two-column dataset with 9 rows may condense to, say, 2 or 3 rows before being populated into the atomspace. The target column, however, is not populated in a way that preserves the compression structure for Atomese to understand. In other words, we don't have a compressed table representation for Atomese, and that might be what is causing all of this (see Compressed Table Representation #19). Hence asmoses does not treat the ctable as a ctable when populating it, but turns it into a dataset in a form it prefers.
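To make the compression idea concrete, here is a minimal sketch, assuming boolean features and a boolean target. The names InputRow, OutputCounts, CompressedTable and compress are hypothetical stand-ins chosen for illustration; they are not the asmoses types or functions. The point is that each distinct input row must stay paired with the counts of the target values observed for it, which is the pairing that may be getting lost when the target column is populated separately.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration only, not the asmoses types.
using InputRow = std::vector<bool>;
// For one distinct input row: how many times each target value was observed.
using OutputCounts = std::map<bool, unsigned>;
using CompressedTable = std::map<InputRow, OutputCounts>;

// Group identical input rows and count the target values seen for each.
// inputs[i] is the feature row of observation i, targets[i] its target value.
CompressedTable compress(const std::vector<InputRow>& inputs,
                         const std::vector<bool>& targets)
{
    CompressedTable ctable;
    for (std::size_t i = 0; i < inputs.size() && i < targets.size(); ++i)
        ++ctable[inputs[i]][targets[i]];
    return ctable;
}
```

With this layout, a dataset whose rows repeat collapses to one entry per distinct input row, and a row whose duplicates disagree on the target keeps both counts rather than a single value.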

Conclusion

I would like to hear your thoughts, @ngeiswei @kasimebrahim, on the above and on what testing strategy I should follow. My initial approach is black-box testing of each module in the asmoses workflow. But given the problem description above, I am fairly sure there is a more systematic way of tracking it down.

ngeiswei commented 4 years ago

Thanks @Eman22S, that's very useful info.

Have you tried to enable fine log level and compare with and without port?

If you use diff, you can remove timestamps prior to that, with

https://github.com/singnet/cogutil/blob/master/scripts/util/rm-timestamps.sh

Eman22S commented 4 years ago

@ngeiswei, that sounds plausible. How do I use that when running asmoses?

ngeiswei commented 4 years ago

For the log level just use asmoses -l fine ...; for rm-timestamps.sh you need to call it on the log file afterwards. I think calling rm-timestamps without arguments provides some help.
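The comparison itself amounts to dropping the timestamp from each line and finding the first line where the two logs disagree. The sketch below only illustrates that idea; rm-timestamps.sh and diff remain the tools to use. It assumes each log line starts with a bracketed timestamp of the form "[YYYY-MM-DD HH:MM:SS:mmm] ", which should be checked against the actual log file, and the function names strip_timestamp and first_difference are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// ASSUMPTION: a log line begins with a bracketed timestamp such as
// "[2020-05-01 12:34:56:789] ". Adjust the pattern to the real log format.
std::string strip_timestamp(const std::string& line)
{
    static const std::regex ts(
        R"(^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3}\] )");
    return std::regex_replace(line, ts, "");
}

// Index of the first line where the two logs differ once timestamps are
// removed, or -1 if they have the same length and agree on every line.
long first_difference(const std::vector<std::string>& a,
                      const std::vector<std::string>& b)
{
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
        if (strip_timestamp(a[i]) != strip_timestamp(b[i]))
            return static_cast<long>(i);
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

The first differing line between the run with the port and the run without it is usually the most informative place to start reading.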

kasimebrahim commented 4 years ago

@Eman22S, @ngeiswei I think the problems we are facing right now with as-moses arise from the same root. The data population is not working properly, specifically the population from the input table.

The first problem is the Not a link! error in Interpreter.cc line:66. The only way that could happen is if a node [Predicate or Schema] is given to the interpreter but its value is not set. This problem arises only when working on table-based problems and has to do with populating the dataset. It also explains the unexpected programs above.

I recommend two things. First, add at least an OC_ASSERT(program->is_node(), "...") just before line 65 in the Interpreter. Second, make sure the input tables are populated properly into the atomspace in table-problems.cc line:167 and populate_atomspace.cc, and make sure the candidate programs in instance_scorer.cc, composite_score atomese_based_scorer::operator()(const instance &inst), are in sync with the atomspace instance. You can do that simply by checking whether the arguments [Predicates and Schemas] are populated after running Handle prog = _as.add_atom(prog);.
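The population check suggested above could look roughly like the following sketch. It deliberately uses a hypothetical stand-in struct Atom rather than the real AtomSpace types, so the field names (is_node, has_value, outgoing) and the function collect_unpopulated are assumptions for illustration; in asmoses the same walk would be done over the Handle returned by add_atom, using whatever value lookup the scorer already relies on.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an atom; NOT the AtomSpace API.
struct Atom {
    std::string name;
    bool is_node;      // true for Predicate/Schema-like leaves
    bool has_value;    // whether data was attached during population
    std::vector<std::shared_ptr<Atom>> outgoing;  // children (links only)
};
using AtomPtr = std::shared_ptr<Atom>;

// Walk the program tree and collect the name of every node that has no
// value attached, i.e. every argument the interpreter could not evaluate.
void collect_unpopulated(const AtomPtr& prog,
                         std::vector<std::string>& missing)
{
    if (!prog) return;
    if (prog->is_node) {
        if (!prog->has_value) missing.push_back(prog->name);
        return;
    }
    for (const AtomPtr& child : prog->outgoing)
        collect_unpopulated(child, missing);
}
```

Logging the collected names just before scoring would show directly which features were never populated from the input table.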
