Eman22S opened 4 years ago
Thanks @Eman22S, that's very useful info.
Have you tried enabling the fine log level and comparing the logs with and without the port?
If you use diff, you can first remove the timestamps with
https://github.com/singnet/cogutil/blob/master/scripts/util/rm-timestamps.sh
@ngeiswei, that sounds plausible. How do I use that when running asmoses?
For the log level, just use asmoses -l fine ... . As for rm-timestamps.sh, you need to call it on the log file afterwards; I think calling rm-timestamps without arguments prints some help.
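If rm-timestamps.sh is not at hand, the same strip-then-diff idea can be sketched in Python. This is a minimal sketch, not a port of the script: it assumes each log line starts with a bracketed timestamp such as [2020-01-01 10:00:00:001], which is an assumption about the log format, so TIMESTAMP_RE may need adjusting. The file names without-port.log and with-port.log are hypothetical.

```python
import difflib
import re
import sys

# ASSUMPTION: log lines start with "[YYYY-MM-DD HH:MM:SS:mmm] ".
# Adjust this pattern if your logs use a different timestamp format.
TIMESTAMP_RE = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3}\] ")


def strip_timestamps(lines):
    """Remove the leading timestamp from each log line, if present."""
    return [TIMESTAMP_RE.sub("", line) for line in lines]


def diff_logs(lines_a, lines_b, name_a="without-port.log", name_b="with-port.log"):
    """Return a unified diff (list of lines) of two logs after stripping timestamps."""
    return list(
        difflib.unified_diff(
            strip_timestamps(lines_a),
            strip_timestamps(lines_b),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_logs.py without-port.log with-port.log
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for line in diff_logs(fa.read().splitlines(), fb.read().splitlines()):
            print(line)
```

With timestamps gone, an empty diff means both runs logged the same steps in the same order, so the first hunk points at where the runs with and without the port start to diverge.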
@Eman22S, @ngeiswei I think the problems we are facing right now with as-moses stem from the same root cause: data population is not working properly, specifically population from the input table.
The first problem is the "Not a link!" error in Interpreter.cc, line 66. The only way that can happen is if a node (Predicate or Schema) is given to the interpreter but its value is not set. This problem arises only when working on table-based problems, so it has to do with populating the dataset. It also explains the unexpected programs above.
I recommend two things.
First, add at least an OC_ASSERT(program->is_node(), "...") just before line 65 in the Interpreter.
Second, make sure the input tables are populated properly into the atomspace in table-problems.cc (line 167) and populate_atomspace.cc, and make sure the candidate programs in instance_scorer.cc, composite_score atomese_based_scorer::operator()(const instance &inst), are in sync with the atomspace instance. You can do that simply by checking whether the arguments (Predicates and Schemas) are populated after running Handle prog = _as.add_atom(prog);.
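The check suggested above (every Predicate or Schema argument in a candidate program should carry a value before the interpreter sees it) can be illustrated with a toy model. This is a minimal sketch in Python, not the opencog API: Atom, find_unset_arguments and check_before_interpreting are hypothetical stand-ins for Handle, the atomspace and the Interpreter, meant only to show the walk-the-program-and-fail-early idea.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# HYPOTHETICAL stand-in for an atomspace atom; not the real opencog Atom/Handle API.
@dataclass
class Atom:
    type_name: str                                   # e.g. "AndLink", "PredicateNode"
    name: str = ""
    outgoing: List["Atom"] = field(default_factory=list)
    value: Optional[list] = None                     # column values, set during table population

    def is_node(self) -> bool:
        return self.type_name.endswith("Node")


ARGUMENT_TYPES = {"PredicateNode", "SchemaNode"}


def find_unset_arguments(program: Atom) -> List[str]:
    """Return the names of Predicate/Schema nodes in `program` that have no value set."""
    if program.is_node():
        if program.type_name in ARGUMENT_TYPES and program.value is None:
            return [program.name]
        return []
    missing: List[str] = []
    for child in program.outgoing:
        missing.extend(find_unset_arguments(child))
    return missing


def check_before_interpreting(program: Atom) -> None:
    """Fail early with a descriptive message instead of a bare 'Not a link!'."""
    missing = find_unset_arguments(program)
    if missing:
        raise ValueError(f"arguments without values: {missing}")
```

In the real code base, the equivalent check would run right after the program is added to the atomspace and before scoring, which is where the mismatch between candidate programs and the populated table would first become visible.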
Overview
This is to address the issue raised in the report Optimizing as-moses: Reports #109. To summarize, asmoses is currently not working as required: its program predictions do not match those of moses. When the problem is passed as a dataset of contin-only or boolean-only values, however small, it fails to come up with the correct program. The slowdown discussed in the report is also very likely related to this, since running demo problems with and without atomspace-port=1 does not cause nearly as significant a slowdown as passing a dataset does.
Observation
My observation on the behavior of asmoses is as follows:
Running a demo problem with atomspace-port=1 produces the correct output, with no discrepancy between moses and asmoses.
Running the same problem, which is a disjunction, from a CSV dataset, however, produces incorrect results.
dataset-test2.csv contains:
The following is the result of solving a conjunction problem from a CSV dataset:
dataset-test1.csv contains:
Suggestions
Looking at these outputs, we might conclude that somewhere in the workflow of asmoses -i dataset.csv a logic error was introduced in the code. The same argument cannot be made for demo problems, however, as they work as expected. I have looked into some of this code and have a few speculations. Ctable population is one I have doubts about:
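As background for the compression step discussed here, a ctable can be pictured roughly as follows. This is a minimal sketch, assuming (as I understand MOSES's CTable) that each distinct input row maps to a count of the target values observed for it; compress_table and total_rows are hypothetical helpers, not the MOSES API. The relevant point is that the target column survives only as per-row counts, so anything downstream that expects one target value per row has to interpret those counts.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compress_table(rows: List[Tuple], target_index: int = -1) -> Dict[Tuple, Counter]:
    """Group rows with identical inputs and count how often each target value occurs."""
    ctable: Dict[Tuple, Counter] = {}
    for row in rows:
        values = list(row)
        target = values.pop(target_index)   # remove the target column from the row
        inputs = tuple(values)
        ctable.setdefault(inputs, Counter())[target] += 1
    return ctable


def total_rows(ctable: Dict[Tuple, Counter]) -> int:
    """Number of original (uncompressed) rows represented by the ctable."""
    return sum(sum(counts.values()) for counts in ctable.values())


# Nine rows with two input columns and a target collapse to three distinct input rows.
rows = [
    (1, 0, 1), (1, 0, 1), (1, 0, 0),
    (0, 1, 1), (0, 1, 1), (0, 1, 1),
    (0, 0, 0), (0, 0, 0), (0, 0, 0),
]
ctable = compress_table(rows)
```

Note that the input row (1, 0) maps to two target values, {1: 2, 0: 1}, which is exactly the information a per-row representation of the target would lose.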
So the dataset is compressed, and each of its features is populated into the atomspace in a condensed, non-redundant structure. For instance, a two-column table with 9 rows might condense to, say, 2 or 3 rows before being populated into the atomspace. The target column, however, is not populated in a way that keeps the compressed structure for atomese to understand. What I am saying is that we don't have a compressed table representation for atomese (see Compressed Table Representation #19), and that might be what's causing all this. Hence asmoses does not quite understand that the ctable is a ctable when populating it, and instead turns it into a dataset it likes.
Conclusion
I would like to hear your thoughts, @ngeiswei @kasimebrahim, on the above and on what testing strategy I should follow. My initial approach is black-box testing of each module in the asmoses workflow. But given the problem description above, I am quite sure there is a more systematic way of tracking it down.
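One possible systematic check: for small boolean problems such as the conjunction and disjunction cases above, the correct program is known in advance, so any learned program can be compared row by row against the full truth table. This is a minimal sketch under assumptions: the learned program is represented here as a Python callable (translating actual moses or asmoses output into one is left out), and the column names i1, i2 and out are hypothetical. to_csv writes 0/1 values with a header row, which is assumed, not confirmed, to match the CSV layout asmoses -i expects.

```python
import csv
import io
from itertools import product
from typing import Callable, List, Tuple

Row = Tuple[bool, ...]


def truth_table(fn: Callable[..., bool], arity: int) -> List[Tuple[Row, bool]]:
    """Every combination of `arity` boolean inputs paired with fn's output."""
    return [(inputs, fn(*inputs)) for inputs in product([False, True], repeat=arity)]


def mismatches(candidate: Callable[..., bool], table: List[Tuple[Row, bool]]) -> List[Row]:
    """Input rows where the candidate program disagrees with the expected target."""
    return [inputs for inputs, expected in table if candidate(*inputs) != expected]


def to_csv(table: List[Tuple[Row, bool]], target_name: str = "out") -> str:
    """Render the table as CSV with 0/1 values (column names are hypothetical)."""
    arity = len(table[0][0])
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f"i{k + 1}" for k in range(arity)] + [target_name])
    for inputs, expected in table:
        writer.writerow([int(v) for v in inputs] + [int(expected)])
    return buf.getvalue()


conjunction = truth_table(lambda a, b: a and b, 2)
disjunction = truth_table(lambda a, b: a or b, 2)
```

Running the same check on the programs returned for the CSV input and for the demo problem would show whether the CSV path yields programs that disagree with the truth table while the demo path does not, which narrows the search to the table-population stage.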