opencog / asmoses

MOSES Machine Learning: Meta-Optimizing Semantic Evolutionary Search for the AtomSpace (https://github.com/opencog/atomspace)
https://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutionary_Search

File tokenizer not respecting commas in CSV files #108

Closed. gl-yziquel closed this issue 9 months ago.

gl-yziquel commented 10 months ago

I have been following the only "tutorial" I know of concerning (as)moses:

https://www.youtube.com/watch?v=LAIogkvxyMA

The video is by Nil Geisweiller, and the asmoses invocation below is copied verbatim from it, near the beginning.

I get the following failure, with an unhelpful exception message:

mini-me@virtucon ~/h/s/mud-asmoses (master)> just
asmoses -H pre -q 0.1 -W 1 -i data.csv -j4 --output-format scheme -c 100
terminate called after throwing an instance of 'opencog::AssertionException'
  what():  Parsing error occurred on line 1 of input file
Exception: Expecting boolean value, got  (/home/mini-me/home/cellar/asmoses/opencog/asmoses/data/table/table_io.cc:269) (/home/mini-me/home/cellar/asmoses/opencog/asmoses/data/table/table_io.cc:1148)
Aborted (core dumped)
error: Recipe `run` failed on line 2 with exit code 134

Any advice on best practices for debugging code such as asmoses would be welcome.

gl-yziquel commented 10 months ago

The asmoses.log file contains:

[2023-09-07 13:26:50:933] [INFO] moses version "3"."7"."0"
[2023-09-07 13:26:50:933] [INFO] hostname: virtucon
[2023-09-07 13:26:50:933] [WARN] WARNING: This version of MOSES does NOT have MPI support!

[2023-09-07 13:26:50:934] [INFO] Command line: asmoses -H pre -q 0.1 -W 1 -i data.csv -j4 --output-format scheme -c 100
[2023-09-07 13:26:50:934] [INFO] Read data file data.csv
[2023-09-07 13:26:50:934] [ERROR] Expecting boolean value, got  (/home/yziquel/home/cellar/asmoses/opencog/asmoses/data/table/table_io.cc:269)
        Stack Trace:
        2: basic_string.h:195     std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data() const
        3: Logger.cc:596          opencog::Logger::Error::operator()(char const*, ...)
        4: exceptions.cc:54       opencog::StandardException::parse_error_message(char const*, __va_list_tag*, bool)
        5: exceptions.cc:82       opencog::StandardException::parse_error_message(char const*, char const*, __va_list_tag*, bool)
        6: exceptions.cc:331      opencog::AssertionException::AssertionException(char const*, __va_list_tag*)
        7: oc_assert.cc:45        opencog::cassert(char const*, bool, char const*, ...)
        8: table_io.cc:270        opencog::combo::token_to_boolean(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
        9: stl_algo.h:4296      _ZSt9transformIN9__gnu_cxx17__normal_iteratorIPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt6vectorIS7_SaIS7_EEEESt20back_insert_iteratorISA_IN7opencog5combo2id7builtinESaISI_EEEPFSI_RS8_EET0_T_SQ_SP_T1_()
        10: stl_vector.h:1046     std::vector<opencog::combo::multi_type_seq, std::allocator<opencog::combo::multi_type_seq> >::operator[](unsigned long)
        11: table_io.cc:1057      opencog::combo::istreamTable(std::istream&, opencog::combo::Table&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
        12: fstream:605   std::basic_ifstream<char, std::char_traits<char> >::~basic_ifstream()
        13: table-problems.cc:122         opencog::moses::table_problem_base::common_setup(opencog::moses::problem_params&)
        14: table-problems.cc:381         opencog::moses::pre_table_problem::run(opencog::moses::option_base*)
        15: moses_exec.cc:57      opencog::moses::moses_exec(int, char**)
        16: libc_start_call_main.h:58   __libc_start_call_main()
        17: libc-start.c:128    call_init()
        18: [0x1095] ??() ??:0

[2023-09-07 13:27:16:447] [ERROR] Parsing error occurred on line 1 of input file
Exception: Expecting boolean value, got  (/home/mini-me/home/cellar/asmoses/opencog/asmoses/data/table/table_io.cc:269) (/home/mini-me/home/cellar/asmoses/opencog/asmoses/data/table/table_io.cc:1148)
        Stack Trace:
        2: basic_string.h:195     std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data() const
        3: Logger.cc:596          opencog::Logger::Error::operator()(char const*, ...)
        4: exceptions.cc:54       opencog::StandardException::parse_error_message(char const*, __va_list_tag*, bool)
        5: exceptions.cc:82       opencog::StandardException::parse_error_message(char const*, char const*, __va_list_tag*, bool)
        6: exceptions.cc:331      opencog::AssertionException::AssertionException(char const*, __va_list_tag*)
        7: oc_assert.cc:45        opencog::cassert(char const*, bool, char const*, ...)
        8: table_io.cc:1150     oper
        9: table_io.cc:1057       opencog::combo::istreamTable(std::istream&, opencog::combo::Table&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
        10: fstream:605   std::basic_ifstream<char, std::char_traits<char> >::~basic_ifstream()
        11: table-problems.cc:122         opencog::moses::table_problem_base::common_setup(opencog::moses::problem_params&)
        12: table-problems.cc:381         opencog::moses::pre_table_problem::run(opencog::moses::option_base*)
        13: moses_exec.cc:57      opencog::moses::moses_exec(int, char**)
        14: libc_start_call_main.h:58   __libc_start_call_main()
        15: libc-start.c:128    call_init()
        16: [0x1095] ??() ??:0

So this seems to be a parsing error in the data.csv file, which I reproduced from the "tutorial" video. Here it is:

mini-me@virtucon ~/h/s/mud-asmoses (master)> cat data.csv 
o, f1, f2, f3
1,  1,  1,  1
0,  0,  0,  0
1,  1,  0,  1
0,  0,  1,  0

So, basically, the "tutorial" example does not seem to have kept up with how moses / asmoses has evolved over time. There seems to be a confusion between integers and booleans.

gl-yziquel commented 10 months ago

I launched a gdb debugging session. The problem lies in tokenizeRowIOT in table_io.cc (line 786). On line 792, there is a call to get_row_tokenizer. The resulting toker variable, which holds the tokenization of the line "1, 1, 1, 1" of the above data.csv file, splits it into ["1", ",", " ", "1", ",", " ", "1", ",", " ", "1"]. The first item is taken as the output. The remaining 9 items get converted to "", "", "1", "", "", "1", "", "", "1" instead of "1", "1", "1". The latter could be converted to booleans, but the empty strings cannot, so the code crashes.
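For illustration, here is a minimal, standard-library-only C++ sketch that reproduces the symptom described above. It is a hypothetical reconstruction, not the actual get_row_tokenizer or boost::tokenizer code: it assumes every ',' and ' ' is kept as a token of its own, and that each token is then stripped of spaces and commas before boolean conversion. The helper names tokenize_keep_delims, strip_junk and to_boolean are made up for this example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: split on any character in `delims`, but KEEP each
// delimiter character as a token of its own (the toker contents seen in gdb).
std::vector<std::string> tokenize_keep_delims(const std::string& line,
                                              const std::string& delims) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : line) {
        if (delims.find(c) != std::string::npos) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
            tokens.push_back(std::string(1, c));  // delimiter becomes a token
        } else {
            current += c;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Hypothetical cleanup step: strip leading/trailing spaces and commas.
std::string strip_junk(const std::string& s) {
    const std::string junk = " ,";
    std::size_t b = s.find_first_not_of(junk);
    if (b == std::string::npos) return "";  // the token was only junk
    std::size_t e = s.find_last_not_of(junk);
    return s.substr(b, e - b + 1);
}

// Hypothetical boolean conversion that only accepts "0" and "1", so an
// empty string throws, mirroring "Expecting boolean value, got ".
bool to_boolean(const std::string& token) {
    if (token == "1") return true;
    if (token == "0") return false;
    throw std::invalid_argument("Expecting boolean value, got " + token);
}
```

With the line "1, 1, 1, 1" and delimiters ", ", tokenize_keep_delims returns ten tokens. After dropping the first one (the output column), strip_junk maps the remaining nine to "", "", "1", "", "", "1", "", "", "1", and to_boolean throws on the first empty string.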

gl-yziquel commented 10 months ago

To work around the problem described in this issue, one can separate the columns with tabs instead of ", ". For instance:

o[tab]f1[tab]f2[tab]f3
1[tab]1[tab]1[tab]1
0[tab]0[tab]0[tab]0
1[tab]1[tab]0[tab]1
0[tab]0[tab]1[tab]0

This satisfies boost::tokenizer and makes it possible to follow the tutorial mentioned in the opening comment.

However, the tokenizer still seems broken: it should accept ", " as a separator, and it does not. So CSV support seems broken to me, which is why I am not closing this issue.
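As a sketch of the behaviour asked for here, the C++ below treats ',' as the only separator and ignores whitespace around it, so "1,1", "1, 1" and "1,  1" all produce the same fields. This is an illustration under that assumption, not the change merged in #112; the function names trim_ws and split_csv_row are hypothetical, and quoted CSV fields are not handled.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Trim spaces, tabs and carriage returns from both ends of a field.
std::string trim_ws(const std::string& s) {
    const char* ws = " \t\r";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split a row on ',' only, trimming whitespace around every field.
std::vector<std::string> split_csv_row(const std::string& row) {
    std::vector<std::string> fields;
    std::istringstream in(row);
    std::string field;
    while (std::getline(in, field, ',')) fields.push_back(trim_ws(field));
    // getline does not report a trailing empty field ("a,b,"); add it back.
    if (!row.empty() && row.back() == ',') fields.push_back("");
    return fields;
}
```

Applied to the rows of the data.csv above, split_csv_row("1,  1,  0,  1") yields the four fields "1", "1", "0", "1", which a boolean converter can accept.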

linas commented 9 months ago

This sounds like a correct diagnosis of the problem. I don't know why the tokenizer is not respecting commas. @ngeiswei, can you look at this?

linas commented 9 months ago

Removing the spaces after the commas also fixes the problem.
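Until a fixed build is available, the data file can be preprocessed along those lines. The C++ helper below is a hypothetical sketch (the name drop_spaces_after_commas is made up) that removes every space or tab directly following a comma and leaves all other characters untouched; it would be applied to each line of data.csv before passing the file to asmoses with -i.

```cpp
#include <cassert>
#include <string>

// Drop every space or tab that directly follows a comma, so that
// "1,  1,  0,  1" becomes "1,1,0,1". Spaces elsewhere are kept.
std::string drop_spaces_after_commas(const std::string& line) {
    std::string out;
    out.reserve(line.size());
    bool after_comma = false;
    for (char c : line) {
        if (after_comma && (c == ' ' || c == '\t')) continue;  // skip padding
        after_comma = (c == ',');
        out += c;
    }
    return out;
}
```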

linas commented 9 months ago

Merged #112; everything should work now.

gl-yziquel commented 9 months ago

This indeed fixes the issue.