Open Eman22S opened 4 years ago
Thanks @Eman22S, that's a very useful report.
Could you explain how you obtained the Cowles.data and Melanoma.csv data sets, and how to access them, if possible?
I believe --store-atomspace=1 is the default, which is consistent with the fact that there's no substantial difference in your benchmark. You should replace it with --store-atomspace=0. Oh, actually it's not even enabled in the C++ code! See
I have forgotten why it was disabled; I believe it was failing on CircleCI for some unknown reason. Anyway, at this stage it should be re-enabled.
@ngeiswei Cowles and Melanoma are datasets from the R datasets collection: https://github.com/vincentarelbundock/Rdatasets/blob/master/csv/MASS/Melanoma.csv and https://github.com/vincentarelbundock/Rdatasets/blob/master/csv/carData/Cowles.csv. Of course, we did not use the files in their original form, because Moses simply can't run on them, so we binarized them. The binarized datasets are here: https://github.com/Eman22S/asmoses/blob/population_branch/scripts/benchmark/datasets/
As for the commented-out code in instance_scorer.cc, like you said, it fails on several computers and works on others (I believe it worked on yours). We were just discussing with bitseat how to reproduce your environment so that we can better understand what's going on.
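The thread does not say how the binarization was done, so the following is only a minimal sketch of one possible approach, not the script that produced the files linked above. It assumes numeric columns are split at their median, columns already made of 0/1 are left alone, and non-numeric columns map their alphabetically first value to 0 and everything else to 1. The function names `binarize_column` and `binarize_csv` are hypothetical.

```python
import csv
import io
from statistics import median


def binarize_column(values):
    """Turn one column of strings into a list of 0/1 integers.

    Assumed rules (not taken from the thread):
    - non-numeric column: alphabetically first value -> 0, anything else -> 1
    - column already made of 0/1: kept as is
    - other numeric column: 1 if strictly above the column median, else 0
    """
    try:
        nums = [float(v) for v in values]
    except ValueError:
        first = sorted(set(values))[0]
        return [0 if v == first else 1 for v in values]
    if set(nums) <= {0.0, 1.0}:
        return [int(x) for x in nums]
    cut = median(nums)
    return [1 if x > cut else 0 for x in nums]


def binarize_csv(text):
    """Binarize every column of a CSV document that has a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Transpose rows into columns, binarize each, then transpose back.
    columns = [binarize_column(list(col)) for col in zip(*body)]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(zip(*columns))
    return out.getvalue()


if __name__ == "__main__":
    sample = "age,sex,status\n10,male,1\n20,female,0\n30,male,1\n"
    print(binarize_csv(sample), end="")
```

A median split discards a lot of information, so any threshold choice like this should be recorded alongside the benchmark results.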
Moses port running on the Fi_Miller_et_al14_upd dataset: https://data.giss.nasa.gov/modelforce/Miller_et_2014/Fi_Miller_et_al14_upd.txt
Please note that the dataset used is the binarized form of Fi_Miller_et_al14_upd, renamed fimiller.csv.
Thanks @Eman22S. It would be good to understand why there is such a slowdown, especially on such a small dataset. I suppose it would be good to profile these two runs, maybe with valgrind if it doesn't blow up the RAM.
Also, I would suggest that you create a commit for each experiment, containing the command line and the dataset used, and push these commits to a feature branch on singnet/asmoses, called something like atomspace-port-experiments (I think you should have the rights, let me know otherwise). Then here, on the GitHub issue, alongside the results, include the commit hash of the experiment. This makes it possible to reproduce the experiments and compare them faithfully if needed.
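As a rough illustration of what each experiment commit could contain, here is a small sketch that builds a record of the command line and a checksum of the dataset. The record layout and the function name `experiment_record` are assumptions made for this example; nothing in the thread prescribes a format. The command line shown is the one quoted in the report, with the --atomspace-port=1 flag added.

```python
import hashlib
import json
import shlex


def experiment_record(command, dataset_bytes):
    """Describe one run: the exact command line plus a dataset checksum.

    `command` is a list of arguments; `dataset_bytes` is the raw content
    of the dataset file. The field names are hypothetical.
    """
    return {
        "command": shlex.join(command),
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
    }


if __name__ == "__main__":
    cmd = ["asmoses", "-i", "datasets/iris.data", "-m10000", "-uclass",
           "--atomspace-port=1"]
    # In a real run the bytes would come from reading the dataset file.
    record = experiment_record(cmd, b"placeholder dataset content")
    print(json.dumps(record, indent=2))
```

Committing such a record next to the dataset means the commit hash quoted on the issue identifies both the command and the exact data.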
Thanks for your comment @ngeiswei. Besides the obvious poor performance on these datasets, as I pointed out earlier, the candidate programs generated by moses and asmoses are not comparable. For a given dataset, asmoses seems to produce plain floating-point numbers, as opposed to the programs with operators and nested operators that moses produces. That is undoubtedly a non-trivial problem that should be looked into. My suggestion would be to look into the code producing these erroneous outputs, as it is likely contributing to the slowdown as well.
I think we can do that while also using valgrind to examine which code uses the most resources. We can rewrite that code where necessary, using the valgrind results to guide the optimization.
Oh, indeed, at this point of the port the behaviors should be identical, so yes, it would be good to understand why the candidates are different first.
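A quick way to quantify the difference described above is to count how many candidates in each output are bare numbers rather than structured programs. The sketch below assumes each output line has the form "score program", separated by a single space; that layout, the sample lines, and the helper names are assumptions for illustration, not taken from the attached logs.

```python
def split_candidate(line):
    """Split an assumed 'score program' line into (score, program)."""
    score, _, program = line.strip().partition(" ")
    return float(score), program.strip()


def is_bare_number(program):
    """True if the whole program text parses as a single float."""
    try:
        float(program)
    except ValueError:
        return False
    return True


def summarize(lines):
    """Count bare-number versus structured candidates, skipping blank lines."""
    programs = [split_candidate(line)[1] for line in lines if line.strip()]
    bare = sum(is_bare_number(p) for p in programs)
    return {
        "total": len(programs),
        "bare_numbers": bare,
        "structured": len(programs) - bare,
    }


if __name__ == "__main__":
    # Made-up sample lines, only to show the expected input shape.
    sample = ["-1 and($1 $2)", "-3 0.25", "", "0 or($1 !$3)"]
    print(summarize(sample))
```

Running this on the moses and asmoses outputs for the same dataset would show at a glance whether one side is dominated by constants.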
Overview
This reports the comparisons made between moses and asmoses, and the results found, following issue #69.
Benchmark
For comparison purposes we used the demo problems, datasets found in the unit tests, and two external datasets.
Results of the comparison using the demo problems (multiplexer problem):
Results of the comparison using the datasets in the unit tests (iris.data and IrisSetosa):
In running these datasets I noticed two issues. First, moses can't run
asmoses -i datasets/IrisSetosa.data -m10000 -uCLASS
if the columns in IrisSetosa.data are rearranged, whereas if the columns in iris.data are rearranged, running
asmoses -i datasets/iris.data -m10000 -uclass
gives no error. Please note that the target class in IrisSetosa is boolean, whereas in iris.data it is an enum. Even though our task is simply to compare the performance of moses with and without the --atomspace-port=1 flag whenever the command works, it seems worth reporting inconvenient failures like the one above. Second, neither dataset ran when the --atomspace-port=1 flag was added; the error is returned by the combo_atomese converter, which does not support the greater-than-zero operator yet. For the time being we are binarizing all columns in the datasets to boolean, to trick moses into not generating that operator.
Results of the comparison using the external datasets (Cowles.data and Melanoma.csv):
Again, we binarized these two datasets into all-boolean columns, since moses, with or without --atomspace-port=1, can't interpret the datasets in their original form. The command line runs with no error, but the generated programs are entirely different with and without --atomspace-port=1. We assume there is a logic error or misunderstanding in the code introduced when porting moses to asmoses. We are looking into it.
The entire log file can be found here: https://github.com/Eman22S/asmoses/blob/population_branch/scripts/benchmark/asmoses-bench.log
The log file was generated by https://github.com/Eman22S/asmoses/blob/population_branch/scripts/benchmark/mb-example.sh and https://github.com/Eman22S/asmoses/blob/population_branch/scripts/benchmark/asmoses-bm.sh