mlpack / benchmarks

Machine Learning Benchmark Scripts

can't run benchmarks #29

Open sam0410 opened 7 years ago

sam0410 commented 7 years ago

Hi, whenever I try to run make run BLOCK=mlpack METHODBLOCK=KMEANS or any other benchmark, I get errors like the following:

    /usr/bin/python3 benchmark/run_benchmark.py -c config.yaml -b mlpack -l False -u False -m KMEANS --f "" --n False -r "" -p ""
    [WARN ] No module named simplejson
    [INFO ] CPU Model: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
    [INFO ] Distribution: Ubuntu 16.04
    [INFO ] Platform: x86_64
    [INFO ] Memory: 7.7080078125 GB
    [INFO ] CPU Cores: 4
    [INFO ] Method: KMEANS
    [INFO ] Options: -c 5
    [INFO ] Library: mlpack
    [INFO ] Dataset: cloud
    [FATAL] No conversion possible.
    [FATAL] Could not execute command: ['mlpack_kmeans', '-h']
    [FATAL] Could not execute command: ['mlpack_kmeans', '-i', 'datasets/cloud.csv', '-I', '-o', 'output.csv', '-v', '-c', '5']

           mlpack  matlab  scikit  mlpy  shogun  weka
    cloud  -2      -       -       -     -       -

Can anyone please suggest how to solve this?

zoq commented 7 years ago

My first guess is that the command used to figure out the mlpack binary path (export MLPACK_BIN=$(shell dirname $(firstword $(shell which mlpack_knn)))/) failed. An easy solution would be to specify the path manually, e.g.:

make run BLOCK=mlpack METHODBLOCK=KMEANS MLPACK_BIN=/path/to/mlpack/build/bin/ MLPACK_BIN_DEBUG=/path/to/mlpack/build/bin/
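The lookup described above can be reproduced outside of make to see whether it finds anything. The following is only a minimal sketch, not the benchmark's own code: the function name find_bin_dir and its search_path parameter are hypothetical, and it uses Python's standard shutil.which in place of the shell's which.

```python
import os
import shutil


def find_bin_dir(executable="mlpack_knn", search_path=None):
    """Return the directory holding `executable` (with a trailing separator),
    or None if it cannot be found.

    `search_path` is a hypothetical convenience parameter; when it is None,
    the PATH environment variable is searched, as the shell's `which` does.
    """
    location = shutil.which(executable, path=search_path)
    if location is None:
        return None
    return os.path.dirname(location) + os.sep


if __name__ == "__main__":
    bin_dir = find_bin_dir()
    if bin_dir is None:
        # Nothing named mlpack_knn is on PATH, so the path has to be given by hand.
        print("mlpack_knn not found on PATH; pass MLPACK_BIN=... explicitly")
    else:
        print("MLPACK_BIN=" + bin_dir)
```

If this prints the "not found" message, the automatic detection has nothing to work with and the path has to be passed explicitly as in the command above.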

The 'No conversion possible.' message is strange. Can you post the cloud dataset somewhere?

sam0410 commented 7 years ago

Hi @zoq! Regarding the dataset, it wasn't downloaded properly. Once I loaded the file correctly, the 'No conversion possible.' message went away, but errors like these persisted:

    [FATAL] Could not execute command: ['mlpack_kmeans', '-h']
    [FATAL] Could not execute command: ['mlpack_kmeans', '-i', 'datasets/cloud.csv', '-I', '-o', 'output.csv', '-v', '-c', '5']

So I tried what you suggested. I am working on Ubuntu and had installed the mlpack package with sudo apt-get install libmlpack-dev, and the sample programs ran fine. But that install does not create an mlpack build folder. I ran dpkg -L libmlpack-dev to find where it was installed, and it pointed me to /usr/include/. There was an mlpack folder there, but it did not contain the executables either (there was no build/ folder). So I uninstalled the package, built mlpack from source, and provided the bin directory just as you suggested, which in my case is:

    make run BLOCK=mlpack METHODBLOCK=KMEANS MLPACK_BIN=/home/samikshya/Desktop/mlpack-2.1.1/build/bin/ MLPACK_BIN_DEBUG=/home/samikshya/Desktop/mlpack-2.1.1/build/bin/

And it worked! Thanks a lot! I think we can run the benchmarking code only if we build mlpack from source, not if we install the package. Please correct me if I'm wrong. Thanks.

zoq commented 7 years ago

I think the problem is that you installed an "old" version of mlpack via sudo apt-get install libmlpack-dev that uses allknn instead of mlpack_allknn as the executable name, whereas the script uses the new naming scheme to find the path.

pvskand commented 7 years ago

Hey @zoq! I tried to run the benchmarks as you suggested, i.e., I ran the following command:

    make run BLOCK=mlpack METHODBLOCK=KMEANS MLPACK_BIN=/home/skand/Documents/mlpack/build/bin/ MLPACK_BIN_DEBUG=/home/skand/Documents/mlpack/build/bin/

But I am still getting an error that says:

    [FATAL] No conversion possible.
    [FATAL] No conversion possible.
    [FATAL] Could not execute command: ['/home/skand/Documents/mlpack/build/bin/mlpack_kmeans', '-i', '-I', '-o', 'output.csv', '-v', '-c', '6']

sam0410 commented 7 years ago

Hi @zoq !

I agree with @pvskand, I am getting the same problem, even though /path/to/mlpack/build/bin/ contains an mlpack_kmeans executable.

But if I point the path to the build/bin directory of mlpack-2.2.0, it works fine.

zoq commented 7 years ago

@pvskand I think I know what is happening here: the parameters of the mlpack_kmeans executable changed between your version and the latest version. I also think we could improve the error message by first checking whether the executable exists and then checking whether the parameters match the current version.
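The two-step check proposed here could look roughly like the sketch below. It is only an illustration under stated assumptions, not the benchmark's actual implementation: the function name diagnose_command, its return shape, and the timeout default are all hypothetical.

```python
import shutil
import subprocess


def diagnose_command(cmd, timeout=60):
    """Return (ok, message) for a command given as a list of arguments.

    Hypothetical helper: it separates "the executable is missing" from
    "the executable exists but rejected its arguments".
    """
    # Step 1: is there an executable at all (on PATH or at the given path)?
    if shutil.which(cmd[0]) is None:
        return False, "executable not found: " + cmd[0]

    # Step 2: run it and surface the program's own error output.
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "timed out after %d seconds" % timeout

    if result.returncode != 0:
        return False, "exit code %d: %s" % (result.returncode,
                                            result.stderr.strip())
    return True, result.stdout
```

With a check like this, a missing binary and a rejected option would produce different messages instead of the same "Could not execute command" line.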

Also, can you make sure that you downloaded the necessary datasets? Take a look at: https://github.com/zoq/benchmarks#getting-the-datasets

sam0410 commented 7 years ago

Hi @zoq! Can you please explain what you mean by the parameters having changed "from your version to the latest version", and how I can run the benchmarks successfully now? Thanks for your patience.

zoq commented 7 years ago

It's possible that the parameters of one executable changed from one version to another, e.g.

mlpack_kmeans -i /path/to/datasets/cloud.csv -I -o output.csv -v -c 5

If, for example, there is no -I option anymore, the executable would fail when you pass -I.

So if you see the following error message:

[FATAL] Could not execute command: ['mlpack_kmeans', '-i', 'datasets/cloud.csv', '-I',
'-o', 'output.csv', '-v', '-c', '5']

An easy test would be to run the executable manually and look at the error message:

mlpack_kmeans -i datasets/cloud.csv -I -o output.csv -v -c 5

Maybe you can post the error here?

sam0410 commented 7 years ago

Hi @zoq! This was the error that was produced:

    [FATAL] Caught exception from parsing command line: the required argument for option '--initial_centroids' is missing
    terminate called after throwing an instance of 'std::runtime_error'
    what(): fatal error; see Log::Fatal output
    Aborted (core dumped)

zoq commented 7 years ago

@sam0410 Okay, so the initial centroids file (-I option) should only be added if two datasets are passed to the benchmark script. Can you make sure you have the centroids file?

As for the wine dataset, you should find a wine_centroids file that contains the initial centroids used by the kmeans benchmark script. Here is the relevant part of the config file:

    KMEANS:
        run: ['metric']
        iteration: 3
        script: methods/mlpack/kmeans.py
        format: [csv, txt]
        datasets:
            - files: [ ['datasets/wine.csv', 'datasets/wine_centroids.csv'],
                       ['datasets/iris.csv', 'datasets/iris_centroids.csv'] ]
              options: '-c 3'

We specify that we would like to run the kmeans benchmark script on the wine and the iris datasets, and for each dataset we also specify a second file that contains the centroids.
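To confirm that every file named in such an entry is actually on disk, a quick check along the following lines may help. This is a sketch only: the kmeans_datasets structure simply mirrors the config excerpt above as a Python literal, and missing_files with its root parameter is a hypothetical helper, not part of the benchmark scripts.

```python
import os

# Mirrors the "datasets" entry of the KMEANS config excerpt as plain Python.
kmeans_datasets = [
    {
        "files": [
            ["datasets/wine.csv", "datasets/wine_centroids.csv"],
            ["datasets/iris.csv", "datasets/iris_centroids.csv"],
        ],
        "options": "-c 3",
    }
]


def missing_files(datasets, root="."):
    """Return every listed file that does not exist under `root`."""
    missing = []
    for entry in datasets:
        for group in entry["files"]:
            # Assumption: a group is either a single path or a list of paths
            # (data file followed by its centroids file).
            paths = [group] if isinstance(group, str) else group
            for path in paths:
                if not os.path.isfile(os.path.join(root, path)):
                    missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_files(kmeans_datasets):
        print("missing:", path)
```

Running it from the benchmarks checkout should print nothing when both the data files and the centroids files are present.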

sam0410 commented 7 years ago

Hi @zoq! Yes, both wine_centroids and iris_centroids are present.

zoq commented 7 years ago

Can you print cmd right after it is built, as in the last line below, and post the output?

  def RunMetrics(self, options):
    Log.Info("Perform K-Means Clustering.", self.verbose)

    # If the dataset contains two files then the second file is the centroids
    # file.
    if len(self.dataset) == 2:
      cmd = shlex.split(self.path + "mlpack_kmeans -i " + self.dataset[0] +
          " -I " + self.dataset[1] + " -o output.csv -v " + options)
    else:
      cmd = shlex.split(self.path + "mlpack_kmeans -i " + self.dataset[0] +
          " -o output.csv -v " + options)

    print(cmd)
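The command construction in RunMetrics can be tried in isolation to see what argument list it yields for different inputs. The sketch below copies only the string-building logic into a free function; the name build_kmeans_command is hypothetical, and the empty-path case is one possible way an argument list with '-i' directly followed by '-I' could arise, not a confirmed diagnosis of the errors reported in this thread.

```python
import shlex


def build_kmeans_command(path, dataset, options):
    """Build the mlpack_kmeans argument list the same way RunMetrics does."""
    # Two files: the second one is passed as the initial centroids (-I).
    if len(dataset) == 2:
        return shlex.split(path + "mlpack_kmeans -i " + dataset[0] +
                           " -I " + dataset[1] + " -o output.csv -v " + options)
    return shlex.split(path + "mlpack_kmeans -i " + dataset[0] +
                       " -o output.csv -v " + options)


if __name__ == "__main__":
    # Both paths present: -i and -I each receive a value.
    print(build_kmeans_command(
        "", ["datasets/wine.csv", "datasets/wine_centroids.csv"], "-c 3"))

    # Empty paths: shlex.split drops them, so -i and -I are left without values.
    print(build_kmeans_command("", ["", ""], "-c 6"))
```

Because shlex.split discards empty tokens, an empty dataset path silently disappears from the argument list, and the executable then complains about a missing argument rather than a missing file.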