mlpack / benchmarks

Machine Learning Benchmark Scripts
101 stars 49 forks source link

Updation of scikit libraries #60

Closed Iron-Stark closed 7 years ago

rcurtin commented 7 years ago

Hi @Iron-Stark, thanks for the PR. I think that we may want to consider other ways of providing all of these options to the config.yaml file. Right now we might write something like

options: '-c 3'

but if the goal is to allow the user to specify any setting in the config.yaml file, we may want to restructure the input to look more like

options:
  clusters: 3
  other_option: "value"
  ...

otherwise I think we will run out of option names. :) So instead of abstracting to parameters specified similarly to mlpack, we would end up abstracting to some other format. (I am not sure my idea is the best, so please feel free to disagree or improve upon it.)

A thing to keep in mind is that in order to produce benchmarks that can actually be useful for someone looking at it, we have to reduce the scope a little bit sometimes---instead of, e.g., benchmarking every possible configuration of LSH, we instead benchmark "the default configuration of each library's LSH configuration given that we can tune the number of tables and projections"; or, e.g., we benchmark "each k-means algorithm that is implemented for each library with k and the initial centroids set specifically". But if the scope is too large (e.g. "benchmark every k-NN algorithm") then there get to be so many different parameter combinations that it's impossible to reasonably do that.

So I don't think it's a problem necessarily to provide this many options to be accessed through the configuration (other than the running out of option names issue), but I do think we should keep in mind that we may not want to use all of these options in the config.yaml that we run.

Iron-Stark commented 7 years ago

@rcurtin

Yes I see your point. There are certain configurations of LSH available in sklearn and not available in say shogun. So introducing those parameters are not useful as benchmarks for them cannot be generated as these options are not available in the other library.

Also I agree that the config.yaml file can be restructured the way you have suggested so that the parameters can be abstracted to some other format.

rcurtin commented 7 years ago

@Iron-Stark: would you like to implement the new config file support? If not let's just open an issue for it---I don't have any time to handle that myself now unfortunately.

Iron-Stark commented 7 years ago

@rcurtin Yes I would like to implement that but I was thinking that it would be better to implement the algorithms in the various libraries first and then updating the config.yaml file.

rcurtin commented 7 years ago

Sounds good, I agree---we can handle the config file format somewhere else.

rcurtin commented 7 years ago

I think this is ready to merge; if everyone agrees then we can go ahead and do that.

zoq commented 7 years ago

I agree just a minor issue; if you could go through the changed files and truncate the lines to no more than 80 characters it would be great (e.g. put the comment above the parameter).

zoq commented 7 years ago

Thanks for going through the files, much easier to read.