@rcurtin
Yes, I see your point. There are certain configurations of LSH that are available in sklearn but not in, say, shogun. So introducing those parameters is not useful, since benchmarks for them cannot be generated when the options are not available in the other libraries.
Also I agree that the config.yaml file can be restructured the way you have suggested so that the parameters can be abstracted to some other format.
@Iron-Stark: would you like to implement the new config file support? If not, let's just open an issue for it---I don't have any time to handle that myself right now, unfortunately.
@rcurtin Yes, I would like to implement that, but I was thinking it would be better to implement the algorithms in the various libraries first and then update the config.yaml file.
Sounds good, I agree---we can handle the config file format somewhere else.
I think this is ready to merge; if everyone agrees then we can go ahead and do that.
I agree; just one minor issue: if you could go through the changed files and wrap the lines to no more than 80 characters, that would be great (e.g. put the comment above the parameter).
Thanks for going through the files, much easier to read.
Hi @Iron-Stark, thanks for the PR. I think that we may want to consider other ways of providing all of these options to the config.yaml file. Right now we might write something like
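(the snippet below is only an illustrative sketch; the flags, paths, and dataset names are made up rather than taken from the actual config.yaml)

```yaml
# Hypothetical flat-style entry: every library-specific setting gets its own
# short option flag, so the set of flag names grows with each new library.
methods:
  LSH:
    run: ['timing']
    script: methods/sklearn/lsh.py
    format: [csv]
    datasets:
      - files: ['datasets/corel.csv']
        options: '-k 5 -t 10 -p 25 -H 10.0'
```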
but if the goal is to allow the user to specify any setting in the config.yaml file, we may want to restructure the input to look more like
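for example (again only a sketch; the nested keys here are hypothetical, not the repository's current format):

```yaml
# Hypothetical nested-style entry: arbitrary settings are passed through as
# key/value pairs, so no new single-letter flags ever need to be invented.
methods:
  LSH:
    run: ['timing']
    script: methods/sklearn/lsh.py
    format: [csv]
    datasets:
      - files: ['datasets/corel.csv']
        options:
          k: 5
          tables: 10
          projections: 25
          hash_width: 10.0
```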
otherwise I think we will run out of option names. :) So instead of abstracting to parameters specified similarly to mlpack, we would end up abstracting to some other format. (I am not sure my idea is the best, so please feel free to disagree or improve upon it.)
A thing to keep in mind is that in order to produce benchmarks that are actually useful for someone looking at them, we sometimes have to reduce the scope a little bit---instead of, e.g., benchmarking every possible configuration of LSH, we benchmark "the default configuration of each library's LSH implementation, given that we can tune the number of tables and projections"; or we benchmark "each k-means algorithm that is implemented in each library, with k and the initial centroids set specifically". But if the scope is too large (e.g. "benchmark every k-NN algorithm"), there end up being so many different parameter combinations that it's impossible to cover them all reasonably.
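As a concrete illustration of that narrower scope, a k-means entry might expose only the cluster count and the initial centroids (the keys and paths below are illustrative, not the repository's actual format):

```yaml
# Hypothetical reduced-scope k-means entry: only the parameters common to
# every library (cluster count and initial centroids) are exposed; everything
# else stays at each library's defaults.
methods:
  KMEANS:
    run: ['timing']
    script: methods/sklearn/kmeans.py
    format: [csv]
    datasets:
      - files: ['datasets/waveform.csv']
        options:
          clusters: 3
          centroids: 'datasets/waveform_centroids.csv'
```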
So I don't think it's necessarily a problem to provide this many options through the configuration (other than the issue of running out of option names), but I do think we should keep in mind that we may not want to use all of these options in the config.yaml that we run.