mlpack / benchmarks

Machine Learning Benchmark Scripts

Expanding the benchmark coverage of this repository for all the toolkits #143

Open braceletboy opened 4 years ago

braceletboy commented 4 years ago

@zoq @rcurtin I think this is a fantastic project: people can find out which ML toolkits are better for certain algorithms, and the toolkits can see where they need to improve. I have been doing some work on my own that might be useful for this project, and I have made a Google Sheet of the data I have been collecting. This Google Sheet contains:

So far I have covered all the algorithms provided by scikit-learn and mlpack, and I am in the process of adding all the algorithms provided by Shogun. This is a work in progress; I plan to add more algorithms and hopefully complete the list. This is the Google Sheet that I am preparing:

[Four images of the Google Sheet]

In this regard, I have some questions:

a) Is the aim of this project limited to benchmarking the algorithms supported by mlpack? If not, I feel that having a sheet like this one would help. (I got the idea of consolidating all this in a Google Sheet after I saw one on TensorFlow's GitHub when they were making TensorFlow 2.0 and had to list all the API classes that needed some specific change.)

b) Also, would it be possible for contributors from mlpack to also contribute to this sheet? I can give edit access. Currently, around 166 algorithms are listed, with many more not yet covered, and I haven't yet gone through all the library APIs. Would appreciate the help :)

rcurtin commented 4 years ago

a) Is the aim of this project limited to benchmarking the algorithms supported by mlpack? If not, I feel that having a sheet like this one would help. (I got the idea of consolidating all this in a Google Sheet after I saw one on TensorFlow's GitHub when they were making TensorFlow 2.0 and had to list all the API classes that needed some specific change.)

When we originally started on this project (it was @zoq's GSoC project many years ago :)) the idea was to use this benchmarking system to compare mlpack's implementations against other implementations. But it's grown somewhat since then, and honestly, it's a pretty general-purpose benchmarking system, so I don't see any need to limit it to algorithms that mlpack supports.

b) Also, would it be possible for contributors from mlpack to also contribute to this sheet? I can give edit access. Currently, around 166 algorithms are listed, with many more not yet covered, and I haven't yet gone through all the library APIs. Would appreciate the help :)

Sure, I would imagine that there would be some interest. You might try posting it on the mlpack chat channel (IRC/Matrix/gitter/etc.): https://www.mlpack.org/community.html#real-time-chat

zoq commented 4 years ago

a) Is the aim of this project limited to benchmarking the algorithms supported by mlpack? If not, I feel that having a sheet like this one would help. (I got the idea of consolidating all this in a Google Sheet after I saw one on TensorFlow's GitHub when they were making TensorFlow 2.0 and had to list all the API classes that needed some specific change.)

Awesome, thanks for putting everything together.

b) Also, would it be possible for contributors from mlpack to also contribute to this sheet? I can give edit access. Currently, around 166 algorithms are listed, with many more not yet covered, and I haven't yet gone through all the library APIs. Would appreciate the help :)

Happy to help, just sent you a request.

braceletboy commented 4 years ago

@zoq and @rcurtin Sorry for the late response. I was on an extended vacation. I have approved your request @zoq. Please have a look at it and let me know what you think about it. Also let me know if you have any questions.