szilard / benchm-ml

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
MIT License

benchmarking with autosklearn (zeroconf) #50

Open Motorrat opened 7 years ago

Motorrat commented 7 years ago

Great initiative, thanks for making this public! You might be interested in extending your benchmark to auto-sklearn (https://github.com/automl/auto-sklearn). I have written a script that takes a sparse dataset stored as a pandas DataFrame in HDF5 (.h5) format and runs binary classification on it with auto-sklearn on a multiprocessing cluster (https://github.com/Motorrat/autosklearn-zeroconf). I will try to reproduce your benchmark myself, but in case you are already working on it, you might want to try the script out yourself.

szilard commented 7 years ago

Thanks @Motorrat for feedback/comments/info. I probably won't have time to extend the benchmark, but contributions are welcome :)