scarlett2018 opened this issue 4 years ago:
AutoML toolkit for hyperparameter tuning, NAS and model compression: https://github.com/Microsoft/nni
Thanks for the pointer! Just had a brief look and it seems very focused on NAS. It does mention scikit-learn support; how suitable do you think the framework is for the tabular and structured data we feature in our benchmark? It seems some configuration is required (a search space, writing a `run_trial` function, and a `config` yaml file). Are there ready-to-use presets available?
Yes, you are right: a search space and configuration files are required, and there are no ready-to-use presets. Instead of using the default settings, I'm thinking the automlbenchmark may need to support hyperparameter tuning later. NNI might be a good fit by then.
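For reference, a minimal setup would look roughly like the sketch below (names and values are illustrative assumptions, not an official NNI preset). An experiment needs a search space file, an experiment config, and a trial script that reads parameters from the tuner and reports a score back:

```python
# Sketch of the three pieces an NNI experiment needs (illustrative values only).
#
# search_space.json -- the hyperparameter search space, e.g.:
#   {"n_estimators": {"_type": "randint", "_value": [50, 500]},
#    "max_depth":    {"_type": "randint", "_value": [3, 20]}}
#
# config.yml -- the experiment definition: tuner choice (e.g. TPE), trial
#   budget and concurrency, searchSpacePath: search_space.json, and the
#   trial command (e.g. python3 trial.py). The experiment is then launched
#   with `nnictl create --config config.yml`.
#
# trial.py -- the script NNI runs once per sampled configuration:
import nni
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

params = nni.get_next_parameter()  # hyperparameters sampled by the tuner
X, y = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(
    n_estimators=params["n_estimators"],
    max_depth=params["max_depth"],
)
score = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold CV accuracy
nni.report_final_result(score)  # the metric the tuner optimizes
```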
Okay, thanks :) In that case I'll just note down that we should document the framework somewhere, so that we can refer to it later if/when we do wish to expand our scope.
Sounds good, thanks @PGijsbers.
@PGijsbers is this something that still needs work? I have spent some time working with NNI and would be happy to open a pull request covering the datasets mentioned. I haven't done them yet, but I could do it over the upcoming holidays.
Thanks for the offer @setuc! Unfortunately it is not quite clear to me what you propose here. Is it any of these:

1. Create some out-of-the-box functionality for NNI so that it will search through solutions for tabular, structured data? I would propose this at the NNI repository. If this functionality is wanted there and integrated, we could add NNI to the AutoML benchmark.
2. Set up NNI to work as an AutoML framework within our benchmark, by developing and configuring the required search space, configuration, etc.? I don't think we are interested in having that at this moment. We want to evaluate AutoML systems as they are out of the box. Specifying a search space and configuration is quite technical and actually a big part of developing an AutoML system. Pinning down any one specific search space/strategy would not reflect NNI's performance, so doing that on the AutoML benchmark side would be wrong, I think.
3. Document the framework, its intention, and/or its strengths and weaknesses for later reference? That would be a great contribution to have :) If you have spent some time with the framework, it would be very helpful to write a brief run-down and, in particular, an example of how it could be used to solve the type of datasets we currently use in the benchmark (along the lines of the sketch below).
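To make that concrete, something like the following would already be useful. This is a rough sketch under assumptions: the dataset, model, and hyperparameters are illustrative, and the `learning_rate`/`max_depth` dimensions are assumed to be defined in an accompanying `search_space.json`; it is not an official NNI preset.

```python
import nni
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hyperparameters sampled by the NNI tuner, e.g.
# {"learning_rate": 0.1, "max_depth": 3}, per the search space file.
params = nni.get_next_parameter()

# A benchmark-style tabular task from OpenML with mixed
# numeric/categorical features and no missing values.
X, y = fetch_openml("credit-g", version=1, return_X_y=True, as_frame=True)
categorical = X.select_dtypes(include=["category", "object"]).columns

# One-hot encode the categorical columns, pass numeric columns through,
# then fit a gradient boosting classifier with the sampled hyperparameters.
model = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",
    )),
    ("clf", GradientBoostingClassifier(
        learning_rate=params["learning_rate"],
        max_depth=params["max_depth"],
    )),
])
score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
nni.report_final_result(score)  # the metric NNI's tuner optimizes
```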