Closed: yanxon closed this 6 years ago.
@yanxon Please follow the approach described in README.md.
Ideally, the YAML file should provide the default parameters and the grid-search options for each algorithm.
For the code in method.py, I suggest you include all the steps.
This way, we can call an ML method more conveniently, without going into the details of each algorithm.
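To illustrate the "call it without the details" idea, here is a minimal sketch. It assumes a hypothetical layout: a registry mapping algorithm names to scikit-learn estimators, and a config dict shaped like what `yaml.safe_load` might return (per-algorithm `default` parameters and a `grid`, plus a shared `cv`). The names `ESTIMATORS`, `CONFIG`, and `run_ml` are illustrative only and not taken from method.py.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical registry: algorithm name -> estimator class.
ESTIMATORS = {
    "RandomForest": RandomForestRegressor,
    "KNN": KNeighborsRegressor,
}

# Hypothetical config, shaped like a parsed YAML file:
# default parameters plus a search grid for each algorithm.
CONFIG = {
    "RandomForest": {
        "default": {"random_state": 0},
        "grid": {"n_estimators": [10, 50], "max_depth": [None, 5]},
    },
    "KNN": {
        "default": {},
        "grid": {"n_neighbors": [3, 5, 7], "leaf_size": [10, 30]},
    },
    "cv": 5,
}


def run_ml(algo, X, y, config=CONFIG):
    """Build `algo` from its defaults and grid-search it; caller sees one call."""
    settings = config[algo]
    base = ESTIMATORS[algo](**settings["default"])
    search = GridSearchCV(base, settings["grid"], cv=config["cv"], scoring="r2")
    search.fit(X, y)
    return search


X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)
search = run_ml("KNN", X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Adding an algorithm under this layout means one registry entry and one config block; the calling code stays the same.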
@qzhu2017
GridSearchCV automatically performs the cross-validation and the fitting for us.
Please review method.py. I will add more ML algorithms once the structure is settled; otherwise, I would have to change a lot if the structure turns out not to be optimal.
Here is the list of changes:
The parameters in the YAML file are for 'tight' only. Also, I defined 'cv' for K-fold cross-validation.
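A small sketch of what "cross-validation + fit" means in practice, using a toy regression dataset and `KNeighborsRegressor` as stand-ins (both are assumptions for illustration). With the default `refit=True`, a single `fit` call scores every candidate by cross-validation and then refits the best one on all the data.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=120, n_features=4, noise=0.5, random_state=1)

# One call cross-validates every candidate and, because refit=True is the
# default, refits the best candidate on all of X, y afterwards.
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [2, 4, 8]},
    cv=5,
)
search.fit(X, y)

print(search.best_params_)      # winner chosen by mean CV score (R^2 here)
y_pred = search.predict(X[:3])  # delegates to search.best_estimator_
```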
I don't understand "process the features." Can you please explain?
We can use sklearn.model_selection.RandomizedSearchCV if GridSearchCV takes too long.
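A minimal sketch of the RandomizedSearchCV alternative, with an illustrative grid and a toy dataset (both assumptions). Instead of all 5 × 5 = 25 combinations, it samples only `n_iter` of them; when every parameter is given as a list, the candidates are drawn without replacement.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=150, n_features=6, noise=1.0, random_state=0)

# Full grid would be 5 x 5 = 25 candidates.
param_distributions = {
    "n_estimators": [10, 20, 30, 40, 50],
    "max_depth": [2, 4, 6, 8, None],
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions,
    n_iter=6,        # evaluate only 6 sampled candidates
    cv=5,
    random_state=0,  # makes the sampled candidates reproducible
)
search.fit(X, y)
print(len(search.cv_results_["params"]))  # number of candidates evaluated
```

The cost drops from 25 × 5 to 6 × 5 cross-validation fits, at the price of possibly missing the single best grid point.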
Howard
@yanxon I am not sure whether grid search does the cross-validation for us. By cross-validation, I mean splitting the data into training and test sets many times, so that we can be sure the model does not depend on one particular split of the data.
@yanxon Also, please rename the file extension from .yml to .yaml.
Grid search does K-fold cross-validation for us; hence the name GridSearchCV.
For example, with the parameters n_estimators = [1,2,3,4,5], leaf_size = [1,2,3,4,5], and cv = 10, GridSearchCV actually performs 5 x 5 x 10 = 250 fits.
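To make the two views of "cross-validation" concrete, here is a sketch on 10 toy samples. For a regressor, an integer `cv=5` in GridSearchCV corresponds to an unshuffled `KFold(n_splits=5)`: five train/test splits in which every sample is in a test fold exactly once. If repeated re-splitting is wanted, as described above, `RepeatedKFold` reshuffles and repeats the procedure and can be passed as `cv=`.

```python
import numpy as np
from sklearn.model_selection import KFold, RepeatedKFold

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features each

# cv=5 for a regressor behaves like KFold(n_splits=5) without shuffling.
kf = KFold(n_splits=5)
test_folds = [test for _, test in kf.split(X)]
print([t.tolist() for t in test_folds])
# → [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

# Repeat the 5-fold split 3 times with different shuffles: 15 splits total.
rkf = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
print(rkf.get_n_splits(X))  # → 15
```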
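The fit count above can be checked with plain arithmetic. This sketch only counts combinations (it does not build an estimator), and it also accounts for the one extra fit GridSearchCV makes on the full dataset when `refit=True`, the default.

```python
from itertools import product

param_grid = {"n_estimators": [1, 2, 3, 4, 5], "leaf_size": [1, 2, 3, 4, 5]}
cv = 10

n_candidates = len(list(product(*param_grid.values())))  # 5 * 5 = 25
n_cv_fits = n_candidates * cv                             # 25 * 10 = 250
n_total_fits = n_cv_fits + 1  # refit=True adds one final fit on all data
print(n_candidates, n_cv_fits, n_total_fits)  # → 25 250 251
```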
Please check out https://www.youtube.com/watch?v=Gol_qOgRqfA
I changed the extension to .yaml.
For the current stage, the yaml file looks good.
However, I am not quite sure about the use of cv in the GridSearchCV function.
If we use cv=10, each parameter set will be evaluated 10 times. Do you have the R^2 or MAE output for each of these runs? I suggest we don't report only the best result from the 10 runs; we should also provide some information about the variation in these R^2/MAE values. That variation tells us whether we can trust the ML models constructed from the medium set.
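One way to report the spread rather than just the best value is a sketch like the following, which scores a fixed model on 10 folds with both R^2 and MAE via `cross_validate`. The dataset and the `KNeighborsRegressor(n_neighbors=5)` model are placeholder assumptions. scikit-learn exposes MAE as `neg_mean_absolute_error` (higher is better), so the sign is flipped.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_validate
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# Score the model on every fold with both metrics at once.
scores = cross_validate(
    KNeighborsRegressor(n_neighbors=5),
    X, y,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
    scoring=("r2", "neg_mean_absolute_error"),
)
r2 = scores["test_r2"]                          # one value per fold
mae = -scores["test_neg_mean_absolute_error"]   # flip sign back to MAE

print(f"R^2: {r2.mean():.3f} +/- {r2.std():.3f} (min {r2.min():.3f})")
print(f"MAE: {mae.mean():.3f} +/- {mae.std():.3f} (max {mae.max():.3f})")
```

A large standard deviation or a poor worst fold is a sign that the model depends heavily on which samples land in the training set.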
Sounds good. I will implement that feature.
I'm sure it outputs the R^2 results for each CV fold.
I would like to improve the readability of the YAML file.
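GridSearchCV does keep per-fold scores in `cv_results_`, under keys `split0_test_score`, `split1_test_score`, and so on, alongside `mean_test_score` and `std_test_score`. A minimal sketch of pulling them out for the best candidate, with a toy dataset and grid as assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=150, n_features=4, noise=2.0, random_state=0)
search = GridSearchCV(
    KNeighborsRegressor(), {"n_neighbors": [3, 5, 9]}, cv=10, scoring="r2"
)
search.fit(X, y)

res = search.cv_results_
best = search.best_index_
# One R^2 value per fold for the best candidate.
per_fold = [res[f"split{k}_test_score"][best] for k in range(search.n_splits_)]
print([round(float(s), 3) for s in per_fold])
print(res["mean_test_score"][best], res["std_test_score"][best])
```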