udellgroup / oboe

An AutoML pipeline selection system to quickly select a promising pipeline for a new dataset.
BSD 3-Clause "New" or "Revised" License

the shape of ERROR_TENSOR #13

Closed zml24 closed 4 years ago

zml24 commented 4 years ago

Hi, I read the file error_tensor.npy and found that the shape of ERROR_TENSOR is (215, 4, 2, 8, 183).

After some computation, I find that the number of standardizers is 2, the number of dim_reducers is 8, and the number of estimators is 183; 215 is presumably the number of datasets. So, what does the 4 in the shape mean?

Here is the relevant info from classification.json:

{
  "imputer": {
    "algorithms": ["SimpleImputer"],
    "hyperparameters": {
      "SimpleImputer": {"strategy": ["mean", "median", "most_frequent", "constant"]}
    }
  },
  "encoder": {
    "algorithms": [null, "OneHotEncoder"],
    "hyperparameters": {
      "OneHotEncoder": {"handle_unknown": ["ignore"], "sparse": [0]}
    }
  },
  "standardizer": {
    "algorithms": [null, "StandardScaler"],
    "hyperparameters": {
      "StandardScaler": {}
    }
  },
  "dim_reducer": {
    "algorithms": [null, "PCA", "VarianceThreshold", "SelectKBest"],
    "hyperparameters": {
      "PCA": {"n_components": ["25%", "50%", "75%"]},
      "VarianceThreshold": {},
      "SelectKBest": {"k": ["25%", "50%", "75%"]}
    }
  },
  "estimator": {
    "algorithms": ["KNN", "DT", "RF", "GBT", "AB", "lSVM", "Logit", "Perceptron", "GNB", "MLP", "ExtraTrees"],
    "hyperparameters": {
      "KNN": {"n_neighbors": [1, 3, 5, 7, 9, 11, 13, 15], "p": [1, 2]},
      "DT": {"min_samples_split": [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 0.01, 0.001, 0.0001, 1e-05]},
      "RF": {"min_samples_split": [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 0.1, 0.01, 0.001, 0.0001, 1e-05], "criterion": ["gini", "entropy"]},
      "GBT": {"learning_rate": [0.001, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5], "max_depth": [3, 6], "max_features": [null, "log2"]},
      "AB": {"n_estimators": [50, 100], "learning_rate": [1.0, 1.5, 2.0, 2.5, 3.0]},
      "lSVM": {"C": [0.125, 0.25, 0.5, 0.75, 1, 2, 4, 8, 16]},
      "Logit": {"C": [0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4], "solver": ["liblinear", "saga"], "penalty": ["l1", "l2"]},
      "Perceptron": {},
      "GNB": {},
      "MLP": {"learning_rate_init": [0.0001, 0.001, 0.01], "learning_rate": ["adaptive"], "solver": ["sgd", "adam"], "alpha": [0.0001, 0.01]},
      "ExtraTrees": {"min_samples_split": [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 0.1, 0.01, 0.001, 0.0001, 1e-05], "criterion": ["gini", "entropy"]}
    }
  }
}
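The per-stage sizes can be recovered from this config by multiplying out each algorithm's hyperparameter grid (a null algorithm, i.e. skipping the stage, contributes one setting, as does an algorithm with an empty grid). A minimal sketch of my own, not from the oboe codebase:

```python
import json
from math import prod

# The classification.json contents quoted above, loaded verbatim.
CONFIG = json.loads("""
{
 "imputer": {"algorithms": ["SimpleImputer"],
  "hyperparameters": {"SimpleImputer":
   {"strategy": ["mean", "median", "most_frequent", "constant"]}}},
 "encoder": {"algorithms": [null, "OneHotEncoder"],
  "hyperparameters": {"OneHotEncoder":
   {"handle_unknown": ["ignore"], "sparse": [0]}}},
 "standardizer": {"algorithms": [null, "StandardScaler"],
  "hyperparameters": {"StandardScaler": {}}},
 "dim_reducer": {"algorithms": [null, "PCA", "VarianceThreshold", "SelectKBest"],
  "hyperparameters": {
   "PCA": {"n_components": ["25%", "50%", "75%"]},
   "VarianceThreshold": {},
   "SelectKBest": {"k": ["25%", "50%", "75%"]}}},
 "estimator": {"algorithms": ["KNN", "DT", "RF", "GBT", "AB", "lSVM",
   "Logit", "Perceptron", "GNB", "MLP", "ExtraTrees"],
  "hyperparameters": {
   "KNN": {"n_neighbors": [1, 3, 5, 7, 9, 11, 13, 15], "p": [1, 2]},
   "DT": {"min_samples_split": [2,4,8,16,32,64,128,256,512,1024,0.01,0.001,0.0001,1e-05]},
   "RF": {"min_samples_split": [2,4,8,16,32,64,128,256,512,1024,0.1,0.01,0.001,0.0001,1e-05],
    "criterion": ["gini", "entropy"]},
   "GBT": {"learning_rate": [0.001,0.01,0.025,0.05,0.1,0.25,0.5],
    "max_depth": [3, 6], "max_features": [null, "log2"]},
   "AB": {"n_estimators": [50, 100], "learning_rate": [1.0, 1.5, 2.0, 2.5, 3.0]},
   "lSVM": {"C": [0.125,0.25,0.5,0.75,1,2,4,8,16]},
   "Logit": {"C": [0.25,0.5,0.75,1,1.5,2,3,4],
    "solver": ["liblinear", "saga"], "penalty": ["l1", "l2"]},
   "Perceptron": {}, "GNB": {},
   "MLP": {"learning_rate_init": [0.0001,0.001,0.01], "learning_rate": ["adaptive"],
    "solver": ["sgd", "adam"], "alpha": [0.0001, 0.01]},
   "ExtraTrees": {"min_samples_split": [2,4,8,16,32,64,128,256,512,1024,0.1,0.01,0.001,0.0001,1e-05],
    "criterion": ["gini", "entropy"]}}}
}
""")

def count_settings(stage):
    """Total hyperparameter combinations over a stage's algorithms.

    A null algorithm (skip the stage) counts as one setting; an algorithm
    with no listed hyperparameters also counts as one (prod of an empty
    grid is 1)."""
    total = 0
    for alg in stage["algorithms"]:
        if alg is None:
            total += 1
        else:
            grid = stage["hyperparameters"].get(alg, {})
            total += prod(len(values) for values in grid.values())
    return total

for name in ["imputer", "encoder", "standardizer", "dim_reducer", "estimator"]:
    print(name, count_settings(CONFIG[name]))
# imputer 4, encoder 2, standardizer 2, dim_reducer 8, estimator 183
```

So the 4, 2, 8, and 183 in the shape match the imputer, standardizer, dim_reducer, and estimator grids, and the remaining 2 (from the encoder) is the dimension the question is about.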
chengrunyang commented 4 years ago

Hi, sorry for the confusion! This was an earlier version that did not include the encoder dimension. I will push a newer version in which the error tensor has size (n_datasets, 4, 2, 2, 8, 183), which includes 4 data imputers, 2 encoders, 2 standardizers, 8 dimensionality reducers and 183 estimators.
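For anyone working with the updated tensor, a sketch of how the axes would line up. The axis order (dataset, imputer, encoder, standardizer, dim_reducer, estimator) is assumed from the shape described above; I mock the array with random data here rather than load error_tensor.npy:

```python
import numpy as np

# Mock error tensor with 3 datasets; in the repo you would instead do
# errors = np.load("error_tensor.npy") and get the full first dimension.
rng = np.random.default_rng(0)
errors = rng.random((3, 4, 2, 2, 8, 183))

# Error of one fully specified pipeline on dataset 0:
# imputer 1, encoder 0, standardizer 1, dim_reducer 3, estimator 42.
e = errors[0, 1, 0, 1, 3, 42]

# Best pipeline per dataset: flatten the five pipeline axes, argmin, then
# unravel back into (imputer, encoder, standardizer, dim_reducer, estimator).
flat_best = errors.reshape(errors.shape[0], -1).argmin(axis=1)
best_pipelines = [np.unravel_index(i, errors.shape[1:]) for i in flat_best]
print(best_pipelines[0])  # a 5-tuple of stage indices
```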

zml24 commented 4 years ago

Thanks!