Status: Closed (kyao closed this issue 6 years ago)
Commit https://github.com/usc-isi-i2/dsbox-ta2/commit/306892e7d530e5b2da6bc3026e85492780ee9c73 fails to save for some datasets.
python ta2-search ~kyao/dsbox/runs3/config-seed/38_sick_config.json
(d3m-devel) ➜ python git:(template-2018-june) ✗ python ta2-search /Users/minazuki/Desktop/studies/master/2018Summer/data/config/38_config.json
Namespace(configuration_file='/Users/minazuki/Desktop/studies/master/2018Summer/data/config/38_config.json', cpus=-1, debug=False, output_prefix=None, timeout=-1)
Using configuation: {'cpus': '10', 'dataset_schema': '/Users/minazuki/Desktop/studies/master/2018Summer/data/datasets/seed_datasets_current/38_sick/38_sick_dataset/datasetDoc.json', 'executables_root': '/Users/minazuki/Desktop/studies/master/2018Summer/data/executables', 'pipeline_logs_root': '/Users/minazuki/Desktop/studies/master/2018Summer/data/logs', 'problem_root': '/Users/minazuki/Desktop/studies/master/2018Summer/data/datasets/seed_datasets_current/38_sick/TRAIN/problem_TRAIN', 'problem_schema': '/Users/minazuki/Desktop/studies/master/2018Summer/data/datasets/seed_datasets_current/38_sick/TRAIN/problem_TRAIN/problemDoc.json', 'ram': '10Gi', 'saved_pipeline_ID': '', 'saving_folder_loc': '/Users/minazuki/Desktop/studies/master/2018Summer/data/outputs', 'temp_storage_root': '/Users/minazuki/Desktop/studies/master/2018Summer/data/datasets/seed_datasets_current/38_sick_new/temp', 'timeout': 48, 'train_data_schema': '/Users/minazuki/Desktop/studies/master/2018Summer/data/datasets/seed_datasets_current/38_sick/TRAIN/dataset_TRAIN/datasetDoc.json'}
[INFO] No test data config found! Will split the data.
[INFO] Failed test data parse/ using stratified kfold data instead
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 3017}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 3017)])>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'structural_type': <class 'd3m.container.pandas.DataFrame'>}
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 755}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 755)])>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'structural_type': <class 'd3m.container.pandas.DataFrame'>}
[INFO] Template choices:
Template ' Default_classification_template ' has been added to template base.
[INFO] Worker started, id: <_MainProcess(MainProcess, started)>
Using TensorFlow backend.
2018-07-05 15:51:08.938234: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
****************************************************************************************************
[INFO] Running Pool: 3
[INFO] Worker started, id: <ForkProcess(ForkPoolWorker-1, started daemon)>
[INFO] Worker started, id: <ForkProcess(ForkPoolWorker-2, started daemon)>
[INFO] Worker started, id: <ForkProcess(ForkPoolWorker-3, started daemon)>
/Users/minazuki/miniconda3/envs/d3m-devel/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/Users/minazuki/miniconda3/envs/d3m-devel/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/Users/minazuki/miniconda3/envs/d3m-devel/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
[INFO] Best index: 0 [0.4842896174863388, 0.4842896174863388, 0.4842896174863388]
******************
[INFO] Writing results
{'exec_plan': [0, 1, 6, 2, 3, 4, 5, 7], 'fitted_pipe': [d3m.primitives.datasets.Denormalize(hyperparams=Hyperparams({'starting_resource': None, 'recursive': True, 'many_to_many': True}), random_seed=0), d3m.primitives.datasets.DatasetToDataFrame(hyperparams=Hyperparams({'dataframe_resource': None}), random_seed=0), d3m.primitives.data.ExtractColumnsBySemanticTypes(hyperparams=Hyperparams({'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Attribute',), 'use_columns': (), 'exclude_columns': ()}), random_seed=0), d3m.primitives.data.ColumnParser(hyperparams=Hyperparams({'parse_semantic_types': ('http://schema.org/Boolean', 'https://metadata.datadrivendiscovery.org/types/CategoricalData', 'http://schema.org/Integer', 'http://schema.org/Float', 'http://schema.org/Time'), 'use_columns': (), 'exclude_columns': (), 'return_result': 'replace', 'add_index_columns': True}), random_seed=0), d3m.primitives.data.CastToType(hyperparams=Hyperparams({'type_to_cast': 'str', 'use_columns': (), 'exclude_columns': ()}), random_seed=0), d3m.primitives.sklearn_wrap.SKImputer(hyperparams=Hyperparams({'missing_values': 'NaN', 'strategy': 'mean', 'axis': 0, 'copy': True, 'use_columns': (), 'exclude_columns': (), 'return_result': 'replace', 'use_semantic_types': False}), random_seed=0), d3m.primitives.data.ExtractColumnsBySemanticTypes(hyperparams=Hyperparams({'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Target', 'https://metadata.datadrivendiscovery.org/types/SuggestedTarget'), 'use_columns': (), 'exclude_columns': ()}), random_seed=0), d3m.primitives.sklearn_wrap.SKRandomForestClassifier(hyperparams=Hyperparams({'n_estimators': 10, 'criterion': 'gini', 'max_features': 'auto', 'max_depth': None, 'min_samples_split': 2, 'min_samples_leaf': 1, 'min_weight_fraction_leaf': 0, 'max_leaf_nodes': None, 'min_impurity_split': None, 'bootstrap': True, 'oob_score': False, 'n_jobs': 1, 'warm_start': False, 'class_weight': None, 'use_columns': (), 'exclude_columns': (), 'return_result': 'replace', 'use_semantic_types': False}), random_seed=0)], 'training_metrics': [{'metric': 'f1Macro', 'value': 0.8618945338392843}], 'validation_metrics': [{'metric': 'f1Macro', 'value': 0.5058972382600602}]}
{'denormalize_step': {'primitive': 'd3m.primitives.datasets.Denormalize', 'hyperparameters': {}}, 'to_dataframe_step': {'primitive': 'd3m.primitives.datasets.DatasetToDataFrame', 'hyperparameters': {}}, 'extract_attribute_step': {'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes', 'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Attribute',)}}, 'column_parser_step': {'primitive': 'd3m.primitives.data.ColumnParser', 'hyperparameters': {}}, 'cast_1_step': {'primitive': 'd3m.primitives.data.CastToType', 'hyperparameters': {}}, 'impute_step': {'primitive': 'd3m.primitives.sklearn_wrap.SKImputer', 'hyperparameters': {}}, 'extract_target_step': {'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes', 'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Target', 'https://metadata.datadrivendiscovery.org/types/SuggestedTarget')}}, 'model_step': {'primitive': 'd3m.primitives.sklearn_wrap.SKRandomForestClassifier', 'hyperparameters': {'n_estimators': 10}}}
0.5058972382600602
Training f1Macro = 0.8618945338392843
Validation f1Macro = 0.5058972382600602
******************
[INFO] Saving training results in /Users/minazuki/Desktop/studies/master/2018Summer/data/outputsdata.txt
******************
[INFO] Saving Best Pipeline
Traceback (most recent call last):
  File "/Users/minazuki/Desktop/studies/master/2018Summer/DSBOX_new/dsbox-ta2/python/dsbox/controller/controller.py", line 343, in train
    dataset=self.dataset)
  File "/Users/minazuki/Desktop/studies/master/2018Summer/DSBOX_new/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 61, in create
    pipeline = configuration.data['pipeline']
KeyError: 'pipeline'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ta2-search", line 133, in <module>
    result = main(args)
  File "ta2-search", line 102, in main
    status = controller.train()
  File "/Users/minazuki/Desktop/studies/master/2018Summer/DSBOX_new/dsbox-ta2/python/dsbox/controller/controller.py", line 347, in train
    '[ERROR] Save Failed!')
d3m.exceptions.NotSupportedError: [ERROR] Save Failed!
Fixed with the latest commit.
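For anyone reading the traceback above later, here is a minimal sketch of how a missing dictionary key ends up reported as "[ERROR] Save Failed!" with the original KeyError still attached. It does not use dsbox-ta2 or d3m code: `SaveFailedError`, `create_fitted_pipeline`, `save_best_pipeline`, and `describe_failure` are hypothetical stand-ins, and it assumes (not confirmed by the log) that the dict being read looks like the "Writing results" dict, which has no 'pipeline' key.

```python
class SaveFailedError(Exception):
    """Hypothetical stand-in for d3m.exceptions.NotSupportedError."""


def create_fitted_pipeline(configuration_data):
    # Same kind of lookup as the line reported in the traceback:
    #     pipeline = configuration.data['pipeline']
    return configuration_data['pipeline']


def save_best_pipeline(configuration_data):
    try:
        return create_fitted_pipeline(configuration_data)
    except KeyError:
        # Raising a new exception inside an except block chains the two
        # implicitly, which produces the "During handling of the above
        # exception, another exception occurred" message.
        raise SaveFailedError('[ERROR] Save Failed!')


def describe_failure(configuration_data):
    """Return (type name, missing key) of the underlying error, or None."""
    try:
        save_best_pipeline(configuration_data)
    except SaveFailedError as error:
        cause = error.__context__  # the original KeyError
        return type(cause).__name__, cause.args[0]
    return None


# Assumed shape: keys as printed under "[INFO] Writing results".
result_without_pipeline = {
    'exec_plan': [0, 1, 6, 2, 3, 4, 5, 7],
    'training_metrics': [{'metric': 'f1Macro', 'value': 0.8618945338392843}],
}
print(describe_failure(result_without_pipeline))
```

Inspecting `__context__` this way shows which key was missing without having to read the full chained traceback.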