usc-isi-i2 / dsbox-ta2

The DSBox TA2 component
MIT License

Pipeline Save Fails for 38_sick dataset #40

Closed (kyao closed this 6 years ago)

kyao commented 6 years ago

Commit https://github.com/usc-isi-i2/dsbox-ta2/commit/306892e7d530e5b2da6bc3026e85492780ee9c73 fails to save the best pipeline for some datasets, for example 38_sick.

python ta2-search ~kyao/dsbox/runs3/config-seed/38_sick_config.json
(d3m-devel) ➜  python git:(template-2018-june) ✗ python ta2-search /Users/minazuki/Desktop/studies/master/2018Summer/data/config/38_config.json       
Namespace(configuration_file='/Users/minazuki/Desktop/studies/master/2018Summer/data/config/38_config.json', cpus=-1, debug=False, output_prefix=None, timeout=-1)
Using configuation:
{'cpus': '10',
 'dataset_schema': '/Users/minazuki/Desktop/studies/master/2018Summer/data/datasets/seed_datasets_current/38_sick/38_sick_dataset/datasetDoc.json',
 'executables_root': '/Users/minazuki/Desktop/studies/master/2018Summer/data/executables',
 'pipeline_logs_root': '/Users/minazuki/Desktop/studies/master/2018Summer/data/logs',
 'problem_root': '/Users/minazuki/Desktop/studies/master/2018Summer/data/datasets/seed_datasets_current/38_sick/TRAIN/problem_TRAIN',
 'problem_schema': '/Users/minazuki/Desktop/studies/master/2018Summer/data/datasets/seed_datasets_current/38_sick/TRAIN/problem_TRAIN/problemDoc.json',
 'ram': '10Gi',
 'saved_pipeline_ID': '',
 'saving_folder_loc': '/Users/minazuki/Desktop/studies/master/2018Summer/data/outputs',
 'temp_storage_root': '/Users/minazuki/Desktop/studies/master/2018Summer/data/datasets/seed_datasets_current/38_sick_new/temp',
 'timeout': 48,
 'train_data_schema': '/Users/minazuki/Desktop/studies/master/2018Summer/data/datasets/seed_datasets_current/38_sick/TRAIN/dataset_TRAIN/datasetDoc.json'}
[INFO] No test data config found! Will split the data.
[INFO] Failed test data parse/ using stratified kfold data instead
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 3017}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 3017)])>,
 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table',
                    'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'),
 'structural_type': <class 'd3m.container.pandas.DataFrame'>}
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 755}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 755)])>,
 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table',
                    'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'),
 'structural_type': <class 'd3m.container.pandas.DataFrame'>}
[INFO] Template choices:
Template ' Default_classification_template ' has been added to template base.
[INFO] Worker started, id: <_MainProcess(MainProcess, started)>
Using TensorFlow backend.
2018-07-05 15:51:08.938234: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
****************************************************************************************************
[INFO] Running Pool: 3
[INFO] Worker started, id: <ForkProcess(ForkPoolWorker-1, started daemon)>
[INFO] Worker started, id: <ForkProcess(ForkPoolWorker-2, started daemon)>
[INFO] Worker started, id: <ForkProcess(ForkPoolWorker-3, started daemon)>
/Users/minazuki/miniconda3/envs/d3m-devel/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/Users/minazuki/miniconda3/envs/d3m-devel/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/Users/minazuki/miniconda3/envs/d3m-devel/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
[INFO] Best index: 0
[0.4842896174863388, 0.4842896174863388, 0.4842896174863388]
******************
[INFO] Writing results
{'exec_plan': [0, 1, 6, 2, 3, 4, 5, 7], 'fitted_pipe': [d3m.primitives.datasets.Denormalize(hyperparams=Hyperparams({'starting_resource': None, 'recursive': True, 'many_to_many': True}), random_seed=0), d3m.primitives.datasets.DatasetToDataFrame(hyperparams=Hyperparams({'dataframe_resource': None}), random_seed=0), d3m.primitives.data.ExtractColumnsBySemanticTypes(hyperparams=Hyperparams({'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Attribute',), 'use_columns': (), 'exclude_columns': ()}), random_seed=0), d3m.primitives.data.ColumnParser(hyperparams=Hyperparams({'parse_semantic_types': ('http://schema.org/Boolean', 'https://metadata.datadrivendiscovery.org/types/CategoricalData', 'http://schema.org/Integer', 'http://schema.org/Float', 'http://schema.org/Time'), 'use_columns': (), 'exclude_columns': (), 'return_result': 'replace', 'add_index_columns': True}), random_seed=0), d3m.primitives.data.CastToType(hyperparams=Hyperparams({'type_to_cast': 'str', 'use_columns': (), 'exclude_columns': ()}), random_seed=0), d3m.primitives.sklearn_wrap.SKImputer(hyperparams=Hyperparams({'missing_values': 'NaN', 'strategy': 'mean', 'axis': 0, 'copy': True, 'use_columns': (), 'exclude_columns': (), 'return_result': 'replace', 'use_semantic_types': False}), random_seed=0), d3m.primitives.data.ExtractColumnsBySemanticTypes(hyperparams=Hyperparams({'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Target', 'https://metadata.datadrivendiscovery.org/types/SuggestedTarget'), 'use_columns': (), 'exclude_columns': ()}), random_seed=0), d3m.primitives.sklearn_wrap.SKRandomForestClassifier(hyperparams=Hyperparams({'n_estimators': 10, 'criterion': 'gini', 'max_features': 'auto', 'max_depth': None, 'min_samples_split': 2, 'min_samples_leaf': 1, 'min_weight_fraction_leaf': 0, 'max_leaf_nodes': None, 'min_impurity_split': None, 'bootstrap': True, 'oob_score': False, 'n_jobs': 1, 'warm_start': False, 'class_weight': None, 'use_columns': (), 
'exclude_columns': (), 'return_result': 'replace', 'use_semantic_types': False}), random_seed=0)], 'training_metrics': [{'metric': 'f1Macro', 'value': 0.8618945338392843}], 'validation_metrics': [{'metric': 'f1Macro', 'value': 0.5058972382600602}]}
{'denormalize_step': {'primitive': 'd3m.primitives.datasets.Denormalize', 'hyperparameters': {}}, 'to_dataframe_step': {'primitive': 'd3m.primitives.datasets.DatasetToDataFrame', 'hyperparameters': {}}, 'extract_attribute_step': {'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes', 'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Attribute',)}}, 'column_parser_step': {'primitive': 'd3m.primitives.data.ColumnParser', 'hyperparameters': {}}, 'cast_1_step': {'primitive': 'd3m.primitives.data.CastToType', 'hyperparameters': {}}, 'impute_step': {'primitive': 'd3m.primitives.sklearn_wrap.SKImputer', 'hyperparameters': {}}, 'extract_target_step': {'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes', 'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Target', 'https://metadata.datadrivendiscovery.org/types/SuggestedTarget')}}, 'model_step': {'primitive': 'd3m.primitives.sklearn_wrap.SKRandomForestClassifier', 'hyperparameters': {'n_estimators': 10}}} 0.5058972382600602
Training f1Macro = 0.8618945338392843
Validation f1Macro = 0.5058972382600602
******************
[INFO] Saving training results in /Users/minazuki/Desktop/studies/master/2018Summer/data/outputsdata.txt
******************
[INFO] Saving Best Pipeline
Traceback (most recent call last):
  File "/Users/minazuki/Desktop/studies/master/2018Summer/DSBOX_new/dsbox-ta2/python/dsbox/controller/controller.py", line 343, in train
    dataset=self.dataset)
  File "/Users/minazuki/Desktop/studies/master/2018Summer/DSBOX_new/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 61, in create
    pipeline = configuration.data['pipeline']
KeyError: 'pipeline'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ta2-search", line 133, in <module>
    result = main(args)
  File "ta2-search", line 102, in main
    status = controller.train()
  File "/Users/minazuki/Desktop/studies/master/2018Summer/DSBOX_new/dsbox-ta2/python/dsbox/controller/controller.py", line 347, in train
    '[ERROR] Save Failed!')
d3m.exceptions.NotSupportedError: [ERROR] Save Failed!
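
For anyone reading the two stacked tracebacks above: the `KeyError: 'pipeline'` is the underlying failure, and `[ERROR] Save Failed!` is a second exception raised while the first was being handled. The snippet below is only a minimal sketch of that pattern, not the actual dsbox-ta2 code. `Configuration`, `create_fitted_pipeline`, and `save_best` are hypothetical names, and `NotSupportedError` is a local stand-in for `d3m.exceptions.NotSupportedError`.

```python
class NotSupportedError(Exception):
    """Local stand-in for d3m.exceptions.NotSupportedError (assumption)."""


class Configuration:
    """Hypothetical wrapper that keeps search results in a plain dict."""

    def __init__(self, data):
        self.data = data


def create_fitted_pipeline(configuration):
    # Same kind of lookup as in the traceback: raises KeyError when the
    # configuration carries no 'pipeline' entry.
    return configuration.data['pipeline']


def save_best(configuration):
    try:
        return create_fitted_pipeline(configuration)
    except Exception:
        # Raising inside the except block chains the new error to the
        # original one implicitly (stored in __context__), which is what
        # prints "During handling of the above exception, another
        # exception occurred".
        raise NotSupportedError('[ERROR] Save Failed!')


if __name__ == '__main__':
    # A result dict with metrics but no 'pipeline' key, loosely modelled
    # on the dict printed under "Writing results".
    broken = Configuration({'exec_plan': [0, 1, 6, 2, 3, 4, 5, 7]})
    try:
        save_best(broken)
    except NotSupportedError as err:
        print(repr(err.__context__))
```

When debugging a report like this, the first traceback is the one to look at; the wrapper error only says that saving was aborted.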

kyao commented 6 years ago

Fixed in the latest commit.