usc-isi-i2 / dsbox-ta2

The DSBox TA2 component
MIT License

Encoder fails #130

Closed serbanstan closed 6 years ago

serbanstan commented 6 years ago

Out in /dsbox_efs/runs/seed-serban/1567_poker_hand/supporting_files/logs

Namespace(configuration_file='/dsbox_efs/config/seed-41/partition-31/1567_poker_hand/search_config.json', cpus=10, debug=False, output_prefix=None, timeout=55)
Using configuation:
{'cpus': 10,
 'dataset_schema': '/dsbox_efs/dataset/seed_datasets_current/1567_poker_hand/1567_poker_hand_dataset/datasetDoc.json',
 'executables_root': '/dsbox_efs/runs/seed/1567_poker_hand/executables',
 'pipeline_logs_root': '/dsbox_efs/runs/seed/1567_poker_hand/pipelines',
 'problem_root': '/dsbox_efs/dataset/seed_datasets_current/1567_poker_hand/1567_poker_hand_problem',
 'problem_schema': '/dsbox_efs/dataset/seed_datasets_current/1567_poker_hand/1567_poker_hand_problem/problemDoc.json',
 'temp_storage_root': '/dsbox_efs/runs/seed/1567_poker_hand/supporting_files',
 'timeout': 55,
 'training_data_root': '/dsbox_efs/dataset/seed_datasets_current/1567_poker_hand/1567_poker_hand_dataset',
 'user_problems_root': '/dsbox_efs/runs/seed/1567_poker_hand/user_problems'}
[INFO] No test data config found! Will split the data.
[INFO] - dsbox.controller.controller - Top level output directory: /dsbox_efs/runs/seed/1567_poker_hand
[INFO] Template choices:
Template ' SRI_Mean_Baseline_Template ' has been added to template base.
Template ' random_forest_classification_template ' has been added to template base.
Template ' extra_trees_classification_template ' has been added to template base.
Template ' gradient_boosting_classification_template ' has been added to template base.
Template ' svc_classification_template ' has been added to template base.
[INFO] - dsbox.controller.controller - [INFO] Template 0:SRI_Mean_Baseline_Template Selected. UCT:[None, None, None, None, None]
[INFO] Worker started, id: <_MainProcess(MainProcess, started)> , True
/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
[INFO] Will use normal train-test mode ( n = 1 ) to choose best primitives.
[INFO] Push@cache: ('d3m.primitives.sri.baseline.MeanBaseline', -1063565656570294438)
[INFO] Testing finish.!!!
[INFO] Now in normal mode, will add extra train with train_dataset1
[INFO] Hit@cache: ('d3m.primitives.sri.baseline.MeanBaseline', -1063565656570294438)
[INFO] Now are training the pipeline with all dataset and saving the pipeline.
[INFO] Hit@cache: ('d3m.primitives.sri.baseline.MeanBaseline', -1063565656570294438)
[INFO] push@Candidate: (-6190225537173857670,7f228b37-b7c3-4bcb-b240-7e1d10a4848b)
[INFO] - dsbox.controller.controller - ******************
[INFO] Writing results
{'cross_validation_metrics': [],
 'fitted_pipeline': <dsbox.pipeline.fitted_pipeline.FittedPipeline object at 0x7f74da254ef0>,
 'test_metrics': [{'metric': 'f1Macro', 'value': 0.06677010812611192}],
 'total_runtime': 177.5976459980011,
 'training_metrics': [{'metric': 'f1Macro', 'value': 0.06677040919023966}]}
[INFO] - dsbox.controller.controller - {'fitted_pipeline': <dsbox.pipeline.fitted_pipeline.FittedPipeline object at 0x7f74da254ef0>, 'training_metrics': [{'metric': 'f1Macro', 'value': 0.06677040919023966}], 'cross_validation_metrics': [], 'test_metrics': [{'metric': 'f1Macro', 'value': 0.06677010812611192}], 'total_runtime': 177.5976459980011} 0.06677010812611192
[INFO] - dsbox.controller.controller - Training f1Macro = 0.06677040919023966
[INFO] - dsbox.controller.controller - Validation f1Macro = 0.06677010812611192
[INFO] - dsbox.controller.controller - ******************
[INFO] Saving training results in /dsbox_efs/runs/seed/1567_poker_hand.txt
[INFO] - dsbox.controller.controller - [INFO] report: 0.06677010812611192
[INFO] - dsbox.controller.controller - [INFO] UCT updated: [35.6163306217057, 126.0047969176524, 126.0047969176524, 126.0047969176524, 126.0047969176524]
[INFO] - dsbox.controller.controller - [INFO] cache size: 1, candidates: 1
[INFO] - dsbox.controller.controller - [INFO] Template 1:random_forest_classification_template Selected. UCT:[35.6163306217057, 126.0047969176524, 126.0047969176524, 126.0047969176524, 126.0047969176524]
[INFO] Worker started, id: <_MainProcess(MainProcess, started)> , True
[INFO] Will use cross validation( n = 10 ) to choose best primitives.
[INFO] Push@cache: ('d3m.primitives.dsbox.Denormalize', -1063565656570294438)
[INFO] Push@cache: ('d3m.primitives.datasets.DatasetToDataFrame', -1063565656570294438)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', 1068570567539312471)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', 2531558873126565540)
[INFO] Push@cache: ('d3m.primitives.dsbox.Profiler', 5989275104929606214)
[INFO] Push@cache: ('d3m.primitives.dsbox.CleaningFeaturizer', 5989275104929606214)
[INFO] Push@cache: ('d3m.primitives.dsbox.CorexText', 5989275104929606214)
[INFO] Push@cache: ('d3m.primitives.dsbox.Encoder', 5989275104929606214)
Traceback (most recent call last):
  File "/user_opt/dsbox/dsbox-ta2/python/dsbox/template/search.py", line 523, in evaluate_pipeline
    evaluation_result = self._evaluate(configuration, cache, dump2disk)
  File "/user_opt/dsbox/dsbox-ta2/python/dsbox/template/search.py", line 546, in _evaluate
    fitted_pipeline.fit(cache=cache, inputs=[self.train_dataset1])
  File "/user_opt/dsbox/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 94, in fit
    self.runtime.fit(**arguments)
  File "/user_opt/dsbox/dsbox-ta2/python/dsbox/template/runtime.py", line 195, in fit
    primitive_arguments
  File "/user_opt/dsbox/dsbox-ta2/python/dsbox/template/runtime.py", line 284, in _primitive_step_fit
    produce_result = model.produce(**produce_params)
  File "/user_opt/dsbox/src/dsbox-datacleaning/dsbox/datapreprocessing/cleaner/encoder.py", line 194, in produce
    self._input_data_copy = utils.remove_columns(self._input_data_copy, drop_indices, source='ISI DSBox Data Encoder')
  File "/user_opt/dsbox/src/common-primitives/common_primitives/utils.py", line 388, in remove_columns
    raise ValueError("Removing columns would have removed the last column.")
ValueError: Removing columns would have removed the last column.
[INFO] push@Candidate: (-7907540468409979755,None)
Traceback (most recent call last):
  File "/user_opt/dsbox/dsbox-ta2/python/dsbox/template/search.py", line 378, in setup_initial_candidate
    candidate.data.update(result)
TypeError: 'NoneType' object is not iterable
[ERROR] Initial Pipeline failed, Trying a random pipeline ...

kyao commented 6 years ago

Duplicate of https://github.com/usc-isi-i2/dsbox-ta2/issues/109
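
For anyone reading the log above: the two tracebacks look like one failure followed by its knock-on effect. The `ValueError` comes from a guard that refuses to drop every column, and the `TypeError` is what `dict.update` raises when it is handed `None`, which is presumably what the evaluation returned after the first error. Below is a minimal sketch of that chain, not the actual dsbox-ta2 or common-primitives code. The function bodies and the `None`-on-failure behaviour are assumptions made for illustration; only the function names and error messages are taken from the log.

```python
from typing import Dict, List, Optional


def remove_columns(columns: List[str], drop_indices: List[int]) -> List[str]:
    """Hypothetical stand-in for the guard seen in the first traceback."""
    to_drop = set(drop_indices)
    kept = [name for i, name in enumerate(columns) if i not in to_drop]
    if not kept:
        # Same message as in the log; the real check may differ in detail.
        raise ValueError("Removing columns would have removed the last column.")
    return kept


def evaluate_pipeline(columns: List[str],
                      drop_indices: List[int]) -> Optional[Dict[str, float]]:
    """Hypothetical evaluator: assumed to return None when a step fails."""
    try:
        kept = remove_columns(columns, drop_indices)
    except ValueError:
        return None
    return {"kept_columns": float(len(kept))}


def setup_initial_candidate(
        result: Optional[Dict[str, float]]) -> Optional[Dict[str, float]]:
    """Hypothetical guarded version: skip the update when evaluation failed."""
    if result is None:
        return None
    data: Dict[str, float] = {}
    data.update(result)  # without the guard, update(None) raises TypeError
    return data


# Dropping every column trips the guard, so the evaluation yields None.
failed = evaluate_pipeline(["S1", "C1"], [0, 1])
# The guarded setup returns None instead of raising a second exception.
candidate = setup_initial_candidate(failed)
```

If this reading is right, the second traceback is only a symptom, and the real question is why the encoder asks to remove every column for this dataset.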