usc-isi-i2 / dsbox-ta2

The DSBox TA2 component
MIT License

Cleaning featurizer fails as of 3:45 pm Friday #67

Closed serbanstan closed 6 years ago

serbanstan commented 6 years ago
(dsbox-devel-710) [stan@dsbox01 python]$ python ta2-search /nas/home/stan/dsbox/runs2/config-seed/32_wikiqa_config.json
Namespace(configuration_file='/nas/home/stan/dsbox/runs2/config-seed/32_wikiqa_config.json', cpus=-1, debug=False, output_prefix=None, timeout=-1)
Using configuation:
{'cpus': '10',
 'dataset_schema': '/nfs1/dsbox-repo/data/datasets/seed_datasets_current/32_wikiqa/32_wikiqa_dataset/datasetDoc.json',
 'executables_root': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/32_wikiqa/executables',
 'pipeline_logs_root': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/32_wikiqa/logs',
 'problem_root': '/nfs1/dsbox-repo/data/datasets/seed_datasets_current/32_wikiqa/32_wikiqa_problem',
 'problem_schema': '/nfs1/dsbox-repo/data/datasets/seed_datasets_current/32_wikiqa/32_wikiqa_problem/problemDoc.json',
 'ram': '10Gi',
 'saved_pipeline_ID': '',
 'saving_folder_loc': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/32_wikiqa',
 'temp_storage_root': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/32_wikiqa/temp',
 'timeout': 9,
 'training_data_root': '/nfs1/dsbox-repo/data/datasets/seed_datasets_current/32_wikiqa/32_wikiqa_dataset'}
[INFO] No test data config found! Will split the data.
[INFO] Succesfully parsed test data
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 23406}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 23406)])>,
 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table',
                    'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'),
 'structural_type': <class 'd3m.container.pandas.DataFrame'>}
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 5852}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 5852)])>,
 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table',
                    'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'),
 'structural_type': <class 'd3m.container.pandas.DataFrame'>}
[INFO] Template choices:
Template ' Test_classification_template ' has been added to template base.
[INFO] Worker started, id: <_MainProcess(MainProcess, started)>
/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
[INFO] Push@cache: ('d3m.primitives.dsbox.Denormalize', -6032109579214261120)
/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
[INFO] Push@cache: ('d3m.primitives.datasets.DatasetToDataFrame', -6032109579214261120)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', 4477771545665660386)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', 6526837046805561388)
[INFO] Push@cache: ('d3m.primitives.dsbox.Profiler', -359012423603502655)
[INFO] Push@cache: ('d3m.primitives.dsbox.CleaningFeaturizer', -359012423603502655)
/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/re.py:212: FutureWarning: split() requires a non-empty pattern match.
  return _compile(pattern, flags).split(string, maxsplit)
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 383, in evaluate_pipeline
    evaluation_result = self._evaluate(configuration, cache)
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 400, in _evaluate
    fitted_pipeline.fit(cache=cache, inputs=[self.train_dataset])
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 92, in fit
    self.runtime.fit(**arguments)
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/runtime.py", line 193, in fit
    primitive_arguments
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/runtime.py", line 281, in _primitive_step_fit
    produce_result = model.produce(**produce_params)
  File "/nfs1/dsbox-repo/stan/dsbox-cleaning/dsbox/datapreprocessing/cleaner/cleaning_featurizer.py", line 200, in produce
    df = ps.perform(self._mapping.get("punctuation_columns"))
  File "/nfs1/dsbox-repo/stan/dsbox-cleaning/dsbox/datapreprocessing/cleaner/spliter.py", line 146, in perform
    + str(count)] = one
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 274, in setup_initial_candidate
    candidate.data.update(result)
TypeError: 'NoneType' object is not iterable
--------------------
[ERROR] Initial Pipeline failed, Trying a random pipeline ...
[INFO] Worker started, id: <_MainProcess(MainProcess, started)>
[INFO] Hit@cache: ('d3m.primitives.dsbox.Denormalize', -6032109579214261120)
[INFO] Hit@cache: ('d3m.primitives.datasets.DatasetToDataFrame', -6032109579214261120)
[INFO] Hit@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', 4477771545665660386)
[INFO] Hit@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', 6526837046805561388)
[INFO] Hit@cache: ('d3m.primitives.dsbox.Profiler', -359012423603502655)
[INFO] Push@cache: ('d3m.primitives.dsbox.CleaningFeaturizer', -359012423603502655)
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 383, in evaluate_pipeline
    evaluation_result = self._evaluate(configuration, cache)
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 400, in _evaluate
    fitted_pipeline.fit(cache=cache, inputs=[self.train_dataset])
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 92, in fit
    self.runtime.fit(**arguments)
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/runtime.py", line 193, in fit
    primitive_arguments
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/runtime.py", line 281, in _primitive_step_fit
    produce_result = model.produce(**produce_params)
  File "/nfs1/dsbox-repo/stan/dsbox-cleaning/dsbox/datapreprocessing/cleaner/cleaning_featurizer.py", line 200, in produce
    df = ps.perform(self._mapping.get("punctuation_columns"))
  File "/nfs1/dsbox-repo/stan/dsbox-cleaning/dsbox/datapreprocessing/cleaner/spliter.py", line 146, in perform
    + str(count)] = one
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U21') dtype('<U21') dtype('<U21')
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 274, in setup_initial_candidate
    candidate.data.update(result)
TypeError: 'NoneType' object is not iterable
--------------------
[ERROR] Initial Pipeline failed, Trying a random pipeline ...
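Both tracebacks fail at the same line of spliter.py (`+ str(count)] = one`), which builds a new column name by concatenating the original column label with a suffix. The two error messages (`'int' + 'str'` and the numpy ufunc `'add'` failure) both point to the label not being a plain string when it reaches the concatenation. The sketch below is a hypothetical, simplified reconstruction of that splitting step (the names `split_punctuation_column`, `values`, `col`, and `out` are illustrative, not the actual dsbox-cleaning API), showing how casting the label with `str()` avoids the crash:

```python
import re

def split_punctuation_column(values, col, out):
    # Split each string value on punctuation and store the pieces under
    # new keys named "<col>_punc_<i>" in the `out` dict.
    parts = [re.split(r"[^\w\s]", str(v)) for v in values]
    width = max(len(p) for p in parts)
    for count in range(width):
        one = [p[count] if count < len(p) else "" for p in parts]
        # The log's TypeError ('int' + 'str') suggests `col` arrived as a
        # non-string label; casting it before concatenation avoids the crash:
        out[str(col) + "_punc_" + str(count)] = one
    return out
```

With an integer column label the uncast version (`col + "_punc_" + str(count)`) reproduces the first traceback, while the `str(col)` cast produces columns like `0_punc_0`, `0_punc_1`.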
RqS commented 6 years ago

Fixed