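Running python ta2-search /nas/home/stan/dsbox/runs2/config-ll0/LL0_690_visualizing_galaxy_config.json breaks the profiler: _parse_column in date_featurizer_org.py (line 302) raises "TypeError: must be str, not int", and the search then aborts with "ValueError: Invalid initial candidate". The chosen template is Default_regression_template. Full output: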
(dsbox-devel-710) [stan@dsbox01 python]$ python ta2-search /nas/home/stan/dsbox/runs2/config-ll0/LL0_690_visualizing_galaxy_config.json
Namespace(configuration_file='/nas/home/stan/dsbox/runs2/config-ll0/LL0_690_visualizing_galaxy_config.json', cpus=-1, debug=False, output_prefix=None, timeout=-1)
Using configuation:
{'cpus': '10',
'dataset_schema': '/nfs1/dsbox-repo/data/datasets/training_datasets/LL0/LL0_690_visualizing_galaxy/LL0_690_visualizing_galaxy_dataset/datasetDoc.json',
'executables_root': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/LL0_690_visualizing_galaxy/executables',
'pipeline_logs_root': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/LL0_690_visualizing_galaxy/logs',
'problem_root': '/nfs1/dsbox-repo/data/datasets/training_datasets/LL0/LL0_690_visualizing_galaxy/LL0_690_visualizing_galaxy_problem',
'problem_schema': '/nfs1/dsbox-repo/data/datasets/training_datasets/LL0/LL0_690_visualizing_galaxy/LL0_690_visualizing_galaxy_problem/problemDoc.json',
'ram': '10Gi',
'saved_pipeline_ID': '',
'saving_folder_loc': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/LL0_690_visualizing_galaxy',
'temp_storage_root': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/LL0_690_visualizing_galaxy/temp',
'timeout': 9,
'training_data_root': '/nfs1/dsbox-repo/data/datasets/training_datasets/LL0/LL0_690_visualizing_galaxy/LL0_690_visualizing_galaxy_dataset'}
[INFO] No test data config found! Will split the data.
[INFO] - dsbox.controller.controller - Top level output directory: /nfs1/dsbox-repo/stan/dsbox-ta2/python/output/LL0_690_visualizing_galaxy
[INFO] Succesfully parsed test data
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 223}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 223)])>,
'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table',
'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'),
'structural_type': <class 'd3m.container.pandas.DataFrame'>}
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 100}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 100)])>,
'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table',
'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'),
'structural_type': <class 'd3m.container.pandas.DataFrame'>}
[INFO] Template choices:
Template ' Default_regression_template ' has been added to template base.
[INFO] Template 0:Default_regression_template Selected. UCT:[100.0]
[INFO] Worker started, id: <_MainProcess(MainProcess, started)>
/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
[INFO] Push@cache: ('d3m.primitives.dsbox.Denormalize', 1691920072713186883)
/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
[INFO] Push@cache: ('d3m.primitives.datasets.DatasetToDataFrame', 1691920072713186883)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', 4720874274637968185)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', -4444265286283118903)
[INFO] Push@cache: ('d3m.primitives.dsbox.Profiler', 7282301522344053085)
/nfs1/dsbox-repo/stan/dsbox-profiling/dsbox/datapreprocessing/profiler/dependencies/date_extractor.py:408: UserWarning: DateExtractor: Failed to set timezone as America/Los_Angeles. Catch offset must be a timedelta representing a whole number of minutes, not datetime.timedelta(-1, 58022).
warn('DateExtractor: Failed to set timezone as ' + str(self.default_tz) + '. Catch ' + str(e))
Traceback (most recent call last):
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 420, in evaluate_pipeline
evaluation_result = self._evaluate(configuration, cache)
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 439, in _evaluate
fitted_pipeline.fit(cache=cache, inputs=[self.train_dataset])
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 92, in fit
self.runtime.fit(**arguments)
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/runtime.py", line 193, in fit
primitive_arguments
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/runtime.py", line 281, in _primitive_step_fit
produce_result = model.produce(**produce_params)
File "/nfs1/dsbox-repo/stan/dsbox-profiling/dsbox/datapreprocessing/profiler/data_profile.py", line 175, in produce
cols = self._DateFeaturizer.detect_date_columns(self._sample_df)
File "/nfs1/dsbox-repo/stan/dsbox-profiling/dsbox/datapreprocessing/profiler/date_featurizer_org.py", line 99, in detect_date_columns
if self._parse_column(sampled_df, idx) is not None:
File "/nfs1/dsbox-repo/stan/dsbox-profiling/dsbox/datapreprocessing/profiler/date_featurizer_org.py", line 302, in _parse_column
warn("Warning: multiple dates detected in column: " + idx)
TypeError: must be str, not int
Traceback (most recent call last):
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 310, in setup_initial_candidate
candidate.data.update(result)
TypeError: 'NoneType' object is not iterable
[ERROR] Initial Pipeline failed, Trying a random pipeline ...
{'clean_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.dsbox.CleaningFeaturizer'},
'corex_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.dsbox.CorexText'},
'denormalize_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.dsbox.Denormalize'},
'encoder_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.dsbox.Encoder'},
'extract_attribute_step': {'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Attribute',)},
'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes'},
'extract_target_step': {'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Target',
'https://metadata.datadrivendiscovery.org/types/SuggestedTarget')},
'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes'},
'impute_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.sklearn_wrap.SKImputer'},
'model_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.sklearn_wrap.SKRidge'},
'profiler_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.dsbox.Profiler'},
'to_dataframe_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.datasets.DatasetToDataFrame'}}
--------------------
[INFO] Worker started, id: <_MainProcess(MainProcess, started)>
[INFO] Hit@cache: ('d3m.primitives.dsbox.Denormalize', 1691920072713186883)
[INFO] Hit@cache: ('d3m.primitives.datasets.DatasetToDataFrame', 1691920072713186883)
[INFO] Hit@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', 4720874274637968185)
[INFO] Hit@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', -4444265286283118903)
[INFO] Push@cache: ('d3m.primitives.dsbox.Profiler', 7282301522344053085)
Traceback (most recent call last):
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 420, in evaluate_pipeline
evaluation_result = self._evaluate(configuration, cache)
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 439, in _evaluate
fitted_pipeline.fit(cache=cache, inputs=[self.train_dataset])
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 92, in fit
self.runtime.fit(**arguments)
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/runtime.py", line 193, in fit
primitive_arguments
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/runtime.py", line 281, in _primitive_step_fit
produce_result = model.produce(**produce_params)
File "/nfs1/dsbox-repo/stan/dsbox-profiling/dsbox/datapreprocessing/profiler/data_profile.py", line 175, in produce
cols = self._DateFeaturizer.detect_date_columns(self._sample_df)
File "/nfs1/dsbox-repo/stan/dsbox-profiling/dsbox/datapreprocessing/profiler/date_featurizer_org.py", line 99, in detect_date_columns
if self._parse_column(sampled_df, idx) is not None:
File "/nfs1/dsbox-repo/stan/dsbox-profiling/dsbox/datapreprocessing/profiler/date_featurizer_org.py", line 302, in _parse_column
warn("Warning: multiple dates detected in column: " + idx)
TypeError: must be str, not int
Traceback (most recent call last):
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 310, in setup_initial_candidate
candidate.data.update(result)
TypeError: 'NoneType' object is not iterable
[ERROR] Initial Pipeline failed, Trying a random pipeline ...
{'clean_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.dsbox.CleaningFeaturizer'},
'corex_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.dsbox.CorexText'},
'denormalize_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.dsbox.Denormalize'},
'encoder_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.dsbox.Encoder'},
'extract_attribute_step': {'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Attribute',)},
'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes'},
'extract_target_step': {'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Target',
'https://metadata.datadrivendiscovery.org/types/SuggestedTarget')},
'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes'},
'impute_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.sklearn_wrap.SKImputer'},
'model_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.sklearn_wrap.SKRidge'},
'profiler_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.dsbox.Profiler'},
'to_dataframe_step': {'hyperparameters': {},
'primitive': 'd3m.primitives.datasets.DatasetToDataFrame'}}
--------------------
Traceback (most recent call last):
File "ta2-search", line 141, in <module>
result = main(args)
File "ta2-search", line 110, in main
status = controller.train()
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/controller/controller.py", line 535, in train
template, candidate=self.exec_history.iloc[idx]['candidate'], cache=cache)
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/controller/controller.py", line 371, in search_template
candidate_in=candidate, cache=cache)
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 145, in search_one_iter
self.setup_initial_candidate(candidate_in, cache)
File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 319, in setup_initial_candidate
raise ValueError("Invalid initial candidate")
ValueError: Invalid initial candidate
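The first traceback points at the string concatenation in the warn() call: idx appears to be an integer column label, so "..." + idx raises the TypeError. The following is a minimal sketch of that failure and one possible fix (wrapping idx in str()), not the actual profiler code. The names PREFIX, concat_fails, and warn_multiple_dates are hypothetical and exist only for this illustration.

```python
import warnings

# Message prefix copied from the failing warn() call in the traceback.
PREFIX = "Warning: multiple dates detected in column: "


def concat_fails(idx):
    """Return True if the original expression (PREFIX + idx) raises TypeError."""
    try:
        PREFIX + idx
    except TypeError:
        return True
    return False


def warn_multiple_dates(idx):
    """Hypothetical replacement: convert the label to str before concatenating."""
    warnings.warn(PREFIX + str(idx))


# Integer labels (as pandas assigns by default) and string labels both work now.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for label in [3, "date"]:
        warn_multiple_dates(label)

messages = [str(w.message) for w in caught]
```

With this change the warning is emitted for any label type. The later "'NoneType' object is not iterable" and "Invalid initial candidate" errors look like downstream effects of the failed evaluation, so they may disappear once the profiler no longer raises, though that is an assumption until the run is repeated.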