usc-isi-i2 / dsbox-ta2

The DSBox TA2 component
MIT License

Date Featurizer Failure #87

Closed: proska closed this issue 6 years ago

proska commented 6 years ago

Running command:

python ta2-search ~/dsbox/runs2/config-seed-test/LL0_690_visualizing_galaxy/search_config.json

Error message:

/nfs1/dsbox-repo/qasemi/dsbox-cleaning/dsbox/datapreprocessing/cleaner/dependencies/date_extractor.py:408: UserWarning: DateExtractor: Failed to set timezone as America/Los_Angeles. Catch offset must be a timedelta representing a whole number of minutes, not datetime.timedelta(-1, 58022).
  warn('DateExtractor: Failed to set timezone as ' + str(self.default_tz) + '. Catch ' + str(e))
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/search.py", line 488, in evaluate_pipeline
    evaluation_result = self._evaluate(configuration, cache, dump2disk)
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/search.py", line 508, in _evaluate
    fitted_pipeline.fit(cache=cache, inputs=[self.train_dataset])
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 94, in fit
    self.runtime.fit(**arguments)
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/runtime.py", line 194, in fit
    primitive_arguments
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/runtime.py", line 282, in _primitive_step_fit
    produce_result = model.produce(**produce_params)
  File "/nfs1/dsbox-repo/qasemi/dsbox-cleaning/dsbox/datapreprocessing/cleaner/data_profile.py", line 169, in produce
    cols = self._DateFeaturizer.detect_date_columns(self._sample_df)
  File "/nfs1/dsbox-repo/qasemi/dsbox-cleaning/dsbox/datapreprocessing/cleaner/dependencies/date_featurizer_org.py", line 100, in detect_date_columns
    if self._parse_column(sampled_df, idx) is not None:
  File "/nfs1/dsbox-repo/qasemi/dsbox-cleaning/dsbox/datapreprocessing/cleaner/dependencies/date_featurizer_org.py", line 303, in _parse_column
    warn("Warning: multiple dates detected in column: " + idx)
TypeError: must be str, not int
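
The last frame suggests the failure comes from concatenating a string with an integer column label: `idx` is an int here, so `"..." + idx` raises `TypeError`. Below is a minimal sketch of the failing pattern and one possible fix. It assumes nothing about the real `_parse_column` beyond the line shown in the traceback, and `warn_multiple_dates` is a hypothetical helper, not part of dsbox-cleaning.

```python
import warnings


def warn_multiple_dates(idx):
    """Hypothetical helper: build the warning text for any column label type."""
    # str.format converts idx with str(), so int and str labels both work.
    message = "Warning: multiple dates detected in column: {}".format(idx)
    warnings.warn(message)
    return message


# The original pattern fails when the column label is an int.
try:
    "Warning: multiple dates detected in column: " + 3
except TypeError as exc:
    print("original pattern fails:", exc)

# The formatted version accepts both integer and string labels.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(warn_multiple_dates(3))
    print(warn_multiple_dates("date"))
```

Wrapping the label in `str(idx)` at the call site would work equally well; formatting just avoids repeating the conversion wherever the label is used.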

Full Output:

(d3m-devel) [qasemi@dsbox02 python]$ python ta2-search ~/dsbox/runs2/config-seed-test/LL0_690_visualizing_galaxy/search_config.json 
Namespace(configuration_file='/nas/home/qasemi/dsbox/runs2/config-seed-test/LL0_690_visualizing_galaxy/search_config.json', cpus=-1, debug=False, output_prefix=None, timeout=-1)
Using configuation:
{'dataset_schema': '/nfs1/dsbox-repo/data/datasets-v31/training_datasets/LL0/LL0_690_visualizing_galaxy/LL0_690_visualizing_galaxy_dataset/datasetDoc.json',
 'executables_root': '/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/output/LL0_690_visualizing_galaxy/executables',
 'pipeline_logs_root': '/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/output/LL0_690_visualizing_galaxy/logs',
 'problem_root': '/nfs1/dsbox-repo/data/datasets-v31/training_datasets/LL0/LL0_690_visualizing_galaxy/LL0_690_visualizing_galaxy_problem',
 'problem_schema': '/nfs1/dsbox-repo/data/datasets-v31/training_datasets/LL0/LL0_690_visualizing_galaxy/LL0_690_visualizing_galaxy_problem/problemDoc.json',
 'saved_pipeline_ID': '',
 'saving_folder_loc': '/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/output/LL0_690_visualizing_galaxy',
 'temp_storage_root': '/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/output/LL0_690_visualizing_galaxy/temp',
 'timeout': 0,
 'training_data_root': '/nfs1/dsbox-repo/data/datasets-v31/training_datasets/LL0/LL0_690_visualizing_galaxy/LL0_690_visualizing_galaxy_dataset',
 'user_problems_root': '/nas/home/qasemi/dsbox/runs2/output-seed/LL0_690_visualizing_galaxy/user_problems'}
[INFO] No test data config found! Will split the data.
[INFO] - dsbox.controller.controller - Top level output directory: /nfs1/dsbox-repo/qasemi/dsbox-ta2/python/output/LL0_690_visualizing_galaxy
[INFO] Succesfully parsed test data
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 223}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 223)])>,
 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table',
                    'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'),
 'structural_type': <class 'd3m.container.pandas.DataFrame'>}
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 100}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 100)])>,
 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table',
                    'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'),
 'structural_type': <class 'd3m.container.pandas.DataFrame'>}
[INFO] Template choices:
Template ' Default_regression_template ' has been added to template base.
[INFO] Template 0:Default_regression_template Selected. UCT:[100.0]
[INFO] Worker started, id: <_MainProcess(MainProcess, started)> , True
/nfs1/dsbox-repo/qasemi/miniconda/envs/d3m-devel/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
[INFO] Push@cache: ('d3m.primitives.dsbox.Denormalize', 6809185080492433979)
/nfs1/dsbox-repo/qasemi/miniconda/envs/d3m-devel/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
[INFO] Push@cache: ('d3m.primitives.datasets.DatasetToDataFrame', 6809185080492433979)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', -3421672811617791271)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', -5556157206724971364)
[INFO] Push@cache: ('d3m.primitives.dsbox.Profiler', -6948590468534388086)
/nfs1/dsbox-repo/qasemi/dsbox-cleaning/dsbox/datapreprocessing/cleaner/dependencies/date_extractor.py:408: UserWarning: DateExtractor: Failed to set timezone as America/Los_Angeles. Catch offset must be a timedelta representing a whole number of minutes, not datetime.timedelta(-1, 58022).
  warn('DateExtractor: Failed to set timezone as ' + str(self.default_tz) + '. Catch ' + str(e))
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/search.py", line 488, in evaluate_pipeline
    evaluation_result = self._evaluate(configuration, cache, dump2disk)
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/search.py", line 508, in _evaluate
    fitted_pipeline.fit(cache=cache, inputs=[self.train_dataset])
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 94, in fit
    self.runtime.fit(**arguments)
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/runtime.py", line 194, in fit
    primitive_arguments
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/runtime.py", line 282, in _primitive_step_fit
    produce_result = model.produce(**produce_params)
  File "/nfs1/dsbox-repo/qasemi/dsbox-cleaning/dsbox/datapreprocessing/cleaner/data_profile.py", line 169, in produce
    cols = self._DateFeaturizer.detect_date_columns(self._sample_df)
  File "/nfs1/dsbox-repo/qasemi/dsbox-cleaning/dsbox/datapreprocessing/cleaner/dependencies/date_featurizer_org.py", line 100, in detect_date_columns
    if self._parse_column(sampled_df, idx) is not None:
  File "/nfs1/dsbox-repo/qasemi/dsbox-cleaning/dsbox/datapreprocessing/cleaner/dependencies/date_featurizer_org.py", line 303, in _parse_column
    warn("Warning: multiple dates detected in column: " + idx)
TypeError: must be str, not int
[INFO] push@Candidate: (6628973257765379899,None)
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/search.py", line 377, in setup_initial_candidate
    candidate.data.update(result)
TypeError: 'NoneType' object is not iterable
[ERROR] Initial Pipeline failed, Trying a random pipeline ...
{'clean_step': {'hyperparameters': {},
                'primitive': 'd3m.primitives.dsbox.CleaningFeaturizer'},
 'corex_step': {'hyperparameters': {},
                'primitive': 'd3m.primitives.dsbox.CorexText'},
 'denormalize_step': {'hyperparameters': {},
                      'primitive': 'd3m.primitives.dsbox.Denormalize'},
 'encoder_step': {'hyperparameters': {},
                  'primitive': 'd3m.primitives.dsbox.Encoder'},
 'extract_attribute_step': {'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Attribute',)},
                            'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes'},
 'extract_target_step': {'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Target',
                                                                'https://metadata.datadrivendiscovery.org/types/SuggestedTarget')},
                         'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes'},
 'impute_step': {'hyperparameters': {},
                 'primitive': 'd3m.primitives.sklearn_wrap.SKImputer'},
 'model_step': {'hyperparameters': {},
                'primitive': 'd3m.primitives.sklearn_wrap.SKARDRegression'},
 'profiler_step': {'hyperparameters': {},
                   'primitive': 'd3m.primitives.dsbox.Profiler'},
 'to_dataframe_step': {'hyperparameters': {},
                       'primitive': 'd3m.primitives.datasets.DatasetToDataFrame'}}
--------------------
[INFO] hit@Candidate: (6628973257765379899,None)
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/search.py", line 371, in setup_initial_candidate
    raise ValueError("Candidate is not compatible with the dataset")
ValueError: Candidate is not compatible with the dataset
[ERROR] Initial Pipeline failed, Trying a random pipeline ...
{'clean_step': {'hyperparameters': {},
                'primitive': 'd3m.primitives.dsbox.CleaningFeaturizer'},
 'corex_step': {'hyperparameters': {},
                'primitive': 'd3m.primitives.dsbox.CorexText'},
 'denormalize_step': {'hyperparameters': {},
                      'primitive': 'd3m.primitives.dsbox.Denormalize'},
 'encoder_step': {'hyperparameters': {},
                  'primitive': 'd3m.primitives.dsbox.Encoder'},
 'extract_attribute_step': {'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Attribute',)},
                            'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes'},
 'extract_target_step': {'hyperparameters': {'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Target',
                                                                'https://metadata.datadrivendiscovery.org/types/SuggestedTarget')},
                         'primitive': 'd3m.primitives.data.ExtractColumnsBySemanticTypes'},
 'impute_step': {'hyperparameters': {},
                 'primitive': 'd3m.primitives.sklearn_wrap.SKImputer'},
 'model_step': {'hyperparameters': {},
                'primitive': 'd3m.primitives.sklearn_wrap.SKARDRegression'},
 'profiler_step': {'hyperparameters': {},
                   'primitive': 'd3m.primitives.dsbox.Profiler'},
 'to_dataframe_step': {'hyperparameters': {},
                       'primitive': 'd3m.primitives.datasets.DatasetToDataFrame'}}
--------------------
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/controller/controller.py", line 528, in train
    cache_bundle=(cache, candidate_cache),
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/controller/controller.py", line 375, in search_template
    report = search.search_one_iter(candidate_in=candidate, cache_bundle=cache_bundle)
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/search.py", line 191, in search_one_iter
    self.setup_initial_candidate(candidate_in, cache, candidate_cache)
  File "/nfs1/dsbox-repo/qasemi/dsbox-ta2/python/dsbox/template/search.py", line 386, in setup_initial_candidate
    raise ValueError("Invalid initial candidate")
ValueError: Invalid initial candidate
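
Reading the log from top to bottom, the failures look chained. `evaluate_pipeline` appears to have reported the `TypeError` and produced no result, `candidate.data.update(result)` then failed on that `None`, and the search finally gave up with `Invalid initial candidate`. The sketch below reproduces only the middle step with a plain dict and shows one defensive check. `merge_result` is a hypothetical name, the score value is made up, and the real `setup_initial_candidate` may handle this differently.

```python
def merge_result(candidate_data, result):
    """Hypothetical guard: merge an evaluation result only if there is one."""
    if result is None:
        # A failed evaluation yields no result; report it instead of crashing.
        return False
    candidate_data.update(result)
    return True


data = {"pipeline": "Default_regression_template"}

# dict.update(None) raises a TypeError like the one seen in the log.
try:
    dict(data).update(None)
except TypeError as exc:
    print("unguarded update fails:", exc)

print(merge_result(data, None))            # nothing is merged
print(merge_result(data, {"score": 0.5}))  # made-up result is merged into data
```

A guard like this would not fix the root cause in the date featurizer, but it would make the first error in the log the only one to investigate.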

szeke commented 6 years ago

@proska can you please try to fix it? Tanay wrote the code originally, but he is on vacation until next Tuesday.

RqS commented 6 years ago

Pulling the latest version will probably work. I think I have already fixed it.

On Thu, Jul 19, 2018 at 11:31 AM Ehsan Qasemi notifications@github.com wrote:


RqS commented 6 years ago

Just update dsbox-cleaning. It should be fixed now.

proska commented 6 years ago

fixed in d95d73494628e842ebbd2e368faa5f0cddf648e9