Closed serbanstan closed 6 years ago
Adding PCA to our pipeline with a fixed number of components makes our cast_to_type primitive break.
(dsbox-devel-710) [stan@dsbox01 python]$ python ta2-search /nas/home/stan/dsbox/runs2/config-ll0/LL0_uci_facebook_metrics_config.json
Namespace(configuration_file='/nas/home/stan/dsbox/runs2/config-ll0/LL0_uci_facebook_metrics_config.json', cpus=-1, debug=False, output_prefix=None, timeout=-1)
Using configuation:
{'cpus': '10',
 'dataset_schema': '/nfs1/dsbox-repo/data/datasets-v31/training_datasets/LL0/LL0_uci_facebook_metrics/LL0_uci_facebook_metrics_dataset/datasetDoc.json',
 'executables_root': '/nas/home/stan/dsbox/runs2/output-ll0/LL0_uci_facebook_metrics/executables',
 'pipeline_logs_root': '/nas/home/stan/dsbox/runs2/output-ll0/LL0_uci_facebook_metrics/logs',
 'problem_root': '/nfs1/dsbox-repo/data/datasets-v31/training_datasets/LL0/LL0_uci_facebook_metrics/LL0_uci_facebook_metrics_problem',
 'problem_schema': '/nfs1/dsbox-repo/data/datasets-v31/training_datasets/LL0/LL0_uci_facebook_metrics/LL0_uci_facebook_metrics_problem/problemDoc.json',
 'ram': '10Gi',
 'temp_storage_root': '/nas/home/stan/dsbox/runs2/output-ll0/LL0_uci_facebook_metrics/temp',
 'timeout': 19,
 'training_data_root': '/nfs1/dsbox-repo/data/datasets-v31/training_datasets/LL0/LL0_uci_facebook_metrics/LL0_uci_facebook_metrics_dataset'}
[INFO] No test data config found! Will split the data.
[INFO] - dsbox.controller.controller - Top level output directory: /nas/home/stan/dsbox/runs2/output-ll0/LL0_uci_facebook_metrics
[INFO] Template choices:
Template ' SRI_Mean_Baseline_Template ' has been added to template base.
Template ' default_regression_template ' has been added to template base.
Template ' default_text_regression_template ' has been added to template base.
Template ' UU3_Test_Template ' has been added to template base.
Template ' Default_timeseries_regression_template ' has been added to template base.
Template ' regression_with_feature_selection ' has been added to template base.
Template ' dsbox_regression_template ' has been added to template base.
[INFO] - dsbox.controller.controller - [INFO] Template 0:SRI_Mean_Baseline_Template Selected. UCT:[None, None, None, None, None, None, None]
[INFO] - dsbox.controller.controller - Searching template SRI_Mean_Baseline_Template
[INFO] - dsbox.controller.controller - cache size = 0
[INFO] Using Global Cache
[INFO] Worker started, id: <_MainProcess(MainProcess, started)> , True
/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
[INFO] Will use normal train-test mode ( n = 1 ) to choose best primitives.
shape N/A
[INFO] Push@cache: ('d3m.primitives.sri.baseline.MeanBaseline', 523735248979574436)
[INFO] Testing finish.!!!
[INFO] Now in normal mode, will add extra train with train_dataset1
shape N/A
[INFO] Push@cache: ('d3m.primitives.sri.baseline.MeanBaseline', -8667387790502098007)
[INFO] Now are training the pipeline with all dataset and saving the pipeline.
shape N/A
[INFO] Push@cache: ('d3m.primitives.sri.baseline.MeanBaseline', -6906265235819350)
!!!!!! TEST_DATASET1
{'cross_validation_metrics': [],
 'fitted_pipeline': <dsbox.pipeline.fitted_pipeline.FittedPipeline object at 0x7f305a7b0390>,
 'test_metrics': [{'column_name': 'Page_total_likes_target', 'metric': 'meanSquaredError', 'value': 324835865.8209}],
 'total_runtime': 29.177137851715088,
 'training_metrics': [{'column_name': 'Page_total_likes_target', 'metric': 'meanSquaredError', 'value': 248196403.78777778}]}
!!!!
[INFO] push@Candidate: (-2274051079072489220,f81dabd3-b61a-4687-a121-2d4546d2139b)
[INFO] - dsbox.controller.controller - ******************
[INFO] Writing results
{'cross_validation_metrics': [],
 'fitted_pipeline': <dsbox.pipeline.fitted_pipeline.FittedPipeline object at 0x7f305a7b0390>,
 'test_metrics': [{'column_name': 'Page_total_likes_target', 'metric': 'meanSquaredError', 'value': 324835865.8209}],
 'total_runtime': 29.177137851715088,
 'training_metrics': [{'column_name': 'Page_total_likes_target', 'metric': 'meanSquaredError', 'value': 248196403.78777778}]}
[INFO] - dsbox.controller.controller - {'fitted_pipeline': <dsbox.pipeline.fitted_pipeline.FittedPipeline object at 0x7f305a7b0390>, 'training_metrics': [{'column_name': 'Page_total_likes_target', 'metric': 'meanSquaredError', 'value': 248196403.78777778}], 'cross_validation_metrics': [], 'test_metrics': [{'column_name': 'Page_total_likes_target', 'metric': 'meanSquaredError', 'value': 324835865.8209}], 'total_runtime': 29.177137851715088}
324835865.8209
[INFO] - dsbox.controller.controller - Training meanSquaredError = 248196403.78777778
[INFO] - dsbox.controller.controller - Validation meanSquaredError = 324835865.8209
[INFO] - dsbox.controller.controller - [INFO] report: 324835865.8209
[INFO] - dsbox.controller.controller - [INFO] UCT updated: [10.348094163295801, 111.17835653996396, 111.17835653996396, 111.17835653996396, 111.17835653996396, 111.17835653996396, 111.17835653996396]
[INFO] - dsbox.controller.controller - [INFO] cache size: 3, candidates: 1
[INFO] - dsbox.controller.controller - [INFO] New Best Value: 324835865.8209
[INFO] - dsbox.controller.controller - ******************
[INFO] Saving training results in /nas/home/stan/dsbox/runs2/output-ll0/LL0_uci_facebook_metrics.txt
[INFO] - dsbox.controller.controller - [INFO] Template 1:default_regression_template Selected. UCT:[10.348094163295801, 111.17835653996396, 111.17835653996396, 111.17835653996396, 111.17835653996396, 111.17835653996396, 111.17835653996396]
[INFO] - dsbox.controller.controller - Searching template default_regression_template
[INFO] - dsbox.controller.controller - cache size = 3
[INFO] Using Global Cache
[INFO] Worker started, id: <_MainProcess(MainProcess, started)> , True
[INFO] Will use cross validation( n = 10 ) to choose best primitives.
shape N/A
[INFO] Push@cache: ('d3m.primitives.dsbox.Denormalize', -8667387790502098007)
/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
shape N/A
[INFO] Push@cache: ('d3m.primitives.datasets.DatasetToDataFrame', -6784794392826866445)
(400, 20)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', -5361314593299247151)
(400, 20)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', -8917049308239181810)
(400, 17)
[INFO] Push@cache: ('d3m.primitives.dsbox.Profiler', -1730383627972381479)
(400, 17)
[INFO] Push@cache: ('d3m.primitives.dsbox.CleaningFeaturizer', 5634579432155763573)
(400, 17)
[INFO] Push@cache: ('d3m.primitives.dsbox.CorexText', 2857920993194579986)
(400, 17)
[INFO] Push@cache: ('d3m.primitives.dsbox.Encoder', 7949591534899160329)
(400, 45)
[INFO] Push@cache: ('d3m.primitives.dsbox.MeanImputation', 4704891850192963082)
(400, 45)
[INFO] Push@cache: ('d3m.primitives.sklearn_wrap.SKMaxAbsScaler', -5540225622402709454)
(400, 45)
[INFO] Push@cache: ('d3m.primitives.sklearn_wrap.SKPCA', -8561610737237944930)
(400, 5)
[INFO] Push@cache: ('d3m.primitives.data.CastToType', -1444505970168401487)
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 552, in evaluate_pipeline
    evaluation_result = self._evaluate(configuration, cache, dump2disk)
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 575, in _evaluate
    fitted_pipeline.fit(cache=cache, inputs=[self.train_dataset1])
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 94, in fit
    self.runtime.fit(**arguments)
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/runtime.py", line 210, in fit
    primitive_arguments
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/runtime.py", line 304, in _primitive_step_fit
    produce_result = model.produce(**produce_params)
  File "/nfs1/dsbox-repo/stan/common-primitives/common_primitives/cast_to_type.py", line 80, in produce
    outputs = inputs.iloc[:, columns_to_use].astype(type_to_cast)
  File "/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/pandas/core/indexing.py", line 1367, in __getitem__
    return self._getitem_tuple(key)
  File "/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/pandas/core/indexing.py", line 1737, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/pandas/core/indexing.py", line 204, in _has_valid_tuple
    if not self._has_valid_type(k, i):
  File "/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/pandas/core/indexing.py", line 1674, in _has_valid_type
    return self._is_valid_list_like(key, axis)
  File "/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/pandas/core/indexing.py", line 1731, in _is_valid_list_like
    raise IndexError("positional indexers are out-of-bounds")
IndexError: positional indexers are out-of-bounds
Seems to have been solved. This dataset is still a problem, however; see https://github.com/usc-isi-i2/dsbox-ta2/issues/159 . Closing.
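
For anyone hitting the same traceback later: the failing call is `inputs.iloc[:, columns_to_use]` in `cast_to_type.py`, and the log shows the frame going from `(400, 45)` to `(400, 5)` after SKPCA. The thread does not confirm the root cause, so the following is only a sketch under one assumption: that the column indices were computed for the 45-column frame and are stale by the time CastToType runs. The names `reduced`, `stale_columns` and `valid_columns` are hypothetical, and the guard at the end is an illustration, not the fix that was actually applied.

```python
import numpy as np
import pandas as pd

# Stand-in for the SKPCA output: 400 rows, 5 components (shape taken from the log).
reduced = pd.DataFrame(np.zeros((400, 5)))

# Assumption: indices computed earlier, when the frame still had 45 columns.
stale_columns = list(range(45))

# Same access pattern as the failing line in cast_to_type.py.
try:
    reduced.iloc[:, stale_columns].astype(float)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print("IndexError raised:", error_message)

# One possible guard: keep only the indices that exist in the current frame.
valid_columns = [c for c in stale_columns if c < reduced.shape[1]]
casted = reduced.iloc[:, valid_columns].astype(float)
print(casted.shape)
```

If the assumption holds, recomputing the column list from the frame that CastToType actually receives would be the cleaner route than filtering after the fact.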