It would be very helpful if the ParallelRunStep class accepted pandas DataFrames as inputs.
I understand that ParallelRunStep currently only allows these input types: [DatasetConsumptionConfig, PipelineOutputFileDataset, PipelineOutputTabularDataset, OutputFileDatasetConfig, OutputTabularDatasetConfig, LinkFileOutputDatasetConfig, LinkTabularOutputDatasetConfig]
Is it possible to support DataFrames as inputs to ParallelRunStep? Could this be a use case that the Azure ML dev team would consider? Passing a DataFrame today fails with the following exception:
Exception Traceback (most recent call last)
<ipython-input-27-215e373515cb> in <module>
7 output=output_dir,
8 allow_reuse=False,
----> 9 arguments=None
10 )
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/parallel_run_step.py in __init__(self, name, parallel_run_config, inputs, output, side_inputs, arguments, allow_reuse)
155 side_inputs=side_inputs,
156 arguments=arguments,
--> 157 allow_reuse=allow_reuse,
158 )
159
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in __init__(self, name, parallel_run_config, inputs, output, side_inputs, arguments, allow_reuse)
259
260 self._process_inputs_output_dataset_configs()
--> 261 self._validate()
262 self._get_pystep_inputs()
263
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _validate(self)
329 """Validate input params to init parallel run step class."""
330 self._validate_arguments()
--> 331 self._validate_inputs()
332 self._validate_output()
333 self._validate_parallel_run_config()
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _validate_inputs(self)
410
411 if self._inputs:
--> 412 self._input_ds_type = self._get_input_type(self._inputs[0])
413 for input_ds in self._inputs:
414 if self._input_ds_type != self._get_input_type(input_ds):
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _get_input_type(self, in_ds)
399 ds_mapping_type = INPUT_TYPE_DICT[input_type]
400 else:
--> 401 raise Exception("Step input must be of any type: {}, found {}".format(ALLOWED_INPUT_TYPES, input_type))
402 return ds_mapping_type
403
Exception: Step input must be of any type: (<class 'azureml.data.dataset_consumption_config.DatasetConsumptionConfig'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputFileDataset'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset'>, <class 'azureml.data.output_dataset_config.OutputFileDatasetConfig'>, <class 'azureml.data.output_dataset_config.OutputTabularDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkFileOutputDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkTabularOutputDatasetConfig'>), found <class 'pandas.core.frame.DataFrame'>
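Until DataFrames are accepted directly, one possible workaround is to register the DataFrame as a tabular dataset and pass that instead. The sketch below is not an official recommendation. It assumes the installed azureml-core version provides the experimental `Dataset.Tabular.register_pandas_dataframe` method, and the helper name `dataframe_as_step_input` and its parameters are hypothetical.

```python
def dataframe_as_step_input(df, workspace, name, datastore=None):
    """Register `df` as a TabularDataset and return it as a named step input.

    Assumption: azureml-core exposes the experimental
    Dataset.Tabular.register_pandas_dataframe method.
    """
    # Imported lazily so this helper can be defined without the SDK installed.
    from azureml.core import Dataset

    # Fall back to the workspace's default datastore when none is given.
    target = datastore or workspace.get_default_datastore()

    # Upload the DataFrame to the datastore and register it in the workspace.
    tabular_ds = Dataset.Tabular.register_pandas_dataframe(
        dataframe=df, target=target, name=name
    )

    # as_named_input returns a DatasetConsumptionConfig, which is one of
    # the input types listed in the exception message.
    return tabular_ds.as_named_input(name)
```

With a workspace object `ws` and a DataFrame `df` (both placeholders here), the result could then be passed as `inputs=[dataframe_as_step_input(df, ws, "my_input")]` when constructing the ParallelRunStep.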