microsoft / solution-accelerator-many-models

MIT License

Exception thrown when dataframes are passed as input to ParallelRunStep class #148

Open manojkumar-github opened 2 years ago

manojkumar-github commented 2 years ago

It would be very helpful if the ParallelRunStep class accepted pandas DataFrames as inputs.

I understand that ParallelRunStep currently accepts only the following input types: [DatasetConsumptionConfig, PipelineOutputFileDataset, PipelineOutputTabularDataset, OutputFileDatasetConfig, OutputTabularDatasetConfig, LinkFileOutputDatasetConfig, LinkTabularOutputDatasetConfig]

Would it be possible to allow DataFrames as inputs to ParallelRunStep? Is this a use case the Azure ML dev team would consider? Passing a DataFrame today raises the following exception:

Exception                                 Traceback (most recent call last)
<ipython-input-27-215e373515cb> in <module>
      7     output=output_dir,
      8     allow_reuse=False,
----> 9     arguments=None
     10 )

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/parallel_run_step.py in __init__(self, name, parallel_run_config, inputs, output, side_inputs, arguments, allow_reuse)
    155             side_inputs=side_inputs,
    156             arguments=arguments,
--> 157             allow_reuse=allow_reuse,
    158         )
    159 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in __init__(self, name, parallel_run_config, inputs, output, side_inputs, arguments, allow_reuse)
    259 
    260         self._process_inputs_output_dataset_configs()
--> 261         self._validate()
    262         self._get_pystep_inputs()
    263 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _validate(self)
    329         """Validate input params to init parallel run step class."""
    330         self._validate_arguments()
--> 331         self._validate_inputs()
    332         self._validate_output()
    333         self._validate_parallel_run_config()

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _validate_inputs(self)
    410 
    411         if self._inputs:
--> 412             self._input_ds_type = self._get_input_type(self._inputs[0])
    413             for input_ds in self._inputs:
    414                 if self._input_ds_type != self._get_input_type(input_ds):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _get_input_type(self, in_ds)
    399             ds_mapping_type = INPUT_TYPE_DICT[input_type]
    400         else:
--> 401             raise Exception("Step input must be of any type: {}, found {}".format(ALLOWED_INPUT_TYPES, input_type))
    402         return ds_mapping_type
    403 

Exception: Step input must be of any type: (<class 'azureml.data.dataset_consumption_config.DatasetConsumptionConfig'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputFileDataset'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset'>, <class 'azureml.data.output_dataset_config.OutputFileDatasetConfig'>, <class 'azureml.data.output_dataset_config.OutputTabularDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkFileOutputDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkTabularOutputDatasetConfig'>), found <class 'pandas.core.frame.DataFrame'>
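
For anyone reading the traceback, the failure comes from a type gate in `_get_input_type`: the input's class is looked up in `INPUT_TYPE_DICT`, and anything not listed raises. Below is a rough, standalone sketch of that kind of check, not the actual azureml source. The three classes are empty placeholders standing in for the real azureml and pandas types, the mapping values (`"consumption"`, `"file"`) are made up, and the mixed-type error message is an assumption, since the traceback does not show it.

```python
# Placeholder classes: hypothetical stand-ins for the real azureml / pandas types.
class DatasetConsumptionConfig:
    pass


class OutputFileDatasetConfig:
    pass


class DataFrame:  # stands in for pandas.DataFrame
    pass


# Assumed shape of the lookup table; the real values are not shown in the traceback.
INPUT_TYPE_DICT = {
    DatasetConsumptionConfig: "consumption",
    OutputFileDatasetConfig: "file",
}
ALLOWED_INPUT_TYPES = tuple(INPUT_TYPE_DICT)


def get_input_type(in_ds):
    """Return the mapped type for an input, or raise if its class is not allowed."""
    input_type = type(in_ds)
    if input_type in INPUT_TYPE_DICT:
        return INPUT_TYPE_DICT[input_type]
    raise Exception(
        "Step input must be of any type: {}, found {}".format(
            ALLOWED_INPUT_TYPES, input_type
        )
    )


def validate_inputs(inputs):
    """Require every input to map to the same type as the first one."""
    if not inputs:
        return None
    first_type = get_input_type(inputs[0])
    for input_ds in inputs:
        if get_input_type(input_ds) != first_type:
            # Hypothetical message; the real one is not visible in the traceback.
            raise Exception("All step inputs must be of the same type")
    return first_type
```

With this sketch, `validate_inputs([DatasetConsumptionConfig()])` passes, while `validate_inputs([DataFrame()])` raises an exception of the same form as the one reported above, because the DataFrame class is not a key in the lookup table.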