microsoft / solution-accelerator-many-models

MIT License
DataFrame column names are stripped when combining timeseries forecast output. #128

Closed felixcollins closed 3 years ago

felixcollins commented 3 years ago

I'm running a timeseries forecast using the many models accelerator, and the results come back in a plain text file with whitespace-delimited values. There is no header row. The return value from each mini batch is a pandas DataFrame with column names. These DataFrames are combined to make the output file, but the column names are lost along the way.

cartacioS commented 3 years ago

To add the header to the output automatically, add the argument below when creating the step.

Step = ParallelRunStep( … arguments=["--append_row_dataframe_header", True] … )

deeptim123 commented 3 years ago

If you are using AutoMLPipelineBuilder.get_many_models_batch_inference_steps to build the parallel run step, you can pass the same argument as follows:

inference_steps = AutoMLPipelineBuilder.get_many_models_batch_inference_steps( ... arguments=["--append_row_dataframe_header", True] ...)
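For output files that were already produced without a header, the column names can be reattached when reading the file. The following is a minimal sketch using only the Python standard library, not part of the accelerator. The function name `read_forecast_output`, the column names, and the sample values are all hypothetical. It assumes no field contains whitespace. It also skips any row that exactly matches the expected column names, so the same reader works whether or not the header flag was set.

```python
import io


def read_forecast_output(lines, column_names):
    """Parse whitespace-delimited rows into dicts keyed by column name.

    Rows that exactly match column_names are treated as header rows and
    skipped, so output written with or without headers parses the same way.
    """
    column_names = list(column_names)
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if fields == column_names:
            continue  # header row, present only if headers were appended
        if len(fields) != len(column_names):
            raise ValueError(
                f"expected {len(column_names)} fields, got {len(fields)}: {line!r}"
            )
        records.append(dict(zip(column_names, fields)))
    return records


# Hypothetical column names and values, for illustration only.
COLUMNS = ["store", "date", "forecast"]
headerless = io.StringIO("A 2020-01-01 10.5\nB 2020-01-01 7.25\n")
with_header = io.StringIO(
    "store date forecast\nA 2020-01-01 10.5\nB 2020-01-01 7.25\n"
)

rows_without = read_forecast_output(headerless, COLUMNS)
rows_with = read_forecast_output(with_header, COLUMNS)
```

All values are kept as strings here. With pandas, a similar effect for a headerless file should be achievable with `pandas.read_csv(path, sep=r"\s+", header=None, names=COLUMNS)`.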

felixcollins commented 3 years ago

Thanks, I'll give it a shot and close the ticket once it is working.

felixcollins commented 3 years ago

That worked. Thanks!