The ParallelRunStep does not terminate anymore / no batches are started

microsoft / MLOpsPython

MLOps using Azure ML Services and Azure DevOps

MIT License


The ParallelRunStep does not terminate anymore / no batches are started #358

Open sigeisler opened 3 years ago

sigeisler commented 3 years ago

Hi,

I have built on this project and, like your Azure DevOps pipeline, my parallel batch scoring pipelines no longer terminate: https://aidemos.visualstudio.com/MLOps/_build/results?buildId=5684&view=logs&j=9effb530-5327-5cf9-9ca2-ba5490ba1ebd

It seems like the actual run(mini_batch) method is never executed.

(Your DevOps pipeline also fails after 4 hours, so I assume you are hitting the same issue.)

Do you know what the reason is?

Thanks!

Sabel5 commented 3 years ago

I'm getting the same problem and I can't find a way to solve it. The ML pipeline starts and all the parallel jobs are created, but the mini-batches do nothing: after 55 minutes the process is still running and no outputs have been created. Keen to read Microsoft's response on this.

Sabel5 commented 3 years ago

Apparently the mini_batch method doesn't work for tabular data, so try replacing the run function in the batch scoring script with the following:

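import numpy as np
import pandas as pd


def run(input_data: pd.DataFrame) -> pd.DataFrame:
    # `model` is assumed to be a global that init() loaded earlier in the script.
    result = None
    for _, sample in input_data.iterrows():
        # Predict one row at a time, reshaped to a single-sample 2D array.
        pred = model.predict(sample.values.reshape(1, -1))
        result = (
            np.array(pred) if result is None else np.vstack((result, pred))
        )
    if result is None:
        # Empty mini-batch: nothing to score.
        return []
    # Reuse the input index so the scores line up with their rows in join().
    scores = pd.DataFrame(result, columns=["score"], index=input_data.index)
    return input_data.join(scores)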
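
For a quick local check of the same idea, here is a minimal sketch that scores the whole mini-batch in a single predict call instead of row by row. It assumes the mini-batch arrives as a pandas DataFrame and that the model accepts a 2D array of rows. StubModel is a hypothetical stand-in that is not part of the project, and this version returns an empty DataFrame rather than [] for an empty mini-batch.

```python
import numpy as np
import pandas as pd


class StubModel:
    """Hypothetical stand-in for the real model, for local testing only."""

    def predict(self, X):
        # Sum the features of each row so the output is easy to verify.
        return np.asarray(X).sum(axis=1)


# Assumption: in the real script, init() would load the registered model here.
model = StubModel()


def run(input_data: pd.DataFrame) -> pd.DataFrame:
    # An empty mini-batch yields an empty result without calling the model.
    if input_data.empty:
        return input_data.assign(score=pd.Series(dtype="float64"))
    # Score all rows in one call and flatten to one prediction per row.
    preds = np.asarray(model.predict(input_data.values)).reshape(-1)
    # assign() attaches the scores positionally, so the input index is kept.
    return input_data.assign(score=preds)


batch = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
scored = run(batch)
print(scored)
```

Calling run on a small DataFrame like this outside the pipeline is a cheap way to confirm that the function itself returns one score per input row before submitting it to the ParallelRunStep.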