Open sigeisler opened 3 years ago
I'm hitting the same problem and can't find a way to solve it. The ML pipeline starts and all the parallel jobs get created, but the mini-batches don't do anything: after 55 minutes the process is still running and no outputs have been created. Keen to read Microsoft's response on this.
Apparently the mini_batch method doesn't work for tabular data. Therefore you should try to replace the run function in the batchscoring script with the following:
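import numpy as np
import pandas as pd


def run(input_data: pd.DataFrame) -> pd.DataFrame:
    # `model` is assumed to be the global loaded elsewhere in the script.
    result = None
    for _, sample in input_data.iterrows():
        # Score one row at a time; predict expects a 2-D array.
        pred = model.predict(sample.values.reshape(1, -1))
        result = np.array(pred) if result is None else np.vstack((result, pred))
    if result is None:
        # Empty mini-batch: return an empty frame to match the annotation.
        return pd.DataFrame()
    # Reuse the input index so join() aligns each score with its row.
    scores = pd.DataFrame(result, columns=["score"], index=input_data.index)
    return input_data.join(scores)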
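For comparison, here is a minimal sketch of the same idea that scores the whole mini-batch in one call instead of row by row. It assumes the model's predict accepts a 2-D array and returns one score per row (as scikit-learn style estimators do); StubModel is a hypothetical stand-in so the snippet runs on its own, and is not part of the original script.

```python
import numpy as np
import pandas as pd


class StubModel:
    """Hypothetical stand-in for the real model; not from the original script."""

    def predict(self, features: np.ndarray) -> np.ndarray:
        # Pretend score: the sum of each row's features.
        return np.asarray(features).sum(axis=1)


model = StubModel()


def run(input_data: pd.DataFrame) -> pd.DataFrame:
    if input_data.empty:
        # Nothing to score: return the empty frame with an empty score column.
        return input_data.assign(score=np.array([], dtype=float))
    # One predict call for the whole mini-batch instead of one per row.
    scores = np.asarray(model.predict(input_data.to_numpy())).reshape(-1)
    # assign() keeps the input index, so each score stays with its row.
    return input_data.assign(score=scores)


if __name__ == "__main__":
    batch = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=[10, 11])
    print(run(batch))
```

Calling run locally on a small DataFrame like this is also a quick way to rule out the scoring logic itself before looking at the pipeline.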
Hi,
I have built upon this project and, just like your Azure DevOps pipeline, my parallel batch scoring pipelines have all stopped terminating: https://aidemos.visualstudio.com/MLOps/_build/results?buildId=5684&view=logs&j=9effb530-5327-5cf9-9ca2-ba5490ba1ebd
It seems like the actual run(mini_batch) method is never executed. (Your DevOps pipeline is also failing after 4 hours, so I assume you are running into the same issue.)
Do you know what the reason for that is?
Thanks!
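One way to check whether run(mini_batch) is ever reached is to log on entry and exit. The sketch below is only an illustration under assumptions: it follows the init()/run(mini_batch) entry-script layout discussed in this thread, the logger name batchscore_debug is made up, and the copy() call is a placeholder for the real scoring logic. Where these messages end up depends on how the job collects its logs.

```python
import logging
import time

import pandas as pd

# Hypothetical logger name, chosen only for this example.
logger = logging.getLogger("batchscore_debug")


def init() -> None:
    # Configure logging once per worker process.
    logging.basicConfig(level=logging.INFO)
    logger.info("init() called")


def run(mini_batch: pd.DataFrame) -> pd.DataFrame:
    start = time.monotonic()
    logger.info("run() entered with %d rows", len(mini_batch))
    # Placeholder for the real scoring logic.
    result = mini_batch.copy()
    logger.info("run() finished in %.3f s", time.monotonic() - start)
    return result
```

If the "init() called" line shows up but "run() entered" never does, the hang is happening before any mini-batch is handed to the script rather than inside the scoring code.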