microsoft / PowerApps-Samples

Sample code for Power Apps, including Dataverse, model-driven apps, canvas apps, Power Apps component framework, portals, and AI Builder.
https://docs.microsoft.com/powerapps
MIT License
1.53k stars 1.7k forks

Pneumonia SetupService.ipynb : Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function #256

Open magafaterr opened 2 years ago

magafaterr commented 2 years ago

Hello @JimDaly and community,

While running SetupService.ipynb from my Azure workspace, I get the following error message for the line of code below:

LoC image

Error

Updating service pneumonia-detection-onnx
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-04-05 15:11:01+00:00 Creating Container Registry if not exists.
2022-04-05 15:11:01+00:00 Registering the environment.
2022-04-05 15:11:02+00:00 Use the existing image.
2022-04-05 15:11:02+00:00 Generating deployment configuration.
2022-04-05 15:11:03+00:00 Submitting deployment to compute.
2022-04-05 15:11:05+00:00 Checking the status of deployment pneumonia-detection-onnx.
Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: c0714bf6-edd6-4f8b-97ae-df4ae75d8cfd
More information can be found using '.get_logs()'
Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 400,
  "message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
    1. Please check the logs for your container instance: pneumonia-detection-onnx. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
    2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
    3. You can also try to run image c2ca784a051d4215b7af9e26ea9dbfe7.azurecr.io/azureml/azureml_17cae8c4aa5e2efda696b16b3d500c5d locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.
    1. Please check the logs for your container instance: pneumonia-detection-onnx. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
    2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
    3. You can also try to run image c2ca784a051d4215b7af9e26ea9dbfe7.azurecr.io/azureml/azureml_17cae8c4aa5e2efda696b16b3d500c5d locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information."
    },
    {
      "code": "AciDeploymentFailed",
      "message": "Your container application crashed. Please follow the steps to debug:
    1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.
    2. If your container application crashed. This may be caused by errors in your scoring file's init() function. You can try debugging locally first. Please refer to https://aka.ms/debugimage#debug-locally for more information.
    3. You can also interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
    4. View the diagnostic events to check status of container, it may help you to debug the issue.
"RestartCount": 3
"CurrentState": {"state":"Waiting","startTime":null,"exitCode":null,"finishTime":null,"detailStatus":"CrashLoopBackOff: Back-off restarting failed"}
"PreviousState": {"state":"Terminated","startTime":"2022-04-05T15:21:29.431Z","exitCode":111,"finishTime":"2022-04-05T15:21:35.873Z","detailStatus":"Error"}
"Events": null
"
    }
  ]
}

WebserviceException Traceback (most recent call last)

in
     17 service = Model.deploy(ws, aci_service_name, [model], inference_config, deployment_config)
     18
---> 19 service.wait_for_deployment(True)
     20 print(service.state)

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/webservice/webservice.py in wait_for_deployment(self, show_output, timeout_sec)
    917 logs_response = 'Current sub-operation type not known, more logs unavailable.'
    918
--> 919 raise WebserviceException('Service deployment polling reached non-successful terminal state, current '
    920 'service state: {}\n'
    921 'Operation ID: {}\n'

WebserviceException: WebserviceException:
Message: Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: c0714bf6-edd6-4f8b-97ae-df4ae75d8cfd
More information can be found using '.get_logs()'
Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 400,
  "message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
    1. Please check the logs for your container instance: pneumonia-detection-onnx. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
    2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
    3. You can also try to run image c2ca784a051d4215b7af9e26ea9dbfe7.azurecr.io/azureml/azureml_17cae8c4aa5e2efda696b16b3d500c5d locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.
    1. Please check the logs for your container instance: pneumonia-detection-onnx. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
    2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
    3. You can also try to run image c2ca784a051d4215b7af9e26ea9dbfe7.azurecr.io/azureml/azureml_17cae8c4aa5e2efda696b16b3d500c5d locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information."
    },
    {
      "code": "AciDeploymentFailed",
      "message": "Your container application crashed. Please follow the steps to debug:
    1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.
    2. If your container application crashed. This may be caused by errors in your scoring file's init() function. You can try debugging locally first. Please refer to https://aka.ms/debugimage#debug-locally for more information.
    3. You can also interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
    4. View the diagnostic events to check status of container, it may help you to debug the issue.
"RestartCount": 3
"CurrentState": {"state":"Waiting","startTime":null,"exitCode":null,"finishTime":null,"detailStatus":"CrashLoopBackOff: Back-off restarting failed"}
"PreviousState": {"state":"Terminated","startTime":"2022-04-05T15:21:29.431Z","exitCode":111,"finishTime":"2022-04-05T15:21:35.873Z","detailStatus":"Error"}
"Events": null
"
    }
  ]
}
InnerException None
ErrorResponse
{ "error": { "message": "Service deployment polling reached non-successful terminal state, current service state: Failed\nOperation ID: c0714bf6-edd6-4f8b-97ae-df4ae75d8cfd\nMore information can be found using '.get_logs()'\nError:\n{\n \"code\": \"AciDeploymentFailed\",\n \"statusCode\": 400,\n \"message\": \"Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.\n\t1. Please check the logs for your container instance: pneumonia-detection-onnx. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.\n\t2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.\n\t3. You can also try to run image c2ca784a051d4215b7af9e26ea9dbfe7.azurecr.io/azureml/azureml_17cae8c4aa5e2efda696b16b3d500c5d locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.\",\n \"details\": [\n {\n \"code\": \"CrashLoopBackOff\",\n \"message\": \"Your container application crashed. This may be caused by errors in your scoring file's init() function.\n\t1. Please check the logs for your container instance: pneumonia-detection-onnx. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.\n\t2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.\n\t3. You can also try to run image c2ca784a051d4215b7af9e26ea9dbfe7.azurecr.io/azureml/azureml_17cae8c4aa5e2efda696b16b3d500c5d locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.\"\n },\n {\n \"code\": \"AciDeploymentFailed\",\n \"message\": \"Your container application crashed. Please follow the steps to debug:\n\t1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.\n\t2. If your container application crashed. This may be caused by errors in your scoring file's init() function. You can try debugging locally first. Please refer to https://aka.ms/debugimage#debug-locally for more information.\n\t3. You can also interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.\n\t4. View the diagnostic events to check status of container, it may help you to debug the issue.\n\"RestartCount\": 3\n\"CurrentState\": {\"state\":\"Waiting\",\"startTime\":null,\"exitCode\":null,\"finishTime\":null,\"detailStatus\":\"CrashLoopBackOff: Back-off restarting failed\"}\n\"PreviousState\": {\"state\":\"Terminated\",\"startTime\":\"2022-04-05T15:21:29.431Z\",\"exitCode\":111,\"finishTime\":\"2022-04-05T15:21:35.873Z\",\"detailStatus\":\"Error\"}\n\"Events\": null\n\"\n }\n ]\n}" } }.

2022-04-05 15:20:40+00:00 Checking the status of inference endpoint pneumonia-detection-onnx.
Failed

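
For reference, the diagnostic fields at the end of the error (restart count and the previous container state) can be pulled out of the error text with a few lines of Python. This is only a sketch: `summarize_container_state` is a hypothetical helper, not part of the Azure ML SDK, and it assumes the fields appear unescaped, as in the exception message above.

```python
import json
import re


def summarize_container_state(error_text):
    """Extract RestartCount and PreviousState from an ACI error message.

    Hypothetical helper for illustration; assumes the fields are unescaped
    and that the PreviousState object contains no nested braces.
    """
    restart = re.search(r'"RestartCount":\s*(\d+)', error_text)
    previous = re.search(r'"PreviousState":\s*(\{.*?\})', error_text)
    return {
        "restart_count": int(restart.group(1)) if restart else None,
        "previous_state": json.loads(previous.group(1)) if previous else None,
    }


# Diagnostic tail copied from the error message in this issue.
sample = (
    '"RestartCount": 3 '
    '"CurrentState": {"state":"Waiting","startTime":null,"exitCode":null,'
    '"finishTime":null,"detailStatus":"CrashLoopBackOff: Back-off restarting failed"} '
    '"PreviousState": {"state":"Terminated","startTime":"2022-04-05T15:21:29.431Z",'
    '"exitCode":111,"finishTime":"2022-04-05T15:21:35.873Z","detailStatus":"Error"} '
    '"Events": null'
)

summary = summarize_container_state(sample)
print(summary["restart_count"], summary["previous_state"]["exitCode"])
```

Here the container was restarted 3 times and last terminated with exit code 111 within a few seconds of starting. The summary does not say why; the container logs from print(service.get_logs()) are still needed for that.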
JimDaly commented 2 years ago

@magafaterr Which sample are you trying to run?

magafaterr commented 2 years ago

This one: https://github.com/microsoft/PowerApps-Samples/tree/master/ai-builder/BringYourOwnModelTutorial I just cloned the repository yesterday.

JimDaly commented 2 years ago

@JoeFernandezMS Are you able to help with this?

JoeFernandezMS commented 2 years ago

Thanks Jim - I believe @shankarrk should be able to help on this one.

JimDaly commented 2 years ago

@shankak We can discuss this internally. Let's find out whether there is a problem with the sample code and get a fix applied. Keeping this open for now until we determine whether a change to the sample code is required.

JimDaly commented 2 years ago

@shankak Please take a look at this.

iamramengirl commented 2 years ago

Hi, any updates on this? I ran into the same problem. I debugged the script.py file and changed the file path to point to the model path in my workspace. I found that the error occurs in the init() function, specifically when loading the ONNX model via ONNX Runtime.

image
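
One way to see the actual exception from init() instead of the generic ACI message is to call it locally and capture the traceback. A minimal sketch, with assumptions: `run_init_locally` and `example_init` are hypothetical names, and the stand-in below raises an error on purpose. In the real script.py, init() would be the function that loads the ONNX model (for example with onnxruntime.InferenceSession).

```python
import traceback


def run_init_locally(init_fn):
    """Call a scoring script's init() and return the traceback text if it
    raises, or None if it completes. Hypothetical helper for illustration."""
    try:
        init_fn()
    except Exception:
        return traceback.format_exc()
    return None


def example_init():
    # Stand-in for the real init(); a real one would load the model,
    # e.g. onnxruntime.InferenceSession(model_path), and could fail there.
    raise RuntimeError("model failed to load")


error = run_init_locally(example_init)
if error is not None:
    print(error)
```

To try it against the real scoring script, import init from script.py and pass it in place of `example_init`, after pointing the model path at a local copy of the model.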

iamramengirl commented 2 years ago

Hi @shankak @JimDaly Are there any updates to this?

iamramengirl commented 2 years ago

I've checked this issue and tried the suggested workaround: https://github.com/microsoft/PowerApps-Samples/issues/231

I downgraded the azureml-core package to 1.38.0 and added pandas to the environment yml file, which resolved the ACI deployment issue related to onnxruntime.
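
To confirm that an environment actually has these packages, a quick check with the standard library can help. This is a sketch under assumptions: `check_environment` is a hypothetical helper, and it only inspects the Python environment it runs in, so it is useful when debugging locally or inside the image, not as a check of the deployed container from outside.

```python
from importlib import metadata


def check_environment(expected):
    """Return {package: problem} for packages that are missing or whose
    installed version differs from the expected one.

    `expected` maps package name to a version string, or None if any
    version is acceptable. Hypothetical helper for illustration.
    """
    problems = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if wanted is not None and found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


# Versions taken from the workaround described above.
print(check_environment({"azureml-core": "1.38.0", "pandas": None}))
```

An empty dictionary means both packages are present and azureml-core is at 1.38.0.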