surfriderfoundationeurope / mot

Multi-Object-Tracking for garbage detection
MIT License
26 stars 14 forks source link

JSON response tbshoot #44

Open cl3m3nt opened 3 years ago

cl3m3nt commented 3 years ago

Dear mot developper,

when using the AI inference service, it seems that AI JSON format response from the server might be inconsistent over the time. The error is not deterministically reproductible, however, it happens over the time for the same video sent to the AI server. Although below error message is related to ETL API service, the error might come from an issue in AI response format. Kindly asking whether we could have a look at both service interaction to identify where this could from.

AI inference service is available fail: Function.etlHttpTrigger[3] Executed 'Functions.etlHttpTrigger' (Failed, Id=15b7c228-e4f2-4acc-b5b5-d7f5487301a6, Duration=69740ms) Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.etlHttpTrigger ---> Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException: Result: Failure Exception: JSONDecodeError: Invalid \escape: line 1 column 94 (char 93) Stack: File "/azure-functions-host/workers/python/3.7/LINUX/X64/azure_functions_worker/dispatcher.py", line 357, in _handleinvocation_request self.__run_sync_func, invocation_id, fi.func, args) File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run result = self.fn(*self.args, **self.kwargs) File "/azure-functions-host/workers/python/3.7/LINUX/X64/azure_functions_worker/dispatcher.py", line 542, in run_sync_func return func(**params) File "/home/site/wwwroot/etlHttpTrigger/init.py", line 161, in main json_prediction = get_json_prediction(prediction) File "/home/site/wwwroot/etlHttpTrigger/utils/ai.py", line 62, in get_json_prediction json_prediction = json.loads(string_prediction) File "/usr/local/lib/python3.7/json/init.py", line 348, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.7/json/decoder.py", line 353, in raw_decode obj, end = self.scan_once(s, idx)

at Microsoft.Azure.WebJobs.Script.Description.WorkerFunctionInvoker.InvokeCore(Object[] parameters, FunctionInvocationContext context) in /src/azure-functions-host/src/WebJobs.Script/Description/Workers/WorkerFunctionInvoker.cs:line 93 at Microsoft.Azure.WebJobs.Script.Description.FunctionInvokerBase.Invoke(Object[] parameters) in /src/azure-functions-host/src/WebJobs.Script/Description/FunctionInvokerBase.cs:line 82 at Microsoft.Azure.WebJobs.Script.Description.FunctionGenerator.Coerce[T](Task1 src) in /src/azure-functions-host/src/WebJobs.Script/Description/FunctionGenerator.cs:line 225 at Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker2.InvokeAsync(Object instance, Object[] arguments) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionInvoker.cs:line 52 at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.InvokeWithTimeoutAsync(IFunctionInvoker invoker, ParameterHelper parameterHelper, CancellationTokenSource timeoutTokenSource, CancellationTokenSource functionCancellationTokenSource, Boolean throwOnTimeout, TimeSpan timerInterval, IFunctionInstance instance) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 559 at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithWatchersAsync(IFunctionInstanceEx instance, ParameterHelper parameterHelper, ILogger logger, CancellationTokenSource functionCancellationTokenSource) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 507 at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstanceEx instance, FunctionStartedMessage message, FunctionInstanceLogEntry instanceLogEntry, ParameterHelper parameterHelper, ILogger logger, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 285 --- End of inner exception stack trace --- at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstanceEx instance, FunctionStartedMessage message, FunctionInstanceLogEntry instanceLogEntry, ParameterHelper parameterHelper, ILogger logger, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 332 at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.TryExecuteAsync(IFunctionInstance functionInstance, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 94 fail: Host.Results[0] Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.etlHttpTrigger ---> Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException: Result: Failure Exception: JSONDecodeError: Invalid \escape: line 1 column 94 (char 93) Stack: File "/azure-functions-host/workers/python/3.7/LINUX/X64/azure_functions_worker/dispatcher.py", line 357, in _handleinvocation_request self.__run_sync_func, invocation_id, fi.func, args) File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run result = self.fn(*self.args, **self.kwargs) File "/azure-functions-host/workers/python/3.7/LINUX/X64/azure_functions_worker/dispatcher.py", line 542, in run_sync_func return func(**params) File "/home/site/wwwroot/etlHttpTrigger/init.py", line 161, in main json_prediction = get_json_prediction(prediction) File "/home/site/wwwroot/etlHttpTrigger/utils/ai.py", line 62, in get_json_prediction json_prediction = json.loads(string_prediction) File "/usr/local/lib/python3.7/json/init.py", line 348, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.7/json/decoder.py", line 353, in raw_decode obj, end = self.scan_once(s, idx)

at Microsoft.Azure.WebJobs.Script.Description.WorkerFunctionInvoker.InvokeCore(Object[] parameters, FunctionInvocationContext context) in /src/azure-functions-host/src/WebJobs.Script/Description/Workers/WorkerFunctionInvoker.cs:line 93 at Microsoft.Azure.WebJobs.Script.Description.FunctionInvokerBase.Invoke(Object[] parameters) in /src/azure-functions-host/src/WebJobs.Script/Description/FunctionInvokerBase.cs:line 82 at Microsoft.Azure.WebJobs.Script.Description.FunctionGenerator.Coerce[T](Task1 src) in /src/azure-functions-host/src/WebJobs.Script/Description/FunctionGenerator.cs:line 225 at Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker2.InvokeAsync(Object instance, Object[] arguments) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionInvoker.cs:line 52 at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.InvokeWithTimeoutAsync(IFunctionInvoker invoker, ParameterHelper parameterHelper, CancellationTokenSource timeoutTokenSource, CancellationTokenSource functionCancellationTokenSource, Boolean throwOnTimeout, TimeSpan timerInterval, IFunctionInstance instance) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 559 at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithWatchersAsync(IFunctionInstanceEx instance, ParameterHelper parameterHelper, ILogger logger, CancellationTokenSource functionCancellationTokenSource) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 507 at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstanceEx instance, FunctionStartedMessage message, FunctionInstanceLogEntry instanceLogEntry, ParameterHelper parameterHelper, ILogger logger, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 285 --- End of inner exception stack trace --- at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstanceEx instance, FunctionStartedMessage message, FunctionInstanceLogEntry instanceLogEntry, ParameterHelper parameterHelper, ILogger logger, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 332 at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.TryExecuteAsync(IFunctionInstance functionInstance, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 94

charlesollion commented 3 years ago

On it! Could you please provide a working and failing JSON?

cl3m3nt commented 3 years ago

Trying to generate working and non working JSON. From now on, testing on single ETL and singe AI VM would not allow to reproduce the error. I will test this afternoon on multiple VM set up, as the error might come from the AI beeing oversubscribed. I let you know as soon as I am able to get result and associated JSON, but in the case of oversubscription being the root cause, this would mean that the multi proc capability of AI should not be used at all (related to Multiprocessing issue #43)

cl3m3nt commented 3 years ago

After additional testing by sending 2 x concurrent request to the AI, I got the following error message:

Function.etlHttpTrigger.User[0] Request to AI failed wih reason INTERNAL SERVER ERROR. info: Function.etlHttpTrigger.User[0] [b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n500 Internal Server Error\n

Internal Server Error

\n

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

\n']

I guess the ETL error related to JSON decode helper function is due to this Internal error 500. When trying to decode a JSON from the response request, the ETL failed as it tries to decode the error message. Waiting for you comment but this is probably due to oversubscription of the pool, so we might close this issue as oversubscription is not "officially" supported by AI

cl3m3nt commented 3 years ago

New update after testing as I also got a new error type of response:

[b'{"error":"Tensorflow serving returned an error. It probably means that your SavedModel doesn\'t have the correct signature_def, which must be \'serving_default\'. You can inspect that by doing saved_model_cli show --dir /path/to/your/model --all. If the signature_def isn\'t \'serving_default\' you should export your checkpoints following the instructions in the main README, section Export Here is the original error raised by tensorflow serving:\n Timed out waiting for notification"}\n']

Still this can't be decode by ETL as JSON :)

charlesollion commented 3 years ago

Does this happen with a single AI, and a single call? The tensorflow serving error could mean the service is not properly initialized

cl3m3nt commented 3 years ago

I believe this only happens with multiple calls over an AI server.