smart-edge-open / edgeapps

Applications that can be onboarded to an Intel® Smart Edge Open edge node.

OpenVINO sample code: openvino-cons-app doesn't discover the services or receive notifications from openvino-prod-app successfully #43

Closed adtrytech closed 3 years ago

adtrytech commented 3 years ago

Hi Team,

I'm trying to onboard the OpenVINO sample code to the OpenNESS cluster, but object inference fails to start.

It seems that openvino-cons-app doesn't discover the services and so never receives the notification (which sets the inference algorithm) from openvino-prod-app. As a result, it can't start the inference task when it receives video packets from the client simulator. I've attached the logs from both apps and hope someone can help. Is there anything I should check in advance?

Producer APP:

[root@cn01 ~]# kubectl logs -f openvino-prod-app-787bf68fdc-79jqf
go: finding github.com/pkg/errors v0.9.1
go: downloading github.com/pkg/errors v0.9.1
go: extracting github.com/pkg/errors v0.9.1
2021/02/08 06:21:23 OpenVINO Producer Application Started
2021/02/08 06:21:23 Create Encrypted client
2021/02/08 06:21:23 Loading certificate and key
2021/02/08 06:21:23 &http.Client{Transport:(*http.Transport)(0xc000120280), CheckRedirect:(func(*http.Request, []*http.Request) error)(nil), Jar:http.CookieJar(nil), Timeout:0}
2021/02/08 06:21:23 Service-activation request failed: Post https://eaa.openness:443/services: dial tcp 10.16.0.51:443: connect: connection refused
2021/02/08 06:21:23 Service-activation request failed: Post https://eaa.openness:443/notifications: dial tcp 10.16.0.51:443: connect: connection refused
2021/02/08 06:21:23 Inference settings: {pedestrian-detection-adas-0002 CPU}
2021/02/08 06:22:23 Service-activation request failed: Post https://eaa.openness:443/notifications: dial tcp 10.16.0.51:443: connect: connection refused
2021/02/08 06:22:23 Inference settings: {vehicle-detection-adas-0002 CPU}
2021/02/08 06:23:23 Inference settings: {pedestrian-detection-adas-0002 CPU}
2021/02/08 06:24:23 Inference settings: {vehicle-detection-adas-0002 CPU}
2021/02/08 06:25:23 Inference settings: {pedestrian-detection-adas-0002 CPU}
2021/02/08 06:26:23 Inference settings: {vehicle-detection-adas-0002 CPU}
2021/02/08 06:27:23 Inference settings: {pedestrian-detection-adas-0002 CPU}
2021/02/08 06:28:23 Inference settings: {vehicle-detection-adas-0002 CPU}
2021/02/08 06:29:23 Inference settings: {pedestrian-detection-adas-0002 CPU}
2021/02/08 06:30:23 Inference settings: {vehicle-detection-adas-0002 CPU}
2021/02/08 06:31:23 Inference settings: {pedestrian-detection-adas-0002 CPU}
2021/02/08 06:32:23 Inference settings: {vehicle-detection-adas-0002 CPU}
2021/02/08 06:33:23 Inference settings: {pedestrian-detection-adas-0002 CPU}
2021/02/08 06:34:23 Inference settings: {vehicle-detection-adas-0002 CPU}
2021/02/08 06:35:23 Inference settings: {pedestrian-detection-adas-0002 CPU}

Consumer APP:

[root@cn01 openvino]# kubectl logs -f openvino-cons-app-6744f45755-hqppx
[setupvars.sh] OpenVINO environment initialized
go: downloading github.com/gorilla/websocket v1.4.2
2021/02/08 06:24:47 OpenVINO Consumer Started
2021/02/08 06:24:47 Create Encrypted client
2021/02/08 06:24:47 Loading certificate and key
2021/02/08 06:24:47 &http.Client{Transport:(*http.Transport)(0xc000190280), CheckRedirect:(func(*http.Request, []*http.Request) error)(nil), Jar:http.CookieJar(nil), Timeout:0}
2021/02/08 06:24:47 Establish websocket Started
2021/02/08 06:24:47 WebSocket establishment successful
2021/02/08 06:24:47 Service Discovery Started

cjnolan commented 3 years ago

Hi @adtrytech, could you check if the EAA is running in the cluster please? Looking at the output from the producer app, it is failing to connect to the EAA during the Service creation stage.
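One quick way to confirm this from inside the cluster is to test whether the EAA endpoint accepts TCP connections at all. The sketch below is illustrative only: the host `eaa.openness` and port 443 are taken from the producer log above, while the helper name `can_connect` and the timeout value are assumptions. It checks TCP reachability only, not TLS or the EAA API itself.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Host and port as they appear in the producer log.
    if can_connect("eaa.openness", 443):
        print("EAA endpoint is reachable")
    else:
        print("EAA endpoint is not reachable (refused, timed out, or unresolved)")
```

A `False` result here would match the "connection refused" errors in the producer output and would point at the EAA pod or its service rather than at the sample apps.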

adtrytech commented 3 years ago

Hi cjnolan,

Thanks for your kind help. The EAA is running in the openness namespace. The error seems to have been caused by rebooting the OpenNESS cluster.

After starting the consumer app, the EAA logs the status below. Shouldn't it subscribe to ns_openvino? (Tracing the logs, they show "Added Subscriber for a topic: ns_default".)

[watermill] 2021/02/09 00:55:56.305002 subscriber.go:139: level=INFO msg="Subscribing to Kafka topic" consumer_group=EAA_294e5390-8c95-4acc-99cd-710695a1a89f kafka_consumer_uuid=CcwFe5zSYpwJCpuFx52xWA provider=kafka subscriber_uuid=TT3rz9tw45Jatr9K8NcdQL topic=client_openvino.consumer
[watermill] 2021/02/09 00:55:56.305025 subscriber.go:210: level=INFO msg="Starting consuming" consumer_group=EAA_294e5390-8c95-4acc-99cd-710695a1a89f kafka_consumer_uuid=CcwFe5zSYpwJCpuFx52xWA provider=kafka subscriber_uuid=TT3rz9tw45Jatr9K8NcdQL topic=client_openvino.consumer
<134>Feb 9 00:55:56 eaa[1]: [eaa] Added Subscriber for a topic: client_openvino.consumer
<134>Feb 9 00:55:56 eaa[1]: [eaa] handleClientUpdates() starts
[watermill] 2021/02/09 00:55:56.339074 subscriber.go:139: level=INFO msg="Subscribing to Kafka topic" consumer_group=EAA_294e5390-8c95-4acc-99cd-710695a1a89f kafka_consumer_uuid=4AUTZXtmUEsCP5bbaZYWJ9 provider=kafka subscriber_uuid=Fov2brfa2n5AaiL4jJ3K75 topic=ns_default
[watermill] 2021/02/09 00:55:56.339100 subscriber.go:210: level=INFO msg="Starting consuming" consumer_group=EAA_294e5390-8c95-4acc-99cd-710695a1a89f kafka_consumer_uuid=4AUTZXtmUEsCP5bbaZYWJ9 provider=kafka subscriber_uuid=Fov2brfa2n5AaiL4jJ3K75 topic=ns_default
<134>Feb 9 00:55:56 eaa[1]: [eaa] Added Subscriber for a topic: ns_default
<134>Feb 9 00:55:56 eaa[1]: [eaa] Added Publisher for a topic: client_openvino.consumer
<134>Feb 9 00:55:56 eaa[1]: [eaa] handleNotificationUpdates() starts

adtrytech commented 3 years ago

Hi cjnolan,

I re-created the prod and cons apps; the logs are below. In this test, the prod app sends out the notifications but the cons app never receives them. There are also some strange error messages in the EAA log. I hope this information helps clarify the issue. Thank you again!

// Prod app

[root@cn01 openvino]# kubectl logs -f openvino-prod-app-787bf68fdc-nwr94
go: finding github.com/pkg/errors v0.9.1
go: downloading github.com/pkg/errors v0.9.1
go: extracting github.com/pkg/errors v0.9.1
2021/02/09 01:02:17 OpenVINO Producer Application Started
2021/02/09 01:02:17 Create Encrypted client
2021/02/09 01:02:17 Loading certificate and key
2021/02/09 01:02:17 &http.Client{Transport:(*http.Transport)(0xc00010c280), CheckRedirect:(func(*http.Request, []*http.Request) error)(nil), Jar:http.CookieJar(nil), Timeout:0}
2021/02/09 01:02:17 Service-activation request sent to the server
2021/02/09 01:02:17 Inference settings: {pedestrian-detection-adas-0002 CPU}
2021/02/09 01:03:17 Inference settings: {vehicle-detection-adas-0002 CPU}

// EAA

<134>Feb 9 00:59:44 eaa[1]: [eaa] Heartbeat
<134>Feb 9 01:00:44 eaa[1]: [eaa] Heartbeat
<134>Feb 9 01:01:44 eaa[1]: [eaa] Heartbeat
<134>Feb 9 01:02:17 eaa[1]: [eaa] Successfully added 'openvino:producer' service
<134>Feb 9 01:02:44 eaa[1]: [eaa] Heartbeat
<134>Feb 9 01:02:57 eaa[1]: [eaa] Failed to send close message to old connection
<134>Feb 9 01:02:57 eaa[1]: [eaa] Failed to close previous websocket connection
<134>Feb 9 01:03:44 eaa[1]: [eaa] Heartbeat

// Cons app

[setupvars.sh] OpenVINO environment initialized
go: downloading github.com/gorilla/websocket v1.4.2
2021/02/09 01:02:57 OpenVINO Consumer Started
2021/02/09 01:02:57 Create Encrypted client
2021/02/09 01:02:57 Loading certificate and key
2021/02/09 01:02:57 &http.Client{Transport:(*http.Transport)(0xc000190280), CheckRedirect:(func(*http.Request, []*http.Request) error)(nil), Jar:http.CookieJar(nil), Timeout:0}
2021/02/09 01:02:57 Establish websocket Started
2021/02/09 01:02:57 WebSocket establishment successful
2021/02/09 01:02:57 Service Discovery Started
2021/02/09 01:02:57 Subscribed to notification: openvino-inference 1.0.0
2021/02/09 01:02:57 Subscribed to notification: terminate 1.0.0

adtrytech commented 3 years ago

Hi @cjnolan,

Do the OpenVINO prod and cons apps need the "analytics-ffmpeg" and "analytics-gstreamer" pods in order to work?

Thanks again!

cjnolan commented 3 years ago

Hi @adtrytech, it looks like the EAA failed to remove the old websocket connection to the previous consumer deployment. This is preventing a new connection with the new consumer application deployment. Could you try redeploying the EAA then deploy the sample application pods?
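To tell whether a failed run hit this condition, the EAA log can be scanned for the two messages that appear in the EAA output earlier in this thread. This is a minimal sketch, assuming the log text has already been captured (for example with `kubectl logs`). The function name `find_stale_websocket_errors` is hypothetical, and the matched strings are copied from the log above, so they may differ in other EAA versions.

```python
from typing import List

# Messages copied verbatim from the EAA log in this thread.
STALE_CONNECTION_MARKERS = (
    "Failed to send close message to old connection",
    "Failed to close previous websocket connection",
)


def find_stale_websocket_errors(log_text: str) -> List[str]:
    """Return the EAA log lines that indicate a stale consumer websocket."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in STALE_CONNECTION_MARKERS)
    ]


if __name__ == "__main__":
    sample = (
        "<134>Feb 9 01:02:44 eaa[1]: [eaa] Heartbeat\n"
        "<134>Feb 9 01:02:57 eaa[1]: [eaa] Failed to send close message to old connection\n"
        "<134>Feb 9 01:02:57 eaa[1]: [eaa] Failed to close previous websocket connection\n"
        "<134>Feb 9 01:03:44 eaa[1]: [eaa] Heartbeat\n"
    )
    for hit in find_stale_websocket_errors(sample):
        print(hit)
```

If the function returns any lines, that suggests the EAA is still holding the previous consumer's connection and is a candidate for redeployment before the sample pods are started again.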

adtrytech commented 3 years ago

Hi cjnolan,

Thanks for your advice. Following your instructions (I restarted the cluster), I can launch the sample application successfully. However, it doesn't always work for me; sometimes deploying the sample applications still fails.

Is there any way to clear the registered service in the EAA?

Thank you!

cjnolan commented 3 years ago

Hi @adtrytech,

The sample application is configured to remove the service in the EAA using the EAA API once both applications have completed. If both sample applications run without issue and complete, then the EAA should be cleared before the application pods are stopped.

adtrytech commented 3 years ago

Hi cjnolan,

Thanks again. I have one more question: how do I terminate the sample applications correctly? Should I delete the pods directly, or is there another method?

Regards.

cjnolan commented 3 years ago

Hi @adtrytech,

If the sample pods are allowed to run the application to completion without failure or the pod being deleted manually while running, then both the producer and consumer pods will send calls to the EAA to remove the service entries in its list before the application exits and the pods stop.
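The pattern described here, deregistering from the EAA on the normal exit path, can be sketched as follows. This is not the sample application's code (which is written in Go): `register`, `deregister`, and `run_inference` are hypothetical callbacks standing in for the EAA API calls and the workload. The `try`/`finally` also goes slightly further than the behaviour described above, because it attempts cleanup even when the workload raises an error.

```python
from typing import Callable


def run_with_cleanup(
    register: Callable[[], None],
    run_inference: Callable[[], None],
    deregister: Callable[[], None],
) -> None:
    """Register, run the workload, and always attempt to deregister afterwards."""
    register()
    try:
        run_inference()
    finally:
        # Runs on normal completion and when the workload raises an exception,
        # but not if the process is killed outright (e.g. SIGKILL, or SIGTERM
        # without a handler), which is why deleting a running pod can leave a
        # stale entry behind.
        deregister()


if __name__ == "__main__":
    run_with_cleanup(
        register=lambda: print("service registered"),
        run_inference=lambda: print("inference finished"),
        deregister=lambda: print("service removed"),
    )
```

The design point is that cleanup is tied to the application's own exit path, so anything that bypasses that path skips the deregistration.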

adtrytech commented 3 years ago

Hi cjnolan,

Thanks for the detailed information. I have no more questions on this topic for now. Thank you!

cjnolan commented 3 years ago

Closing as issue has been addressed