project-akri / akri

A Kubernetes Resource Interface for the Edge
https://docs.akri.sh/
Apache License 2.0
1.1k stars 142 forks source link

DiscoveryHandler is called for the first time after a minute from agent #563

Closed mregen closed 2 months ago

mregen commented 1 year ago

Describe the bug

After updating a helm chart with new configuration that requires a new start of the discovery handler, it takes one minute after registration until the first discover call is received. Can this delay be configured? If not, can you make it configurable?

Output of kubectl get pods,akrii,akric -o wide

NAME                                              READY   STATUS    RESTARTS         AGE     IP          NODE             NOMINATED NODE   READINESS GATES
pod/akri-agent-daemonset-jrx9t                    1/1     Running   28               5h55m   10.1.0.87   docker-desktop   <none>           <none>
pod/akri-controller-deployment-57c5dc7dc5-b5qjg   1/1     Running   28 (3m42s ago)   5h55m   10.1.0.86   docker-desktop   <none>           <none>
pod/akri-opcua-asset-discovery-daemonset-kr8gv    1/1     Running   0                9m59s   10.1.0.89   docker-desktop   <none>           <none>

NAME                                       CONFIG             SHARED   NODES                AGE
instance.akri.sh/akri-opcua-asset-6bc184   akri-opcua-asset   true     ["docker-desktop"]   2d6h
instance.akri.sh/akri-opcua-asset-6c9e9d   akri-opcua-asset   true     ["docker-desktop"]   11d
instance.akri.sh/akri-opcua-asset-fe7a3d   akri-opcua-asset   true     ["docker-desktop"]   11d

NAME                                     CAPACITY   AGE
configuration.akri.sh/akri-opcua-asset   1          12d

Kubernetes Version: [K8s]

To Reproduce

Steps to reproduce the behavior:

  1. Create cluster using '...'
  2. Install Akri with the Helm command '...'
  3. Start discovery handler with Helm command

Expected behavior

Delay from registration to first discover call is in the seconds range after registration or configurable.

Logs (please share snips of applicable logs)

2023-02-28 16:39:05.151 +00:00 info: OpcUaDetection.Akri.Program[0]      Akri OPC UA Detection (0.2.0-
2023-02-28 16:39:05.343 +00:00 info: OpcUaDetection.Akri.Program[0]      Got IP address of the pod from POD_IP environment variable.
2023-02-28 16:39:06.248 +00:00 info: OpcUaDetection.Akri.Program[0]      Registered with Akri system with Name opcua-asset for http://10.1.0.89:80 with type: Network as shared: True
2023-02-28 16:39:07.545 +00:00 info: Microsoft.Hosting.Lifetime[14]      Now listening on: http://[::]:80
2023-02-28 16:39:07.545 +00:00 info: Microsoft.Hosting.Lifetime[0]      Application started. Press Ctrl+C to shut down.
2023-02-28 16:39:07.545 +00:00 info: Microsoft.Hosting.Lifetime[0]      Hosting environment: Production
2023-02-28 16:39:07.545 +00:00 info: Microsoft.Hosting.Lifetime[0]      Content root path: /app/

---> 1 minute delay until first discovery call

2023-02-28 16:40:06.858 +00:00 info: OpcUa.Common.Client.OpcUaCore[0]       Checking application instance certificate.
2023-02-28 16:40:06.945 +00:00 info: OpcUa.Common.Client.OpcUaCore[0]      Creating application instance certificate.
2023-02-28 16:40:07.647 +00:00 info: OpcUa.Common.Client.OpcUaCore[512]      Imported the PFX private key for [3B0B2034BF6813194C6AF36A28A4C9F4AF16C058].
2023-02-28 16:40:07.742 +00:00 info: OpcUa.Common.Client.OpcUaCore[0]      Certificate created for urn:microsoft.com:ua:assetdetection. [CN=OpcUaDetection, O=Microsoft] [3B0B2034BF6813194C6AF36A28A4C9F4AF16C058]
2023-02-28 16:40:07.748 +00:00 info: OpcUaDetection.Akri.Services.DiscoveryHandlerService[0]
      Got discover request opcuaDiscoveryMethod:
        asset:
          endpointUrl: "opc.tcp://XXXXX:4840"
       from ipv6:[::ffff:10.1.0.87]:56096
github-actions[bot] commented 1 year ago

Issue has been automatically marked as stale due to inactivity for 90 days. Update the issue to remove label, otherwise it will be automatically closed.

github-actions[bot] commented 1 year ago

Issue has been automatically marked as stale due to inactivity for 90 days. Update the issue to remove label, otherwise it will be automatically closed.

johnsonshih commented 12 months ago

The delay comes from do_discover_on_discovery_handler, if get_stream returns None. Agent waits for 60 seconds, 60 seconds is hard-coded. // If a connection cannot be established with the Discovery Handler, it will sleep and try again.

github-actions[bot] commented 9 months ago

Issue has been automatically marked as stale due to inactivity for 90 days. Update the issue to remove label, otherwise it will be automatically closed.

github-actions[bot] commented 5 months ago

Issue has been automatically marked as stale due to inactivity for 90 days. Update the issue to remove label, otherwise it will be automatically closed.