sony / nmos-cpp

An NMOS (Networked Media Open Specifications) Registry and Node in C++ (IS-04, IS-05)
Apache License 2.0
136 stars 80 forks source link

On erasing NMOS resources, HTTP DELETE messages are not getting generated #372

Open gsbnmos opened 4 months ago

gsbnmos commented 4 months ago

Hello,

While, NMOS shutting down, I'm calling following functions to delete all the nmos resources before exiting:

nmos::erase_resource(model.node_resources, res_id, true); // res_id is of node model.notify();

While erasing the resources, in Wireshark, I was expecting HTTP DELETE messages going out to Registry and controller(e.g., Riedel). However, I don't see any of these messages in wireshark.

Could you please help, am I missing something in the above code?

Thankyou in advance. Gururaj S.B

gsbnmos commented 4 months ago

Below is the sys-logs captured while exiting:

2024-02-16 18:49:15 User.Warning 10.110.60.75 Feb 16 13:19:26 10.100.50.2 NMOS[1812]: Unable to lock to current reference source 2024-02-16 18:49:15 Daemon.Info 10.110.60.75 Feb 16 13:19:27 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:27.191: error: 2657059808: Eraced resources count = 62 2024-02-16 18:49:15 Daemon.Info 10.110.60.75 Feb 16 13:19:27 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:27.196: warning: 2673845216: System global configuration is not discoverable 2024-02-16 18:49:15 Daemon.Info 10.110.60.75 Feb 16 13:19:27 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:27.199: warning: 2715808736: Stopped registered operation 2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.557: info: 3069259792: Closing connections 2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.561: info: 2648667104: Error getting remote endpoint: system:9 (Bad file descriptor) 2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.561: warning: 2648667104: WebSocket Connection Unknown - "" - 0 websocketpp:26 Operation canceled 2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.561: info: 2648667104: asio async_shutdown error: system:9 (Bad file descriptor) 2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.561: info: 2648667104: handle_accept error: Operation canceled 2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.561: info: 2648667104: Stopping acceptance of new connections because the underlying transport is no longer listening. 2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.572: info: 3069259792: Stopping nmos-cpp node 2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 systemd[1]: Stopping Sony Nmos Daemon.... 2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 systemd[1]: nmos.service: Succeeded. 2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 systemd[1]: Stopped Sony Nmos Daemon..

You could see, my log while erasing 62 NMOS resources at second line: 2024-02-16 13:19:27.191: After about 4s, that is 5th line in the logs, at 2024-02-16 13:19:31.557: we see Closing connections.

I was expecting following logs in the sys-logs which is in the node_behavior thread, with the deregistration logic just after the regular registration logic:: slog::log(gate, SLOG_FLF) << "Requesting registration deletion for " << id_type;

garethsb commented 4 months ago

It depends when you erase the resources.

The HTTP DELETE requests are sent to the registry asynchronously with the call to nmos::erase_resource, by the node behaviour thread.

Hope that makes sense. If you want to implement the recommended (but not required) controlled unregistration behaviour then erase the resources before calling model.controlled_shutdown() (or destroying the server).

suddheerA commented 4 months ago

Dear Gareth (@garethsb),

Thank you for your response.

I have created a separate thread named "ShutDownThread" that runs in parallel with the "node_implementation_thread." This thread is responsible for monitoring for shutdown conditions within my application. Upon receiving a shutdown signal, the thread erases the node, subsequently erasing all its child resources. After the erasure is complete, it calls the model.controlled_shutdown() function.

To ensure data integrity during this process, I have implemented a write lock using auto write_lock = model.write_lock(); before initiating the erase operation. The lock is then released after the erasure is complete using write_lock.unlock();.

However, despite these steps, HTTP DELETE messages are not being sent as expected.

I would appreciate your assistance in understanding why these messages are not being triggered and troubleshooting any potential issues. Any insights or suggestions you can provide would be greatly helpful.

Thank you for your time and support.

Sincerely,

garethsb commented 4 months ago

I understand you're doing something like this...

  1. Get lock
  2. Erase node resource
  3. Call notify and release lock
  4. Call controlled shutdown

I think this just still doesn't leave a long enough gap for the Node behaviour thread to empty the queue of events to send to the Registry between 3 and 4.

The simplest change to confirm this would be to put a std::this_thread::sleep_for there. The other approach would be to use a conditional wait on the "grain" containing the queue of events to be empty. It's created by the Node behaviour thread itself but you could find it in the model.

https://github.com/sony/nmos-cpp/blob/62a91659e38828309dc00b32cec55f52d42de0d9/Development/nmos/node_behaviour.cpp#L126

gsbnmos commented 3 months ago

Hi Garethsb, Apologies for the delayed reply. Thanks for the suggestions. This works! Best regards,