callbacks on natsConnection are called after natsConnection_destroy

thierryba commented 5 months ago

Observed behavior

callbacks, especially ClosedCB and DisconnectedCB are called after the call to natsConnection_Destroy

Expected behavior

Either those should not get called or there should be a way to silence those.

Server and client version

I used the nats.c 3.8.0 connecting to demo.nats.io (see the code).

Host environment

No response

Steps to reproduce

I wrote a little c++ example program. It has a class called MessagingController. I took advantage of the fact that we ca set a closure on the callbacks to pass a pointer to that object so that in the callbacks it can redirect the calls to it. That works really well except when I destruct my MessagingController instance. In that case, some callbacks are called after the destructor of that object. I believe my program uses a very common pattern for encapsulating the nats connection functionality. But there is no easy way to prevent callbacks on my MEssagingController instance after the destructor. In my real code, I ended up putting all the "valid" MessagingController instance in a static set so that I can check for validity there.

I would be very grateful for any insight on this one.

Here is the code (c++17):

#include <nats/nats.h>
#include <memory>
#include <iostream>

template<class T> using UniquePtr = std::unique_ptr<T, void (*)(T*)>;

namespace {
void closedCB(natsConnection*, void* closure)
{
    std::cout << __func__ << ": " << closure << std::endl;
}

void connectedStatusChangedCB(natsConnection*, void* closure)
{
    std::cout << __func__ << ": " << closure << std::endl;
}
}

class MessagingController
{
public:
    MessagingController()
    {
        std::cout << __func__<< ": " << this << std::endl;

        //connect
        UniquePtr<natsOptions> opts{nullptr, nullptr};
        {
            natsOptions* opts_ = nullptr;
            if (natsStatus ns = natsOptions_Create(&opts_); ns == NATS_OK) {
                opts = {opts_, natsOptions_Destroy};
            } else {
                throw 1;
            }

            natsOptions_SetClosedCB(opts.get(), closedCB, this);
            natsOptions_SetDisconnectedCB(opts.get(), connectedStatusChangedCB, this);

            static const char *server = "demo.nats.io";
            natsOptions_SetServers(opts.get(), &server, 1);

            natsConnection *conn = nullptr;

            if (natsConnection_Connect(&conn, opts.get()); conn) {
                connection_ = {conn, natsConnection_Destroy};
            } else {
                throw 2;
            }
        }
    }

    ~MessagingController()
    {
        std::cout << __func__<< ": " << this << std::endl;
    }

private:
    UniquePtr<natsConnection> connection_{nullptr, nullptr};
};

int main()
{
    try {
        MessagingController controller;
    } catch(int err) {
        std::cerr << "ERROR: " << err << std::endl;
    }
    std::cout << "Here the controller is clearly already deleted (any callback after this will crash for sure)" << std::endl;
}

thierryba commented 5 months ago

an example output for the execution of this program is:

MessagingController: 0x16ef0e4c0 ~MessagingController: 0x16ef0e4c0 connectedStatusChangedCB: 0x16ef0e4c0 Here the controller is clearly already deleted (any callback after this will crash for sure) closedCB: 0x16ef0e4c0

levb commented 5 months ago

@thierryba the connection event callbacks are indeed asynchronous, and may be called after natsConnection_Destroy has returned. I question the need to pass the MessagingController's this pointer as the closure context for the connection callbacks. Perhaps, either

(a) separate the necessary context into a different struct (e.g. MessageControllerConnectionData*, and clean it up in closedCB, including deleting it;
(b) use your preferred synchronization mechanism to wait in the destructor until closedCB is done;
or (c) if the cleanup is not needed until the MessageController destructor - place it there, explicitly in the destructor, and don't pass it to closeCB?

Note that if the intention is for MessageController to auto-delete in closedCB e.g. if the server is down, then it should not be allocated on the stack, and can then be deleted from closedCB. A close() method could be exposed to severe the connection (and auto-delete the controller, asynchronously) from the client side.

Please let me know if it helps. I am hesitant to consider controls to "silence them" until I further understand your use case.

thierryba commented 5 months ago

Hello @levb , I suppose option a or b could be implemented. My question is more about: how can this be ok that there is no builtin way to control such things. I think it is a very common use case to embed such c-style functionality inside a c++ class. But there is hardly a way to get that done easily. It is at the very least very error prone.

Having to add classes because the not-documented closedCB call is happening after I call destroy. Then I need to see if it was not already closed before and that opens up a can of worms.

I have an ugly workaround to this. But an elegant solution would be really needed.

levb commented 5 months ago

@thierryba, understood. I could easily add natsOptions_SetNoCallbacksFromDestroy to disable the callbacks from the destructor, and leaving you to add the cleanup code there explicitly.

I'd like to point out that natsConnection is ref-counted, and references may still exist after natsConnection_Destroy returns. You should generally avoid providing pointers to its callbacks that can be prematurely deleted by the client code. I'd need to do some more research to see if implementing something like natsConnection_Destroy(And-Wait-For-All-Callbacks?) would make sense and how exactly it would work.

thierryba commented 5 months ago

@levb thanks you for your reply and I will be waiting for your decision on that one.

levb commented 4 months ago

@thierryba I think that adding natsOptions_SetNoCallbacksFromDestroy could be quite confusing. I am still unclear on why do you need to pass the MessagingController pointer to the connection callbacks. Can MessagingController be reference counted so it is not destroyed until all callbacks have released it?

Adding a synchronous method to wait for all callbacks to have finished, in the NATS client itself, seems counter to its asynchronous design philosophy.

levb commented 1 month ago

Ok, after running into this myself while fixing https://github.com/nats-io/nats.c/pull/771, I think what really needs to happen is natsConnection_Drain should wait for all async errors to be processed, much like it waits for all pending message callbacks to finish. I'll try to fix this one soon.

mcosta74 commented 1 month ago

Yay, it would be great to have it

Il Sab 13 Lug 2024, 07:36 Lev @.***> ha scritto:

Ok, after running into this myself while fixing #771 https://github.com/nats-io/nats.c/pull/771, I think what really needs to happen is natsConnection_Drain should wait for all async errors to be processed, much like it waits for all pending message callbacks to finish. I'll try to fix this one soon.

— Reply to this email directly, view it on GitHub https://github.com/nats-io/nats.c/issues/735#issuecomment-2226782684, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA4I7OPEUY4TTKXGHWD2NS3ZMC4E7AVCNFSM6AAAAABE67CSLOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRWG44DENRYGQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

nats-io / nats.c