nats-io / nats.c

A C client for NATS
Apache License 2.0
382 stars 132 forks source link

natsConnection_SubscribeTimeout callback called twice having auto-unsubscribe after 1 message #755

Closed EgusCpp closed 3 months ago

EgusCpp commented 3 months ago

Observed behavior

Sending application calls the following functions (errors hadling is omitted here for simplicity):

natsSubscription* newSub;
natsConnection_SubscribeTimeout(&newSub, conn, reply_subj,  1000, reply_callback, closure_ptr);
natsSubscription_AutoUnsubscribe(newSub, 1);
natsConnection_PublishRequest(conn, subj, reply_subj, data, data_len);

Receiving application waiting for subjects via subscription:

natsConnection_Subscribe(&receiveSub, conn, subj, receiveCallback, closure_ptr);

Usually it's working good. While at certain moment I got the following:
1) reply_callback has been called a long before timeout expires, having NULL msg and natsSubscription_GetDelivered returned 0 number. I.e. it is expected to be not delivered due to some reason. 2) reply_callback has been called again with good reply from recepient (i.e. message has been delivered).

Expected behavior

When we're having a callback here, it is expected to have univocal information related to delivery status. I.e. if we have a callback with status "not delivered" first, then it is expected that message won't be actually delivered.

If there could be some kind of "intermediate state" callback, could you please point me to the criteria how to distinguish it from "final error" callback?

Server and client version

nats-server: v2.10.7
c library version: 3.7.1-beta (commit fab89f4 )

Host environment

Both sender and recepient are running on CentOS Stream release 8
Linux version 4.18.0-483.el8.x86_64 (mockbuild@x86-05.stream.rdu2.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-18) (GCC)) #1 SMP Fri Mar 31 13:24:48 UTC 2023

Steps to reproduce

No response

kozlovic commented 3 months ago

@EgusCpp Getting a NULL message indicating a timeout followed by a message is not a sign of an issue. If the subscription really timed-out, that is, there was no message received by the time of the timeout, it is expected to get the NULL message. Unless you destroy the subscription in the callback when that happens, then if a message becomes available, you will receive that message. See from the doc (http://nats-io.github.io/nats.c/group__conn_sub_group.html#gacb2c83b58b7909715424cbc327fdd404):

If no message is received by the given timeout (in milliseconds), the message handler is invoked with a NULL message.
You can then destroy the subscription in the callback, or simply return, in which case, the message handler will fire again when a message is received or the subscription times-out again.

Note that the fact that you use "auto unsubscribe of 1" does not mean that the callback will not be invoked more than once when using natsConnection_SubscribeTimeout() if the first message(s) received is (are) NULL message(s). The auto-unsubscribe applies to actual delivered messages.

If you claim that the callback is sometimes invoked "long" before the timeout, not sure what "long" means in this case since your timeout is 1 second, then there may be an issue with the timers. Could you instrument your test app a bit to capture/print the time before creating the subscription and print the time in the callback when getting a message (NULL or otherwise). This would give us an indication of how long it took for the callback to fire and see if it is close to 1000 milliseconds or not.

EgusCpp commented 3 months ago

@kozlovic Thank you for reply, something became more clear now. I will try to reproduce this to get more details.
Also, you mentioned I still need to destroy subscription on timeout. If I do this, could it guarantee message won't be delivered?

kozlovic commented 3 months ago

@EgusCpp You don't have to destroy the subscription when getting the NULL message, it depends on your needs. But if you do, then it is guaranteed that this callback will not be invoked after that. Even if there was a message that just came in and is ready to be dispatched, the fact that you would call natsSubscription_Destroy() in the callback would prevent the callback to be invoked again after it returns (the message would simply be internally discarded).

Now that you understand a bit better how natsConnection_SubscribeTimeout() works, run more tests and let us know if it works as expected. Only then we will close this issue.

EgusCpp commented 3 months ago

Only callback won't be invoked, or initial message (sent by natsConnection_PublishRequest) should be discarded as well?

kozlovic commented 3 months ago

initial message (sent by natsConnection_PublishRequest) should be discarded as well?

No, the fact that you destroy the subscription has nothing to do with the fact that you publish a message.

Again, what I was trying to explain is that destroying a subscription from a callback guarantees that the callback will not be invoked again (even if there were pending messages ready to be delivered).

kozlovic commented 3 months ago

And again, I am not suggesting that you do destroy from the callback, it depends what you want. If you understand that you can get the callback triggered with a NULL message (possibly multiple times) before getting the expected message, then this is fine, you don't need to destroy the subscription in the callback. Your misunderstanding was that the callback should have been invoked only once, which is not the case.

EgusCpp commented 3 months ago

Ok, thank you! I tested it again, and looks timeouts are coming as expected. So, most likely, there is no issue with them.

And most likely, I shouldn't use timeouts for subscription here at all, as I need timeout on sending rather on receiving.