vcabbage / amqp

AMQP 1.0 client library for Go.
https://godoc.org/pack.ag/amqp
MIT License
133 stars 94 forks source link

getting 'session ended by server' on re-connecting a Sender #125

Open ninlil opened 5 years ago

ninlil commented 5 years ago

Similar usage-scenario to #109

Scenario: 1) Connecting, creating Session and Sender 2) Send some messages 3) Do some long work 4) To gracefully handle long idle-periods we call Sender.Close after 4 minutes idle (on sending)

Option 1 - as long as work-time is < 10 min NewSender (with same LinkTargetAddress as only opts) works just fine

Option 2 - whenever work-time is > 10 min We get "session ended by server: *Error(nil)" as error from NewSender

If we hadn't closed the Sender we'd have gotten the following error:

link detached, reason: *Error{Condition: amqp : link : detach-forced, Description: The link 'XXXXXXXXXXX' is force detached by the broker due to errors occurred in publisher(link217). Detach origin: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:XXXXXXXXX

I have been trying to get my head around all that frames and channels in the code to see if there might be a caching of some sort on the re-conneting link but haven't found one.

Could the "caching" be done on Azure's side? If so, Is there any kind of option/flag to change the "signature" of the NewSender to get Azure to interpret the new sender as a different one, or forget it somehow?

Currently the only option I have left is to actually create a new connection every time i've passed the idle-timeout threshold

best regards, Hans

*Edit -Workaround: Since we control both sender and receiver we're sending a ping message (empty message with ContentType = "ping") from every Sender-link, then just silently ignoring those messages on the Receivers

vcabbage commented 5 years ago

Thanks for writing up a detailed description :+1:

If the broker detaches the link, the lib will process it and store it. Since there's currently no way to proactively notify the user of the lib you'll only get it when you perform a link operation. This is something #109 should help with.

To summarize my understanding and my thoughts (please correct me if I've got this wrong):

@devigned Do you have any suggestions on how best to handle this situation given the current limitations?

devigned commented 5 years ago

We handle this by creating a struct which contains the connection, session and link. Upon an error, we rebuild each entity and attempt to complete the send or receive again. For example, this is the recovery function for a sender.

This is definitely not the most efficient way of handling session and link level errors, but it is effective.

@vcabbage what do you think about exposing Err() on connections, sessions and links? It might make it a bit easier for consumers to determine if the entity needs to be recovered recreated.

edit: when I say recovered, I don't mean it in the AMQP sense of recovering. I mean to recreate and reestablish a connection, link or session without regard to previous state.

vcabbage commented 5 years ago

@devigned How do envision consumers interacting with the Err() API? I'm assuming it would be a non-blocking method that either returns an error or nil if the link is still connected?

How would this compare in usability to a callback API to notify on errors? Does this overlap with the functionality proposed in #109 or do you think it would be good to have both?

devigned commented 5 years ago

Yep, I was thinking a non-block method to return error or nil.

I think it's a less ambitious way to deal with errors than #109 that would make errors a bit more transparent. As in #109, having a function to automatically recover an error'd conn, session or link would probably obviate the need for exposing Err(), but I'm a bit concerned about nailing the abstraction level of the recovery func. By exposing Err(), it's less prescriptive and a consumer can decide the level at which they need to recover (conn, session, link). idk, just my 2¢

vcabbage commented 5 years ago

I'm not opposed to adding the API, but I'd like to understand how it would be used. Would the consumer poll it?

devigned commented 5 years ago

I think I'd do something like the following. That way I can be a bit more selective about what I recover.

err = sender.Send(ctx, msg)
if err == nil {
    return nil // success
}

if conn.Err() != nil {
    // recover conn
}

if sender.Err() != nil {
    // recover sender
}

// try sending again
vcabbage commented 5 years ago

Makes sense. What do you think about adding some functions that would allow you to determine where the error originated?

Example:

switch {
case amqp.IsConnError(err):
    // recover conn
case amqp.IsSessionError(err):
    // recover session
case amqp.IsLinkError(err):
    // recover link
}

This isn't necessarily better than your suggestion, it's just an alternative that came to mind. The one nice property of this approach would be that you don't necessarily have to check conn, session, link in order.

A rather tricky caveat to all of this is that there's no guarantee that a lower level error won't occur after receiving a link error. To be robust the recovery logic will have to handle that case.