Missed events on low-traffic topics

sthussey commented 4 years ago

Thanks for this great library for those of us that don't live in Java land. I'm using this on a low-traffic Salesforce Platform Event queue, so we likely hit a lot of long-polling transport timeouts, and we are seeing infrequent missed messages (maybe 1 per week). We don't have any log messages indicating an issue, just other monitoring that alerts us that we have missed an event.

Looking through the code, our current theory is that the ReplayExtension diverges from the Java implementation. The Java implementation maintains a table of received replayIds for each subscribed topic and updates outgoing subscription messages with the last known replayId. The replay extension here doesn't seem to maintain that internal state, and I wonder whether in some cases a reconnect happens that requires a re-subscription, but only new events are requested. We had a little trouble fully tracing the code path for network disconnects/timeouts.
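To make the theory above concrete, here is a minimal sketch of the per-topic replayId table it describes. It is written in Python purely for illustration and is not the code of this library or of the Java client; `ReplayTracker`, `on_event` and `on_outgoing_subscribe` are hypothetical names. It assumes the Salesforce message shape in which an event carries `data.event.replayId` and a subscribe message carries `ext.replay` keyed by topic.

```python
from typing import Any, Dict

# Salesforce replay option: -1 requests only events published after subscribing.
REPLAY_NEW_ONLY = -1


class ReplayTracker:
    """Hypothetical helper: remembers the last replayId seen per topic."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def on_event(self, message: Dict[str, Any]) -> None:
        # Record the replayId of every delivered event, keyed by its channel.
        channel = message.get("channel")
        replay_id = message.get("data", {}).get("event", {}).get("replayId")
        if channel is not None and replay_id is not None:
            self._last_seen[channel] = replay_id

    def on_outgoing_subscribe(self, message: Dict[str, Any]) -> Dict[str, Any]:
        # On every (re)subscribe, ask the server to resume after the last
        # event we actually received instead of defaulting to "new only".
        if message.get("channel") == "/meta/subscribe":
            topic = message["subscription"]
            replay = message.setdefault("ext", {}).setdefault("replay", {})
            fallback = replay.get(topic, REPLAY_NEW_ONLY)
            replay[topic] = self._last_seen.get(topic, fallback)
        return message
```

With state like this, a re-subscription after a reconnect would request events after the last one received, so an event published during the gap could still be replayed. Whether that would have covered the case in this issue is not confirmed here.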

kdcllc commented 4 years ago

@sthussey please try using the preview version https://f.feedz.io/kdcllc/kdcllc/packages/CometD.NetCore2/latest/download of the package

ianrandell-sh commented 3 years ago

@sthussey - did the above preview version fix your issue? Or did you track the problem down to something else or fix the problem another way?

TIA

sthussey commented 3 years ago

@kdcllc @ianrandell-sh Apologies for missing this update. We forked and vendored the library and fixed what I think is basically a subset of #17. We root-caused it to a race condition that occurred when the HTTP request for a long poll timed out just before the CometD timeout that triggers a new CONNECT meta message. If an event arrived on the bus in the gap between the HTTP request timeout and the new CONNECT, it was missed. We made the default HTTP request timeout longer than the expected 120s CometD timeout and have not had a missed event since.
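As a rough illustration of the fix described above (not the actual patch from the fork), the HTTP timeout can be derived from the CometD long-poll timeout plus a safety margin, so the HTTP request never gives up before the server answers the CONNECT. The sketch is in Python; the function names and the 30-second margin are assumptions, and the 120s default is simply the figure quoted in this thread. The `timeout` advice field (in milliseconds) is part of the Bayeux protocol.

```python
from typing import Any, Dict, Optional

DEFAULT_COMETD_TIMEOUT_MS = 120_000  # figure quoted in this thread
DEFAULT_MARGIN_S = 30.0              # assumed safety margin, tune as needed


def http_timeout_seconds(
    cometd_timeout_ms: int = DEFAULT_COMETD_TIMEOUT_MS,
    margin_s: float = DEFAULT_MARGIN_S,
) -> float:
    """HTTP request timeout that outlasts the CometD long poll."""
    return cometd_timeout_ms / 1000.0 + margin_s


def http_timeout_from_advice(
    advice: Optional[Dict[str, Any]],
    margin_s: float = DEFAULT_MARGIN_S,
) -> float:
    """Use the server's advice 'timeout' when present, else the default."""
    timeout_ms = (advice or {}).get("timeout", DEFAULT_COMETD_TIMEOUT_MS)
    return http_timeout_seconds(timeout_ms, margin_s)
```

The resulting value would then be applied to whatever HTTP client performs the long poll. The point is only the ordering: the HTTP timeout stays strictly greater than the CometD one.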

ianrandell-sh commented 3 years ago

thanks @sthussey - update much appreciated

kdcllc / CometD.NetCore

Missed events on low-traffic topics #18