Closed by sthussey 3 years ago
@sthussey please try using the preview version https://f.feedz.io/kdcllc/kdcllc/packages/CometD.NetCore2/latest/download of the package
@sthussey - did the above preview version fix your issue? Or did you
TIA
@kdcllc @ianrandell-sh Apologies for missing this update. We forked and vendored the library and fixed what I think is basically a subset of #17. We root-caused it to a race condition that occurs when the HTTP request for a long poll times out just before the CometD timeout that triggers a new CONNECT meta message. If an event arrived on the bus in the gap between the HTTP request timeout and the new CONNECT, it was missed. We made the default HTTP request timeout longer than the expected 120s CometD timeout and have not had a missed event since.
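For anyone hitting the same thing, the timing rule described above can be sketched roughly as follows. This is not code from the library or from the fork. The class name `LongPollTimeouts`, the `safety_margin_s` field, and the 30s margin are hypothetical; only the 120s CometD timeout comes from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LongPollTimeouts:
    """Hypothetical helper that keeps the HTTP timeout above the CometD timeout."""

    cometd_timeout_s: float = 120.0  # long-poll timeout mentioned in the thread
    safety_margin_s: float = 30.0    # assumed margin, not taken from the thread

    @property
    def http_timeout_s(self) -> float:
        # The HTTP request must outlive the long poll, so the server answers
        # the CONNECT before the client gives up on the request.
        return self.cometd_timeout_s + self.safety_margin_s

    def validate(self, http_timeout_s: float) -> None:
        # Reject configurations that would reopen the gap described above.
        if http_timeout_s <= self.cometd_timeout_s:
            raise ValueError(
                f"HTTP timeout {http_timeout_s}s must exceed the CometD "
                f"timeout {self.cometd_timeout_s}s"
            )


timeouts = LongPollTimeouts()
timeouts.validate(timeouts.http_timeout_s)  # passes: 150.0 > 120.0
```

The point is only the ordering of the two timeouts. The size of the margin is a judgment call and depends on network latency.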
thanks @sthussey - update much appreciated
Thanks for this great library for those of us that don't live in Java land. I'm using this on a low-traffic Salesforce Platform Event queue, so we likely hit a lot of long-polling transport timeouts, and we are seeing infrequent missed messages (maybe 1 per week). We don't have any log messages indicating an issue, just other monitoring that alerts us that we have missed an event. Looking through the code, our current theory is that the ReplayExtension diverges from the Java implementation. The Java implementation maintains a table of received replayIds for each subscribed topic and updates outgoing subscription messages with the last known replayId. The replay extension here doesn't seem to maintain that internal state, and we wonder whether in some cases a reconnect happens that requires a re-subscription, but only new events are requested. We had a little trouble fully tracing the code path for network disconnects/timeouts.
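To make the theory concrete, here is a minimal sketch of the per-topic replayId bookkeeping described above. It is neither the Java implementation nor this library's code. The `ReplayTracker` class and its method names are hypothetical, messages are modelled as plain dicts, and the message shape (`data.event.replayId` on incoming events, `ext.replay` on outgoing `/meta/subscribe`) is an assumption based on the Salesforce replay extension format.

```python
from typing import Any, Dict

REPLAY_NEW_EVENTS = -1  # assumed Salesforce convention: -1 requests new events only


class ReplayTracker:
    """Hypothetical tracker for the last replayId seen on each subscribed channel."""

    def __init__(self) -> None:
        self._last_replay_id: Dict[str, int] = {}

    def register(self, channel: str, replay_id: int = REPLAY_NEW_EVENTS) -> None:
        # Keep an existing value so a re-registration does not reset progress.
        self._last_replay_id.setdefault(channel, replay_id)

    def on_receive(self, message: Dict[str, Any]) -> None:
        # Record the replayId of every event delivered on a tracked channel.
        channel = message.get("channel")
        event = (message.get("data") or {}).get("event") or {}
        replay_id = event.get("replayId")
        if channel in self._last_replay_id and replay_id is not None:
            self._last_replay_id[channel] = replay_id

    def on_send(self, message: Dict[str, Any]) -> Dict[str, Any]:
        # Attach the last known replayIds to outgoing subscribe messages, so a
        # re-subscription after a reconnect resumes from the last event seen.
        if message.get("channel") == "/meta/subscribe":
            message.setdefault("ext", {})["replay"] = dict(self._last_replay_id)
        return message


tracker = ReplayTracker()
tracker.register("/event/Order__e")  # hypothetical channel name
tracker.on_receive({"channel": "/event/Order__e",
                    "data": {"event": {"replayId": 42}}})
resubscribe = tracker.on_send({"channel": "/meta/subscribe",
                               "subscription": "/event/Order__e"})
```

With this state kept, a re-subscription after a reconnect would ask for events after replayId 42. Without it, the request falls back to -1 (new events only), and anything published during the gap is skipped, which matches the symptom described above.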