Open bkil opened 10 months ago
I noticed it for entries on the following feed about 13 hours ago if you would like to investigate a specific case:
The bot also failed to post new entries from here:
https://gitlab.com/bkil/secuchart/-/commits/master.atom
The matrix room was recently upgraded if this is of any relevance. I made sure to unfollow the feed and remove the integration before upgrading the old room and then readd the integration and feed to the new room:
Still happens as of the past few days, including today. We have temporarily hooked up another RSS bot in the room to compare against so it is more visible.
Hey, we do have some sample feeds we monitor for issues, but of course it doesn't provide an overall picture. Do you have feed urls for the ones that are failing? Even if we miss an item, we would retry it on next fetch unless we have already marked it as handled.
Ah you posted above https://bkil-bot.github.io/osm-rss-juggler/changeset-discuss.xml, got it.
Yes, that sometimes skips. Today it was from this one: https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml
And specifically these two entries got lost 6 hours ago: https://github.com/bkil-bot/osm-rss-juggler/commit/c341e298d3d5d2b1f25ae0df969f514714c63ac7#diff-7b943466d8c0bf9f148be2a11b68c00cb75f48091a82cd67636ff05ad3f0cd06R6
Could it happen that it tries to submit the feed comment to the given matrix room but depending on load, the matrix HS API NACK's it, while Hookshot marks the notification as delivered, clearing it from its queue?
This is my current theory yes. We clear it from the queue at the point where it is parsed from the feed, but we don't retry if it fails to be delivered. Essentially a victim of the lack of #725
A few other ones were missing from the same feed a few days ago. One that we also caught some days ago was lack of a new commit notification from a low activity gitlab.com repo. We also missed github.com releases in the past.
Would it be possible to log such failures so the suspected root cause could be confirmed to be the same, and not caused by another issue?
Ah yes:
2024-02-04 12:33:42.182
WARN 12:33:42:181 [Bridge] Connection FeedConnection https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml failed to handle feed.entry: Error: ESOCKETTIMEDOUT
2024-02-04 11:28:20.753
WARN 11:28:20:753 [Bridge] Connection FeedConnection https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml failed to handle feed.entry: Error: ESOCKETTIMEDOUT
2024-02-03 06:32:37.183
WARN 06:32:37:183 [Bridge] Connection FeedConnection https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml failed to handle feed.entry: Error: ESOCKETTIMEDOUT
You know the source of the entries better than I do, but it looks like a different problem. As if it failed to ingest the feed itself.
This is definitely matrix-side. The Connection FeedConnection...failed to handle feed.entry
error means we've managed to parse out the feed already but failed to handle a feed.entry
event, which is what is emitted from the parser. A.k.a we very much can narrow down the problem to a Bridge <-> HS failure.
We already have metrics on this and for integrations.ems.host it's failing a small % of the time. I expect #891 will improve this.
@bkil Could you let me know if this gets better or worse, I've deployed some experimental patches to the integrations server and would like to know if we're still seeing issues?
EDIT: A glance at the logs suggest the main culprits have been dealt with.
Many of us noticed that the RSS bot started reproducing old feed entries from earlier (possibly ones that it skipped previously) Is this an expected side effect?
@bkil Yep, there was a mistaken cache eviction. That should be fixed, can you let me know if either that happens again OR there is a missed feed?
@Half-Shot what I can see this feed https://github.com/iSoron/uhabits/releases.atom did not update yet, last post in room is 2.1.3 (Aug 28, '23) and 2.2.0 never appeared.
@mahdi1234 I think you won't see previously skipped entries retroactively. Please watch out for skipped new entries from this point in time forward.
Well my comment was based on OR there is a missed feed
to me it looks like in all rooms I've seen it takes current full feed and re-post. But I might be wrong :)
Yes, for absolute clarity the issue is sadly that hookshot stores a feed as checked, and then fails to send the message to Matrix. It won't recover those events. This should now be rare.
@Half-Shot Could you please look into the logs again for me about a lost entry? It was under this path: https://bkil-bot.github.io/osm-rss-juggler/wiki.xml
And added by this git commit 6 hours ago: https://github.com/bkil-bot/osm-rss-juggler/commit/4cf05c28940b29814606714559f4bec78ae3d426
Because this feed rarely changes (and was empty for some time up to this point), the other RSS bot also just noticed it 2 hours ago.
Hm, it claims to have handled one feed item from that feed, but it might have been hit by https://github.com/matrix-org/matrix-hookshot/issues/806? When we connect a feed for the first time, we store all the guids that exist in cache and then handle any new ones from that point.
Did you subscribe to the feed when it was still entirely empty?
EDIT: From reading the code this is likely tangled up with #806-ish. The first non-empty read from a feed is ignored, so if you have no items in you feed upon subscribe, and then you add one item, then it's still ignored. The subsequent reads should be okay. Since it's a bit suble, I'm tracking as #893.
No, we have been subscribed to this feed for months, and the bot had most recently forwarded a post from it on the 6th of February. https://github.com/bkil-bot/osm-rss-juggler/commit/d151dd87a22c505358929c9e2ef0c7a1a7c9d15f
It's just that due to the dynamic nature of such a wiki feed, it can go blank for days, and then get non-blank for a day or two at a time as long as the edit horizon contains hits to show. So this might be a different variant or special recurring case of #893.
New information. One entry from two feeds each got lost that were published by the following commit 2.5h ago:
https://github.com/bkil-bot/osm-rss-juggler/commit/758e71e72d26e058a21c3540b73572c53190a018
Feed URLs and GUIDs:
https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml
GUID https://www.openstreetmap.org/note/4057931#c8972088
https://bkil-bot.github.io/osm-rss-juggler/mastodon-openstreetmap.xml
GUID https://en.osm.town/@openstreetmap/111919104056767098
All five added entries from this feed got lost 22 hours ago:
Feed URL:
GUIDs:
https://www.openstreetmap.org/note/4127077#c9010944
https://www.openstreetmap.org/note/4127075#c9010941
https://www.openstreetmap.org/note/4127074#c9010938
https://www.openstreetmap.org/note/4127069#c9010924
https://www.openstreetmap.org/note/4127068#c9010921
Could you perhaps have a look?
The following entry was lost 1 hour ago:
Feed URL:
https://bkil-bot.github.io/osm-rss-juggler/changeset-discuss.xml
GUID:
https://www.openstreetmap.org/changeset/147962763#comment_1
The following entry was lost 1 hour ago:
Feed URL:
https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml
GUID:
https://www.openstreetmap.org/note/4048599#c9028313
Note that due to its instability, most people I know with knowledge of how to debug such issues had stopped using hookshot in the past months and replaced it with other solutions. Hence it will probably only be me contributing reports from now on.
Another report. These feeds have a new entry every Sunday morning.
The 4 entries of February 18th were all posted. The 8 entries of February 25th and March 3rd were all skipped. The bot says "Successful fetch" and no error is posted in the room, when the feed is expected to discover an entry.
https://www.weeklyosm.eu/en/feed https://www.weeklyosm.eu/fr/feed https://www.weeklyosm.eu/es/feed https://weeklyosm.eu/it/feed
Another report just to let know this is still an issue.
Tooted a new toot to Mastodon, feed URL https://some.hacklab.fi/@tampere.rss
Feeds says successful fetch but nothing posted to room.
Still working very unreliably. I tooted two toots with #sslnr hashtag. First toot was never relayed to Matrix, the second one was. Feed URL: https://sauna.social/tags/sslnr.rss
Has there been any progress on the issue, or should we migrate to for example t2bot's RSS bot?
I can confirm the same issue. The last feed I got from https://xkcd.com/rss.xml is on April 26th, while there has been at least five more posts since then. Not only that, the bot reports several successful checks today. No posts though.
I noticed it only accidentally after I know that something should have appeared as a message. Could you perhaps hook up some kind of automated monitoring of this?
As a continuation of #778