matrix-org / matrix-hookshot

A bridge between Matrix and multiple project management services, such as GitHub, GitLab and JIRA.
https://matrix-org.github.io/matrix-hookshot/
Apache License 2.0
283 stars 68 forks source link

The RSS bot started skipping entries again #844

Open bkil opened 10 months ago

bkil commented 10 months ago

I noticed it only accidentally after I know that something should have appeared as a message. Could you perhaps hook up some kind of automated monitoring of this?

As a continuation of #778

bkil commented 10 months ago

I noticed it for entries on the following feed about 13 hours ago if you would like to investigate a specific case:

bkil commented 10 months ago

The bot also failed to post new entries from here:

https://gitlab.com/bkil/secuchart/-/commits/master.atom

The matrix room was recently upgraded if this is of any relevance. I made sure to unfollow the feed and remove the integration before upgrading the old room and then readd the integration and feed to the new room:

https://matrix.to/#/#secuchart:matrix.org

bkil commented 8 months ago

Still happens as of the past few days, including today. We have temporarily hooked up another RSS bot in the room to compare against so it is more visible.

Half-Shot commented 8 months ago

Hey, we do have some sample feeds we monitor for issues, but of course it doesn't provide an overall picture. Do you have feed urls for the ones that are failing? Even if we miss an item, we would retry it on next fetch unless we have already marked it as handled.

Half-Shot commented 8 months ago

Ah you posted above https://bkil-bot.github.io/osm-rss-juggler/changeset-discuss.xml, got it.

bkil commented 8 months ago

Yes, that sometimes skips. Today it was from this one: https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml

And specifically these two entries got lost 6 hours ago: https://github.com/bkil-bot/osm-rss-juggler/commit/c341e298d3d5d2b1f25ae0df969f514714c63ac7#diff-7b943466d8c0bf9f148be2a11b68c00cb75f48091a82cd67636ff05ad3f0cd06R6

bkil commented 8 months ago

Could it happen that it tries to submit the feed comment to the given matrix room but depending on load, the matrix HS API NACK's it, while Hookshot marks the notification as delivered, clearing it from its queue?

Half-Shot commented 7 months ago

This is my current theory yes. We clear it from the queue at the point where it is parsed from the feed, but we don't retry if it fails to be delivered. Essentially a victim of the lack of #725

bkil commented 7 months ago

A few other ones were missing from the same feed a few days ago. One that we also caught some days ago was lack of a new commit notification from a low activity gitlab.com repo. We also missed github.com releases in the past.

bkil commented 7 months ago

Would it be possible to log such failures so the suspected root cause could be confirmed to be the same, and not caused by another issue?

Half-Shot commented 7 months ago

Ah yes:


2024-02-04 12:33:42.182 
WARN 12:33:42:181 [Bridge] Connection FeedConnection https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml failed to handle feed.entry: Error: ESOCKETTIMEDOUT

2024-02-04 11:28:20.753 
WARN 11:28:20:753 [Bridge] Connection FeedConnection https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml failed to handle feed.entry: Error: ESOCKETTIMEDOUT

2024-02-03 06:32:37.183 
WARN 06:32:37:183 [Bridge] Connection FeedConnection https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml failed to handle feed.entry: Error: ESOCKETTIMEDOUT
bkil commented 7 months ago

You know the source of the entries better than I do, but it looks like a different problem. As if it failed to ingest the feed itself.

Half-Shot commented 7 months ago

This is definitely matrix-side. The Connection FeedConnection...failed to handle feed.entry error means we've managed to parse out the feed already but failed to handle a feed.entry event, which is what is emitted from the parser. A.k.a we very much can narrow down the problem to a Bridge <-> HS failure.

We already have metrics on this and for integrations.ems.host it's failing a small % of the time. I expect #891 will improve this.

Half-Shot commented 7 months ago

@bkil Could you let me know if this gets better or worse, I've deployed some experimental patches to the integrations server and would like to know if we're still seeing issues?

EDIT: A glance at the logs suggest the main culprits have been dealt with.

bkil commented 7 months ago

Many of us noticed that the RSS bot started reproducing old feed entries from earlier (possibly ones that it skipped previously) Is this an expected side effect?

Half-Shot commented 7 months ago

@bkil Yep, there was a mistaken cache eviction. That should be fixed, can you let me know if either that happens again OR there is a missed feed?

mahdi1234 commented 7 months ago

@Half-Shot what I can see this feed https://github.com/iSoron/uhabits/releases.atom did not update yet, last post in room is 2.1.3 (Aug 28, '23) and 2.2.0 never appeared.

bkil commented 7 months ago

@mahdi1234 I think you won't see previously skipped entries retroactively. Please watch out for skipped new entries from this point in time forward.

mahdi1234 commented 7 months ago

Well my comment was based on OR there is a missed feed to me it looks like in all rooms I've seen it takes current full feed and re-post. But I might be wrong :)

Half-Shot commented 7 months ago

Yes, for absolute clarity the issue is sadly that hookshot stores a feed as checked, and then fails to send the message to Matrix. It won't recover those events. This should now be rare.

bkil commented 7 months ago

@Half-Shot Could you please look into the logs again for me about a lost entry? It was under this path: https://bkil-bot.github.io/osm-rss-juggler/wiki.xml

And added by this git commit 6 hours ago: https://github.com/bkil-bot/osm-rss-juggler/commit/4cf05c28940b29814606714559f4bec78ae3d426

Because this feed rarely changes (and was empty for some time up to this point), the other RSS bot also just noticed it 2 hours ago.

Half-Shot commented 7 months ago

Hm, it claims to have handled one feed item from that feed, but it might have been hit by https://github.com/matrix-org/matrix-hookshot/issues/806? When we connect a feed for the first time, we store all the guids that exist in cache and then handle any new ones from that point.

Did you subscribe to the feed when it was still entirely empty?

EDIT: From reading the code this is likely tangled up with #806-ish. The first non-empty read from a feed is ignored, so if you have no items in you feed upon subscribe, and then you add one item, then it's still ignored. The subsequent reads should be okay. Since it's a bit suble, I'm tracking as #893.

bkil commented 7 months ago

No, we have been subscribed to this feed for months, and the bot had most recently forwarded a post from it on the 6th of February. https://github.com/bkil-bot/osm-rss-juggler/commit/d151dd87a22c505358929c9e2ef0c7a1a7c9d15f

It's just that due to the dynamic nature of such a wiki feed, it can go blank for days, and then get non-blank for a day or two at a time as long as the edit horizon contains hits to show. So this might be a different variant or special recurring case of #893.

bkil commented 7 months ago

New information. One entry from two feeds each got lost that were published by the following commit 2.5h ago:

https://github.com/bkil-bot/osm-rss-juggler/commit/758e71e72d26e058a21c3540b73572c53190a018

Feed URLs and GUIDs:

https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml GUID https://www.openstreetmap.org/note/4057931#c8972088

https://bkil-bot.github.io/osm-rss-juggler/mastodon-openstreetmap.xml GUID https://en.osm.town/@openstreetmap/111919104056767098

bkil commented 7 months ago

All five added entries from this feed got lost 22 hours ago:

https://github.com/bkil-bot/osm-rss-juggler/commit/bb3aa92526121b078985aecf72f28e749a97512d#diff-7b943466d8c0bf9f148be2a11b68c00cb75f48091a82cd67636ff05ad3f0cd06R6

Feed URL:

GUIDs:

Could you perhaps have a look?

bkil commented 7 months ago

The following entry was lost 1 hour ago:

https://github.com/bkil-bot/osm-rss-juggler/commit/5f70b45a23a7ab739e6e0c25b10549d4075e20cc#diff-ec44ff024548ccff9e78ec7ebb5a61b78b6eef11b15ad0ad2ba020a46f666163R6

Feed URL:

https://bkil-bot.github.io/osm-rss-juggler/changeset-discuss.xml

GUID:

https://www.openstreetmap.org/changeset/147962763#comment_1

bkil commented 7 months ago

The following entry was lost 1 hour ago:

https://github.com/bkil-bot/osm-rss-juggler/commit/6ae88ae46b26f9715698e2d9060d7aff1a6a8804#diff-7b943466d8c0bf9f148be2a11b68c00cb75f48091a82cd67636ff05ad3f0cd06R6

Feed URL:

https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml

GUID:

https://www.openstreetmap.org/note/4048599#c9028313

Note that due to its instability, most people I know with knowledge of how to debug such issues had stopped using hookshot in the past months and replaced it with other solutions. Hence it will probably only be me contributing reports from now on.

opk12 commented 7 months ago

Another report. These feeds have a new entry every Sunday morning.

The 4 entries of February 18th were all posted. The 8 entries of February 25th and March 3rd were all skipped. The bot says "Successful fetch" and no error is posted in the room, when the feed is expected to discover an entry.

https://www.weeklyosm.eu/en/feed https://www.weeklyosm.eu/fr/feed https://www.weeklyosm.eu/es/feed https://weeklyosm.eu/it/feed

vranki commented 6 months ago

Another report just to let know this is still an issue.

Tooted a new toot to Mastodon, feed URL https://some.hacklab.fi/@tampere.rss

Feeds says successful fetch but nothing posted to room.

vranki commented 6 months ago

Still working very unreliably. I tooted two toots with #sslnr hashtag. First toot was never relayed to Matrix, the second one was. Feed URL: https://sauna.social/tags/sslnr.rss

Has there been any progress on the issue, or should we migrate to for example t2bot's RSS bot?

jonnyrobbie commented 4 months ago

I can confirm the same issue. The last feed I got from https://xkcd.com/rss.xml is on April 26th, while there has been at least five more posts since then. Not only that, the bot reports several successful checks today. No posts though.