maubot / rss

An RSS plugin for maubot
GNU Affero General Public License v3.0

Stabilize entry IDs #35

Closed AndrewKvalheim closed 1 year ago

AndrewKvalheim commented 2 years ago

Thanks for making this! I tried using it to follow to-rss.xyz/wikipedia/current_events, but it sent many duplicate messages. The entries in that feed have no ID:

<item>
  <title>Current events: 2022-07-13</title>
  <link>https://en.wikipedia.org/wiki/Portal:Current_events/2022_July_13</link>
  <description>[VARIABLE CONTENT]</description>
  <pubDate>Wed, 13 Jul 2022 00:00:00 -0000</pubDate>
</item>

Their content is also updated frequently. Since the fallback ID is derived from the content, each update is erroneously treated as a distinct entry. To handle feeds like this, aggregators typically key entries by the link alone, not the content.
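To illustrate keying by link, here is a minimal sketch. It is not the plugin's actual code: the function name `stable_entry_id`, the dict-shaped entry, and its `id`, `link`, `title`, and `summary` keys are assumptions chosen for the example, and SHA-1 is just one possible way to shorten the link into a key.

```python
import hashlib


def stable_entry_id(entry: dict) -> str:
    """Return an ID that does not change when an entry's content is edited.

    `entry` is a hypothetical dict with optional "id", "link", "title",
    and "summary" keys; it stands in for a parsed feed item.
    """
    # Prefer an explicit ID supplied by the feed when one exists.
    if entry.get("id"):
        return entry["id"]
    # Otherwise key by the link alone, which survives content updates.
    if entry.get("link"):
        return hashlib.sha1(entry["link"].encode("utf-8")).hexdigest()
    # Last resort: hash title and content. This ID changes whenever they do.
    fallback = "||".join([entry.get("title", ""), entry.get("summary", "")])
    return hashlib.sha1(fallback.encode("utf-8")).hexdigest()


before = {
    "title": "Current events: 2022-07-13",
    "link": "https://en.wikipedia.org/wiki/Portal:Current_events/2022_July_13",
    "summary": "first revision",
}
after = {**before, "summary": "second revision"}

# Both revisions map to the same ID, so the update is not reported as new.
assert stable_entry_id(before) == stable_entry_id(after)
```

The content hash is kept only as a last resort for entries that have neither an ID nor a link, since it is the one branch that still produces a new ID on every edit.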

AndrewKvalheim commented 2 years ago

I see that feedparser already normalizes <guid>, so I’ve removed that commit.