snarfed / granary

💬 The social web translator
https://granary.io
Creative Commons Zero v1.0 Universal

Photos missing from RSS #674

Closed: aaronpk closed this 9 months ago

aaronpk commented 9 months ago

Mastodon's RSS feed puts photos in a <media:content url=""> tag, which granary doesn't appear to recognize.

Example XML:

    <item>
      <guid isPermaLink="true">https://mamot.fr/@nhoizey/111866417349105396</guid>
      <link>https://mamot.fr/@nhoizey/111866417349105396</link>
      <pubDate>Sat, 03 Feb 2024 07:41:05 +0000</pubDate>
      <description>&lt;p&gt;🔗 “Where have all the flowers gone?” by &lt;span class="h-card" translate="no"&gt;&lt;a href="https://mastodon.social/@davatron5000" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank"&gt;@&lt;span&gt;davatron5000&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://mamot.fr/tags/IndieWeb" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank"&gt;#&lt;span&gt;IndieWeb&lt;/span&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;⚓️ &lt;a href="https://nicolas-hoizey.com/links/2024/01/19/where-have-all-the-flowers-gone/" rel="nofollow noopener noreferrer" translate="no" target="_blank"&gt;&lt;span class="invisible"&gt;https://&lt;/span&gt;&lt;span class="ellipsis"&gt;nicolas-hoizey.com/links/2024/&lt;/span&gt;&lt;span class="invisible"&gt;01/19/where-have-all-the-flowers-gone/&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;</description>
      <media:content url="https://files.mastodon.social/cache/media_attachments/files/111/866/417/384/185/248/original/875edd3105e4e08b.png" type="image/png" fileSize="97310" medium="image">
        <media:rating scheme="urn:simple">nonadult</media:rating>
        <media:description type="plain">Screenshot of Where have all the flowers gone?</media:description>
      </media:content>
      <category>indieweb</category>
    </item>

JSON Feed output from granary:

    {
      "author": {
        "name": "#indieweb",
        "url": "https://mastodon.social/tags/indieweb"
      },
      "content_html": "Screenshot of Where have all the flowers gone?",
      "date_published": "2024-02-03T07:41:05+00:00",
      "id": "https://mamot.fr/@nhoizey/111866417349105396",
      "url": "https://mamot.fr/@nhoizey/111866417349105396"
    }

snarfed commented 9 months ago

Interesting, thanks for filing! I'm not familiar with whatever XML namespace media: is from, but I'll take a look.

snarfed commented 9 months ago

Mastodon's root RSS element w/namespaces is:

    <rss version="2.0" xmlns:webfeeds="http://webfeeds.org/rss/1.0" xmlns:media="http://search.yahoo.com/mrss/">

snarfed commented 9 months ago

Also granary is overriding the actual content in <description> with the image's alt text in <media:description type="plain"> 😕.
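
For anyone hitting the same thing, here is a minimal sketch (not granary's actual code) of pulling images out of Mastodon's RSS with Python's standard-library xml.etree.ElementTree. ElementTree matches elements by namespace URI, not by the media: prefix, so the lookup uses the http://search.yahoo.com/mrss/ URI declared on Mastodon's root element. The trimmed sample feed and the extract_images helper are hypothetical. The alt text from the nested media:description is returned alongside the image URL instead of being treated as post content.

```python
import xml.etree.ElementTree as ET

# Namespace URI declared on Mastodon's <rss> root element (xmlns:media=...).
MEDIA_NS = "http://search.yahoo.com/mrss/"

# Hypothetical, trimmed sample based on the mamot.fr item in this issue.
FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <link>https://mamot.fr/@nhoizey/111866417349105396</link>
      <description>&lt;p&gt;Where have all the flowers gone?&lt;/p&gt;</description>
      <media:content url="https://files.mastodon.social/cache/media_attachments/files/111/866/417/384/185/248/original/875edd3105e4e08b.png" type="image/png" medium="image">
        <media:description type="plain">Screenshot of Where have all the flowers gone?</media:description>
      </media:content>
    </item>
  </channel>
</rss>"""


def extract_images(item):
    """Return (url, alt) pairs for image media:content children of an RSS <item>.

    Hypothetical helper for illustration only.
    """
    images = []
    # ElementTree matches on "{namespace-uri}localname", not on the "media:" prefix.
    for content in item.findall(f"{{{MEDIA_NS}}}content"):
        url = content.get("url")
        is_image = (content.get("medium") == "image"
                    or content.get("type", "").startswith("image/"))
        if not url or not is_image:
            continue
        # Alt text lives in a nested media:description; keep it as image metadata.
        alt = content.findtext(f"{{{MEDIA_NS}}}description", default="").strip()
        images.append((url, alt))
    return images


root = ET.fromstring(FEED)
for item in root.iter("item"):
    for url, alt in extract_images(item):
        print(url, "|", alt)
```

Note that a plain item.find("content") returns None here, which is one easy way for a parser to silently miss these images.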

snarfed commented 9 months ago

Oh, we don't parse any images out of RSS at all yet. 🤷

snarfed commented 9 months ago

Done! New example JSON Feed output from https://mstdn.social/@ElleGray.rss:

    {
      "content_html": "<p>look we don't need to be competitive about our hummingbirds I'm just saying mine can joust</p>",
      "date_published": "2024-02-04T18:12:26+00:00",
      "id": "https://mstdn.social/@ElleGray/111874562251028805",
      "image": "https://media.mstdn.social/media_attachments/files/111/874/561/315/706/387/original/4b5673cfb159a20f.jpg",
      "url": "https://mstdn.social/@ElleGray/111874562251028805"
    }
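
A rough sketch of the item-level mapping shown above, assuming one JSON Feed item per RSS <item>: content_html comes from <description>, date_published from <pubDate>, and the first image media:content becomes the JSON Feed image field, while its media:description alt text stays out of the content. This is not granary's implementation; the rss_item_to_jsonfeed helper is hypothetical, and the input is the mamot.fr item from the top of this issue with its description trimmed and the media: namespace declared on the item so it parses standalone.

```python
import json
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

MEDIA_NS = "http://search.yahoo.com/mrss/"

# The mamot.fr item from this issue, trimmed; xmlns:media added so it parses alone.
ITEM_XML = """<item xmlns:media="http://search.yahoo.com/mrss/">
  <guid isPermaLink="true">https://mamot.fr/@nhoizey/111866417349105396</guid>
  <link>https://mamot.fr/@nhoizey/111866417349105396</link>
  <pubDate>Sat, 03 Feb 2024 07:41:05 +0000</pubDate>
  <description>&lt;p&gt;Where have all the flowers gone?&lt;/p&gt;</description>
  <media:content url="https://files.mastodon.social/cache/media_attachments/files/111/866/417/384/185/248/original/875edd3105e4e08b.png" type="image/png" medium="image">
    <media:description type="plain">Screenshot of Where have all the flowers gone?</media:description>
  </media:content>
</item>"""


def rss_item_to_jsonfeed(item):
    """Map one RSS <item> element to a JSON Feed item dict (hypothetical helper)."""
    url = item.findtext("link") or item.findtext("guid")
    out = {"id": item.findtext("guid") or url, "url": url}

    # Post body comes from <description>; ElementTree has already unescaped &lt;p&gt;.
    description = item.findtext("description")
    if description:
        out["content_html"] = description

    pub_date = item.findtext("pubDate")
    if pub_date:
        # RFC 822 date -> ISO 8601 string with a UTC offset.
        out["date_published"] = parsedate_to_datetime(pub_date).isoformat()

    # First image attachment becomes "image"; its alt text is NOT used as content.
    for content in item.findall(f"{{{MEDIA_NS}}}content"):
        src = content.get("url")
        if src and (content.get("medium") == "image"
                    or content.get("type", "").startswith("image/")):
            out["image"] = src
            break
    return out


print(json.dumps(rss_item_to_jsonfeed(ET.fromstring(ITEM_XML)),
                 indent=2, sort_keys=True, ensure_ascii=False))
```

Keeping the alt text attached to the image (rather than letting it win over <description>) is what avoids the "content_html is just the screenshot caption" result from the original report.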