miniflux / v2

Minimalist and opinionated feed reader
https://miniflux.app
Apache License 2.0

RSS <description></description> content is rendered incorrectly. #2853

Open jkryanchou opened 2 months ago

jkryanchou commented 2 months ago

I subscribed to a Twitter feed from RSSHub. Its items contain nested HTML inside the description tag, like the one below.

Original RSS Item Content

...
</item>
<item>
<title>宝玉: ↩️ @maximlott This was the prompt it used: Below is a verbal description of a puzzle, consisting of a 3x3 grid, with the lowest-right square b...</title>
<description><img width="0" height="0" hidden="true" src="https://pbs.twimg.com/media/GXjFIS5XUAEDFos?format=jpg&amp;name=orig" referrerpolicy="no-referrer"><a href="https://twitter.com/dotey" target="_blank" rel="noopener noreferrer"><img width="48" height="48" src="https://pbs.twimg.com/profile_images/561086911561736192/6_g58vEs_normal.jpeg" hspace="8" vspace="8" align="left" referrerpolicy="no-referrer"><strong>宝玉</strong></a>: ↩️ @maximlott Alright, this IQ test image was converted into text, so it doesn't reflect the actual results. At most, it can demonstrate that o1 preview's capabilities surpass those of other models, but we can't say that o1's IQ has reached 120. https://t.co/mWFsmEeIRI https://t.co/Iyo6kj5Qqu<br clear="both"><div style="clear: both"></div><a href="https://pbs.twimg.com/media/GXjFIS5XUAEDFos?format=jpg&amp;name=orig" target="_blank" rel="noopener noreferrer"><img height="150" style="height: 150px;" hspace="4" vspace="8" src="https://pbs.twimg.com/media/GXjFIS5XUAEDFos?format=jpg&amp;name=orig" referrerpolicy="no-referrer"></a><br clear="both"><div style="clear: both"></div><hr><small>Mon Sep 16 2024 05:44:45 GMT+0800 (China Standard Time)</small><br><br><img width="0" height="0" hidden="true" src="https://pbs.twimg.com/media/GXkvtzsWIAA9VMM?format=jpg&amp;name=orig" referrerpolicy="no-referrer"><a href="https://x.com/dotey" target="_blank" rel="noopener noreferrer"><img width="48" height="48" src="https://pbs.twimg.com/profile_images/561086911561736192/6_g58vEs_normal.jpeg" hspace="8" vspace="8" align="left" referrerpolicy="no-referrer"><strong>宝玉</strong></a>: ↩️ @maximlott This was the prompt it used:<br><br>Below is a verbal description of a puzzle, consisting of a 3x3 grid, with the lowest-right square being empty. Please consider the patterns and determine the appropriate answer to fill in the empty square. 
First row, first column: two lines forming a<br clear="both"><div style="clear: both"></div><a href="https://pbs.twimg.com/media/GXkvtzsWIAA9VMM?format=jpg&amp;name=orig" target="_blank" rel="noopener noreferrer"><img height="150" style="height: 150px;" hspace="4" vspace="8" src="https://pbs.twimg.com/media/GXkvtzsWIAA9VMM?format=jpg&amp;name=orig" referrerpolicy="no-referrer"></a><br clear="both"><div style="clear: both"></div><hr><small>Mon Sep 16 2024 13:27:53 GMT+0800 (China Standard Time)</small></description>
<link>https://x.com/dotey/status/1835551155528638868</link>
<guid isPermaLink="false">https://twitter.com/dotey/status/1835551155528638868</guid>
<pubDate>Mon, 16 Sep 2024 05:27:53 GMT</pubDate>
<author>宝玉</author>
</item>
...

NetNewsWire renders it correctly, but Miniflux does not.

NetNewsWire

image

Miniflux

image

I also configured the scraper rule as div.content, following the "Filter, Rewrite and Scraper Rules" section. I have searched for a long time and found nothing that helps with this issue. I have no idea whether my scraper rule is wrong. Could anyone help me figure it out?

jkryanchou commented 2 months ago

I guess the problem comes from this code:

...
entry.Content = sanitizer.Sanitize(pageBaseURL, entry.Content)
...

It sanitizes the original content...

jkryanchou commented 2 months ago

Could anyone help me figure out what's wrong with it?