martinrotter / rssguard

Feed reader (and podcast player) which supports RSS/ATOM/JSON and many web-based feed services.
GNU General Public License v3.0

[BUG]: RSS Guard not grabbing Attachments (pictures, videos etc.) for Reddit when trying to Get results of Search Queries Conditioned by NSFW Contents #1531

Closed (l5oukz89 closed this issue 2 weeks ago)

l5oukz89 commented 2 weeks ago

Brief description of the issue

RSS Guard does not fetch attachments (pictures, videos, etc.) for Reddit when getting the results of search queries that are restricted to NSFW content.

(The "self:false" parameter limits the results to posts with attachments.)

I have logged into my account through the RSS Guard internal browser, so RSS Guard has the cookies of my account and my credentials.

Now please bear with me,

URL (#1) : [https://www.reddit.com/search/?q=potatoes nsfw:yes self:false&sort=new&t=all]

Screenshot of URL #1 (internal browser) :

URL#1 - Screenshot from 2024-11-03 12-00-07 (Blurred and Cropped)

URL (#2) : [https://www.reddit.com/search/?q=potatoes nsfw:yes self:false&include_over_18=on&sort=new&t=all]

Screenshot of URL # 2 (internal browser) :

URL#2 - Screenshot from 2024-11-03 12-00-17 (Blurred and Cropped)

Both URLs give identical results in the internal browser; the only difference is that URL #2 has the "include_over_18=on" parameter.

Now we turn these into RSS feeds by adding ".rss" just before the "?":

URL (RSS1) : [https://www.reddit.com/search/.rss?q=potatoes nsfw:yes self:false&sort=new&t=all]

URL (RSS2) : [https://www.reddit.com/search.rss?q=potatoes nsfw:yes self:false&include_over_18=on&sort=new&t=all]

The Results are different :

URL (RSS1) returns (Internal browser) :

URL#1 - Screenshot from 2024-11-03 12-31-39 (cropped)

URL (RSS2) returns (Internal browser) :

URL #2 - Screenshot from 2024-11-03 12-32-10 (Cropped)

We can see that URL #2 (RSS2), which includes the "include_over_18=on" parameter, retrieves the posts and seems to work normally, whereas URL #1 (RSS1), which does not include it, doesn't seem to work.

---> So we can conclude that adding the "include_over_18=on" parameter is the "game changer" for fetching Reddit search queries that are NSFW.
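The URL transformation described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `to_search_rss` is hypothetical, and the sketch just rewrites the URL string (appending ".rss" to the path and adding "include_over_18=on" if missing). It does not contact Reddit or guarantee what Reddit returns.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_search_rss(url: str, include_nsfw: bool = True) -> str:
    """Turn a Reddit search page URL into its .rss variant (hypothetical helper)."""
    parts = urlsplit(url)

    # "/search/" becomes "/search.rss"; an existing ".rss" suffix is left alone.
    path = parts.path.rstrip("/")
    if not path.endswith(".rss"):
        path += ".rss"

    # Keep the original query parameters in their original order.
    query = parse_qsl(parts.query, keep_blank_values=True)

    # Add include_over_18=on only when asked to and when it is not already present.
    if include_nsfw and not any(key == "include_over_18" for key, _ in query):
        query.append(("include_over_18", "on"))

    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(query), parts.fragment)
    )


if __name__ == "__main__":
    page = "https://www.reddit.com/search/?q=potatoes+nsfw%3Ayes+self%3Afalse&sort=new&t=all"
    print(to_search_rss(page))
```

The resulting string can be pasted into the feed URL field; the query is re-encoded, so spaces in the search terms come out as "+".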

Now, bear with me, again,

We'll soon get to the point, but all of this is necessary background for understanding the bug described in the following section:

So now let's configure these 2 URLs as feeds:

URL #1 :

URL #1 - Screenshot from 2024-11-03 12-45-06 (Blurred and Cropped)

Results fetching URL#1 - Screenshot from 2024-11-03 12-46-34 (Blurred and Cropped)

URL #2 :

URL #2 - Screenshot from 2024-11-03 12-45-49 (Blurred and cropped)

Result Fetching URL#2 - Screenshot from 2024-11-03 13-35-45 (Blurred and Cropped)

As you can see,

URL #1 didn't fetch any articles, whereas URL #2 did fetch the articles, BUT as you can see there is nothing in the "attachments" column or in the "has enclosures" column.

---> Proving that indeed, "include_over_18=on" is the right syntax to fetch nsfw queries.

---> But the MAIN problem that remains is that it's not fetching the attachments of the articles.

Now, as proof that fetching feeds with attachments works fine, here is an example of an NSFW feed whose posts contain media (pictures, videos, etc.).

Let's use the r/nsfw subreddit as an example; it is filled with posts containing attachments (pictures, videos, etc.).

Adding the r/nsfw feed to RSS Guard:

Nsfw Subreddit - Screenshot from 2024-11-03 13-50-45 (Blurred and Cropped)

Fetching the results from that feed :

Results Fetching Nsfw Subreddit - Screenshot from 2024-11-03 13-51-34 (Blurred and Cropped)

Selection of Article #1 from that feed :

article # 1 - nsfw subreddit - Screenshot from 2024-11-03 13-52-02 (Blurred and Smudged and Cropped)

---> As you can see, the ".rss" syntax works fine with a subreddit that has NSFW attributes (here "r/nsfw", used as an example): the feed is fetched correctly, the articles are shown as containing attachments, and the attachments are displayed in the article viewer without any problem.

Wrapping it up :

---> I think I have laid out my case in a very clear, scientific and organized way, and I couldn't do it any better...

---> I have given proof that :

Now the ball is in your court to investigate the technical / development side of it.

I'm just going to offer an intuitive theory here: maybe the "attachment container" changed on Reddit's side, or maybe there is a bug on the program side specifically related to Reddit NSFW search queries?

How to reproduce the bug?

It has been exhaustively described in the section above.

What was the expected result?

Getting and seeing the attachments of the posts that contain attachments.

What actually happened?

No article attachments are shown in the attachments column.

Debug log

time=" 13899.210" type="debug" -> CTRL is NOT pressed while sorting articles - sorting with standard mode.
time=" 13899.210" type="debug" -> core: Displaying messages from feeds IDs: ''4496'' and URLs: 'https://www.reddit.com/search.rss?q=potatoes nsfw:yes self:false&include_over_18=on&sort=new&t=all'.
time=" 13899.212" type="debug" -> message-model: Repopulated model, SQL statement is now: 'SELECT Messages.id, Messages.is_read, Messages.is_important, Messages.is_deleted, Messages.is_pdeleted, Messages.feed, Messages.title, Messages.url, Messages.author, Messages.date_created, Messages.contents, Messages.enclosures, Messages.score, Messages.account_id, Messages.custom_id, Messages.custom_hash, Feeds.title, Feeds.is_rtl, CASE WHEN LENGTH(Messages.enclosures) > 10 THEN 'true' ELSE 'false' END AS has_enclosures, (SELECT GROUP_CONCAT(Labels.name) FROM Labels WHERE Messages.labels LIKE '%.' || Labels.custom_id || '.%') as msg_labels, Messages.labels FROM Messages LEFT JOIN Feeds ON Messages.feed = Feeds.custom_id AND Messages.account_id = Feeds.account_id WHERE Feeds.custom_id IN ('4496') AND Messages.is_deleted = 0 AND Messages.is_pdeleted = 0 AND Messages.account_id = 1 ORDER BY Messages.date_created DESC, LOWER(Messages.author) ASC;'.

Operating system and version

martinrotter commented 2 weeks ago

Hi, so I did my research and RSS Guard itself is not to blame here.

You do not see/get image attachments simply because the relevant feed does not provide them.

Here is an excerpt from the "feed/subreddit .rss":

  <entry>
    <author>
      <name>/u/Princess_Place</name>
      <uri>https://www.reddit.com/user/Princess_Place</uri>
    </author>
    <category term="nsfw" label="r/nsfw" />
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a
      href=&quot;https://www.reddit.com/r/nsfw/comments/1gjcgrs/late_creampie/&quot;&gt; &lt;img
      src=&quot;https://external-preview.redd.it/twdz3pSSsLIZcpHh9ULn_rSnQ7LF-KCXJZw_UHKkYxA.jpg?width=640&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=1309fbc0adb43e353d74417e00369ca01a99479c&quot;
      alt=&quot;Late creampie&quot; title=&quot;Late creampie&quot; /&gt; &lt;/a&gt;
      &lt;/td&gt;&lt;td&gt; &amp;#32; submitted by &amp;#32; &lt;a
      href=&quot;https://www.reddit.com/user/Princess_Place&quot;&gt; /u/Princess_Place &lt;/a&gt;
      &lt;br/&gt; &lt;span&gt;&lt;a
      href=&quot;https://www.redgifs.com/watch/loyaltrickycoelacanth&quot;&gt;[link]&lt;/a&gt;&lt;/span&gt;
      &amp;#32; &lt;span&gt;&lt;a
      href=&quot;https://www.reddit.com/r/nsfw/comments/1gjcgrs/late_creampie/&quot;&gt;[comments]&lt;/a&gt;&lt;/span&gt;
      &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <id>t3_1gjcgrs</id>
    <media:thumbnail
      url="https://external-preview.redd.it/twdz3pSSsLIZcpHh9ULn_rSnQ7LF-KCXJZw_UHKkYxA.jpg?width=640&amp;crop=smart&amp;auto=webp&amp;s=1309fbc0adb43e353d74417e00369ca01a99479c" />
    <link href="https://www.reddit.com/r/nsfw/comments/1gjcgrs/late_creampie/" />
    <updated>2024-11-04T11:56:10+00:00</updated>
    <published>2024-11-04T11:56:10+00:00</published>
    <title>Late creampie</title>
  </entry>

And here is an excerpt from the "search .rss":

<entry>
    <author>
      <name>/u/SinfulSuccubus420</name>
      <uri>https://www.reddit.com/user/SinfulSuccubus420</uri>
    </author>
    <category term="ChubbyStonerChickz" label="r/ChubbyStonerChickz" />
    <content type="html">
      &amp;#32; submitted by &amp;#32; &lt;a
      href=&quot;https://www.reddit.com/user/SinfulSuccubus420&quot;&gt; /u/SinfulSuccubus420
      &lt;/a&gt; &amp;#32; to &amp;#32; &lt;a
      href=&quot;https://www.reddit.com/r/ChubbyStonerChickz/&quot;&gt; r/ChubbyStonerChickz
      &lt;/a&gt; &lt;br/&gt; &lt;span&gt;&lt;a
      href=&quot;https://i.redd.it/t5swmf0l4vyd1.jpeg&quot;&gt;[link]&lt;/a&gt;&lt;/span&gt;
      &amp;#32; &lt;span&gt;&lt;a
      href=&quot;https://www.reddit.com/r/ChubbyStonerChickz/comments/1gjb85w/i_identify_as_a_potato_because_i_like_to_be_baked/&quot;&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <id>t3_1gjb85w</id>
    <link
      href="https://www.reddit.com/r/ChubbyStonerChickz/comments/1gjb85w/i_identify_as_a_potato_because_i_like_to_be_baked/" />
    <updated>2024-11-04T10:34:10+00:00</updated>
    <published>2024-11-04T10:34:10+00:00</published>
    <title>I identify as a potato because I like to be baked and smashed😈🥵</title>
  </entry>

Both of these entries do have a "picture" when viewed directly on the Reddit website in a web browser, but Reddit (for some reason, I don't know why) only provides the "attachment" for entries in RSS generated from subreddits (not from searches).

As you can see, the subreddit RSS sample entry above includes the media:thumbnail XML element, which contains the hyperlink to the actual attachment. The search entry does not, so RSS Guard has nothing to parse or display as an attachment.
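The difference between the two excerpts can be checked mechanically. Below is a minimal Python sketch, assuming the feed is Atom and that the media prefix is bound to the Media RSS namespace http://search.yahoo.com/mrss/ (the excerpts above do not show the namespace declarations). The function name `thumbnail_urls`, the sample feed and its example.invalid URL are made up for illustration, and this is not RSS Guard's actual parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

ATOM = "http://www.w3.org/2005/Atom"
MEDIA = "http://search.yahoo.com/mrss/"  # assumed binding of the "media" prefix


def thumbnail_urls(feed_xml: str) -> Dict[str, Optional[str]]:
    """Map each entry id to its media:thumbnail URL, or None if the element is absent."""
    root = ET.fromstring(feed_xml)
    result: Dict[str, Optional[str]] = {}
    for entry in root.iter(f"{{{ATOM}}}entry"):
        entry_id = entry.findtext(f"{{{ATOM}}}id", default="")
        # Only a direct <media:thumbnail> child of the entry counts here.
        thumb = entry.find(f"{{{MEDIA}}}thumbnail")
        result[entry_id] = thumb.get("url") if thumb is not None else None
    return result


# Made-up sample: one entry shaped like the subreddit feed, one like the search feed.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <id>t3_subreddit</id>
    <media:thumbnail url="https://example.invalid/thumb.jpg" />
    <title>entry as served by a subreddit feed</title>
  </entry>
  <entry>
    <id>t3_search</id>
    <title>entry as served by a search feed</title>
  </entry>
</feed>"""

if __name__ == "__main__":
    for entry_id, url in thumbnail_urls(SAMPLE).items():
        print(entry_id, "->", url or "no media:thumbnail element")
```

Running the same function over a saved copy of a search feed and a subreddit feed would show which entries carry the element at all.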

l5oukz89 commented 2 weeks ago

Okay, thank you very much for your kindness and for taking the time to look into it...

For the record, I would like to note that the syntax below, specific to NSFW search queries, used to work well (also showing the attachments in the attachments column) in previous versions of RSS Guard (maybe 4.6? maybe until 4 months ago?). Now it has stopped working completely, and adding it back in RSS Guard gives "Error : feed format not recognized".

Syntax :

https://www.reddit.com/search/.xml?q=title%3A(nsfw)+self%3Afalse+nsfw%3Ayes&type=link&cId=c7562f6a-aac0-4f11-97f5-1950c3cee7a2&iId=5fe9374a-0b68-4fe6-aa07-25c5931eac42&sort=new&t=all.rss

martinrotter commented 2 weeks ago

This URL gives "page not found" error even in Firefox for me.

l5oukz89 commented 2 weeks ago

If I'm not wrong, it used to give results similar to the URL RSS2 page above, so Reddit must have changed something recently...

martinrotter commented 2 weeks ago

Like I said, those attachments are simply not there.