Pomax closed 3 years ago
I also tried the following, but that doesn't seem to work:
const Parser = require("rss-parser");
const parser = new Parser({
  customFields: {
    item: ["updated", "published"],
    entry: ["updated", "published"],
  },
});
Whether I use item or entry, the resulting parsed object does not contain an .updated or .published property to work with.
Hmmm... so it's been broken for the last few days, but I tried again this morning and, even though the entry XML doesn't look any different, things work again.
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:media="http://search.yahoo.com/mrss/">
<category term="redpandas" label="r/redpandas"/>
<updated>2021-04-03T18:18:30+00:00</updated>
<icon>https://www.redditstatic.com/icon.png/</icon>
<id>/r/redpandas/new.rss?limit=1</id>
<link rel="self" href="https://www.reddit.com/r/redpandas/new.rss?limit=1" type="application/atom+xml" />
<link rel="alternate" href="https://www.reddit.com/r/redpandas/new?limit=1" type="text/html" />
<logo>https://a.thumbs.redditmedia.com/SU2rJah4uwVYZrBB.png</logo>
<subtitle>The place for all things red panda!</subtitle>
<title>newest submissions : redpandas</title>
<entry>
<author>
<name>/u/li_the_great</name>
<uri>https://www.reddit.com/user/li_the_great</uri>
</author>
<category term="redpandas" label="r/redpandas"/>
<content type="html"><table> <tr><td> <a href="https://www.reddit.com/r/redpandas/comments/mjag8c/shalei_was_ready_for_her_glamour_shots_roger/"> <img src="https://external-preview.redd.it/oQ-V3Hv71lKdiVS3m0Jvnr36R1ZkM6WkeJhzQL6lTRc.jpg?width=640&amp;crop=smart&amp;auto=webp&amp;s=67b5d2cf8a74165ffab2186585c958af5dd70817" alt="Sha-lei was ready for her glamour shots - Roger Williams Park Zoo, Rhode Island" title="Sha-lei was ready for her glamour shots - Roger Williams Park Zoo, Rhode Island" /> </a> </td><td> &#32; submitted by &#32; <a href="https://www.reddit.com/user/li_the_great"> /u/li_the_great </a> <br/> <span><a href="https://imgur.com/Zhp1TLA">[link]</a></span> &#32; <span><a href="https://www.reddit.com/r/redpandas/comments/mjag8c/shalei_was_ready_for_her_glamour_shots_roger/">[comments]</a></span> </td></tr></table></content>
<id>t3_mjag8c</id>
<media:thumbnail url="https://external-preview.redd.it/oQ-V3Hv71lKdiVS3m0Jvnr36R1ZkM6WkeJhzQL6lTRc.jpg?width=640&amp;crop=smart&amp;auto=webp&amp;s=67b5d2cf8a74165ffab2186585c958af5dd70817" />
<link href="https://www.reddit.com/r/redpandas/comments/mjag8c/shalei_was_ready_for_her_glamour_shots_roger/" />
<updated>2021-04-03T15:15:30+00:00</updated>
<published>2021-04-03T15:15:30+00:00</published>
<title>Sha-lei was ready for her glamour shots - Roger Williams Park Zoo, Rhode Island</title>
</entry>
</feed>
The only difference I see is the new media:thumbnail element, but that has nothing to do with figuring out the pubDate/isoDate, so... I have no idea what happened. I'll mark this as invalid and refile if it happens again.
I have a simple reddit image board catchup program that grabs the reddit RSS for a subreddit (much like the README shows), and then downloads all images posted "since some date" by looking at the <pubDate> of each entry. However, Reddit changed its RSS format and no longer reports datetimes using <pubDate> and <isoDate>, instead using <published> and <updated>, and those fields are not automatically parsed.
It might be worth updating the README, or even making rss-parser parse all nodes by default (in a v4, to avoid breaking codebases that rely on v3's default behaviour), and parse "nothing except what's in your list of fields" only when the user needs more control and manually specifies the fields they need.
For example, https://www.reddit.com/r/redpandas/new.rss?limit=1 yields the following RSS:
So rss-parser won't see any datetimes associated with entries.
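For anyone hitting the same thing, a caller-side fallback might look roughly like the sketch below. It is not part of rss-parser: itemDate, itemsSince and DATE_FIELDS are hypothetical names, and it assumes the parsed items are plain objects that may carry any of isoDate, pubDate, published or updated as strings. It only shows the "since some date" filter tolerating either set of field names.

```javascript
// Hypothetical helper names; nothing here comes from rss-parser itself.
// Assumption: each parsed item is a plain object whose date, if any, is a
// string stored under one of these keys (checked in this order).
const DATE_FIELDS = ["isoDate", "pubDate", "published", "updated"];

// Return the first usable timestamp on an item as a Date, or null if none.
function itemDate(item) {
  for (const field of DATE_FIELDS) {
    const value = item[field];
    if (typeof value !== "string" || value.trim() === "") continue;
    const date = new Date(value);
    // Skip values that Date cannot parse instead of returning Invalid Date.
    if (!Number.isNaN(date.getTime())) return date;
  }
  return null;
}

// Keep only the items dated strictly after `since` (a Date).
// Items without any recognisable date are dropped.
function itemsSince(items, since) {
  return items.filter((item) => {
    const date = itemDate(item);
    return date !== null && date > since;
  });
}

// Example data: plain objects, with the Atom-style one using the
// timestamps from the entry shown in this thread.
const items = [
  { title: "old RSS-style item", pubDate: "Fri, 02 Apr 2021 10:00:00 GMT" },
  {
    title: "Atom-style item",
    published: "2021-04-03T15:15:30+00:00",
    updated: "2021-04-03T15:15:30+00:00",
  },
  { title: "no date at all" },
];

const recent = itemsSince(items, new Date("2021-04-03T00:00:00Z"));
console.log(recent.map((item) => item.title));
```

Checking several field names in a fixed order means the same filter keeps working whether a feed exposes RSS-style or Atom-style dates, at the cost of silently skipping items that have no date at all.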