rbren / rss-parser

A lightweight RSS parser, for Node and the browser
MIT License
1.38k stars 209 forks source link

<media:keywords> is not processed #260

Open jsit opened 1 year ago

jsit commented 1 year ago

The New Yorker's feed (which it labels as Atom) uses <media:keywords>, with a comma-separated list of tags:

https://www.newyorker.com/feed/rss

For instance:

<item>
  <title>Will Sam Bankman-Fried’s Guilty Verdict Change Anything?</title>
  <link>https://www.newyorker.com/news/our-local-correspondents/the-trials-of-sam-bankman-fried</link>
  <guid isPermaLink="false">6539673cdb4a6fd40731a4c3</guid>
  <pubDate>Fri, 03 Nov 2023 15:54:47 +0000</pubDate>
  <media:content/>
  <description>The former C.E.O. of FTX now faces up to a hundred and ten years in prison. But, beyond resetting his personal fate, it’s not yet clear what the trial accomplished.</description>
  <category>News / Our Local Correspondents</category>
  <media:keywords>Sam Bankman-Fried, Trials, Fraud, Cryptocurrency, Verdict, Crimes</media:keywords>
  <dc:creator>Gideon Lewis-Kraus</dc:creator>
  <dc:publisher>Condé Nast</dc:publisher>
  <media:thumbnail url="https://media.newyorker.com/photos/6544fcde8c6b6b5e03c9f9d8/master/pass/GLK-SBF-Guilty-2.jpg" width="2497" height="2560"/>
</item>

This seems to be part of RSS 2.0?

https://www.rssboard.org/media-rss#media-keywords

In any case, it would be nice if these were processed like <category> elements.

Checklist - [X] Create `src/parser.js` ✓ https://github.com/rbren/rss-parser/commit/53278f621b4c480870c17ec232e2cb543368010b [Edit](https://github.com/rbren/rss-parser/edit/sweep/mediakeywords_is_not_processed/src/parser.js#L1-L999) - [X] Running GitHub Actions for `src/parser.js` ✗ [Edit](https://github.com/rbren/rss-parser/edit/sweep/mediakeywords_is_not_processed/src/parser.js#L1-L999)
sweep-ai[bot] commented 9 months ago

🚀 Here's the PR! #267

See Sweep's progress at the progress dashboard!
Sweep Basic Tier: I'm using GPT-4. You have 5 GPT-4 tickets left for the month and 3 for the day. (tracking ID: f82993a837)

For more GPT-4 tickets, visit our payment portal. For a one week free trial, try Sweep Pro (unlimited GPT-4 tickets).
Install Sweep Configs: Pull Request

[!TIP] I can email you next time I complete a pull request if you set up your email here!


Actions (click)

GitHub Actions failed

The sandbox appears to be unavailable or down.


Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant in decreasing order of relevance (click to expand). If some file is missing from here, you can mention the path in the ticket description. https://github.com/rbren/rss-parser/blob/a8156ee46e0c6bc03bbaa807f7459b1ff4bddafc/test/input/guardian.rss#L1-L192 https://github.com/rbren/rss-parser/blob/a8156ee46e0c6bc03bbaa807f7459b1ff4bddafc/test/input/content-encoded.rss#L1-L60
I also found the following external resources that might be helpful: **Summaries of links found in the content:** https://www.rssboard.org/media-rss#media-keywords: The page is about the Media RSS Specification, specifically version 1.5.1. It is a specification that supplements the element capabilities of RSS 2.0 to allow for more robust media syndication. The specification includes various elements and attributes that can be used to describe media objects, such as , , , and more. The page also provides examples of how to use these elements in an RSS feed. The user is specifically interested in the element, which is used to provide a comma-separated list of tags for a media object. They want to process this element similar to the element in an RSS feed. https://www.newyorker.com/feed/rss: The page provided is the metadata for The New Yorker website. It includes various articles and content from different categories such as Science, Humor, Podcasts, News, Culture, and more. The metadata includes the title, URL, publication date, category, author, and thumbnail image for each article. The specific problem mentioned is related to the use of `` in the RSS feed of The New Yorker, which provides comma-separated tags for each article. The user suggests that these tags should be processed similar to the `` elements in the RSS feed. https://www.newyorker.com/news/our-local-correspondents/the-trials-of-sam-bankman-fried` in the RSS feed.

Step 2: ⌨️ Coding

Ran GitHub Actions for 53278f621b4c480870c17ec232e2cb543368010b:
• build (18.x):
• build (16.x):
• build (14.x):


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/mediakeywords_is_not_processed.


🎉 Latest improvements to Sweep:
  • New dashboard launched for real-time tracking of Sweep issues, covering all stages from search to coding.
  • Integration of OpenAI's latest Assistant API for more efficient and reliable code planning and editing, improving speed by 3x.
  • Use the GitHub issues extension for creating Sweep issues directly from your editor.

💡 To recreate the pull request edit the issue title or description. To tweak the pull request, leave a comment on the pull request.Something wrong? Let us know.

This is an automated message generated by Sweep AI.