RSS feed is missing description/article content for posts

nschonni commented 1 year ago

URL:

https://nodejs.org/en/feed/blog.xml

Browser Name:

Firefox

Browser Version:

*

Operating System:

Windows 11

How to reproduce the issue:

The previous plugin would include the article. The current feed https://nodejs.org/en/feed/blog.xml just includes the title/link but no other content. Similar to https://github.com/nodejs/node/releases.atom, it should contain the full post content

ovflowd commented 1 year ago

The problem is that including the full content of the article is quite problematic. First, because the original plugin wouldn't parse the markdown (if I recall correctly?)

I think we can update the current plugin to also add the parsed content of each blog post, I think.

ovflowd commented 1 year ago

Anyhow, thanks for reporting, @nschonni :)

ovflowd commented 1 year ago

@azu if you feel into tackling this PR, the place where we have the logic is done here https://github.com/nodejs/nodejs.org/blob/main/scripts/next-data/generatePreBuildFiles.mjs#L46

You can debug console.log(post) to see what comes out of it (pretty sure the body should be there).

And the documentation to the feed package is here: https://github.com/jpmonette/feed, you might simply need to add a content field on each item :)

ovflowd commented 1 year ago

Update: I've suggested Nextra implement buil-tin RSS support. Let's see what they say.

kkkrist commented 1 year ago

Some time during the last ca. 24h the whole blog feed (and the other two advertised feeds) went missing (404).

Could you please take a look?

ovflowd commented 1 year ago

Hey @kkkrist indeed that sounds like a bug. Yesterday we changed the approach of generating the RSS feeds during pre-build towards Next.js's App Router Dynamic Routes (https://github.com/nodejs/nodejs.org/blob/main/app/en/feed/%5Bfeed%5D/route.ts)

I think the solution here would be to find a way to tell Next.js to build these on static exports.

ovflowd commented 1 year ago

cc https://github.com/nodejs/nodejs.org/pull/5458 fixes #5456 and @kkkrist comment.

Harkunwar commented 1 year ago

@ovflowd If we want to restore the original functionality and include the post content as well, we would need to be mindful of the RSS feed size. Currently our RSS feed contains all the previous posts, however most RSS feeds that do include content, limit their feed to some number, usually around 10.

ovflowd commented 1 year ago

@ovflowd If we want to restore the original functionality and include the post content as well, we would need to be mindful of the RSS feed size. Currently our RSS feed contains all the previous posts, however most RSS feeds that do include content, limit their feed to some number, usually around 10.

Well, I guess we can definitely open another issue to investigate what our RSS feed should have or not, how many blog posts it should have, etc.

For the content, it would blow up the RSS feed if we added the content for every single post.

We would need to revise these aspects, as we would need to change the Serverless Function that generates feeds if we want to include their contents.

We would also need to be able to extract X paragraphs and append a "Read more" button. All of that sounds a little bit overkill.

I'm strongly leaning towards not implementing this as I'm not sure how worth it is, and so far no one has been complaining about missing "description" for the RSS feeds.

What do y'all think @nodejs/website

nschonni commented 1 year ago

The built-in release feed contains the information except the download links that were only available on the website feed. If @nodejs/releasers wants to modify the process for the release posts to include the missing information it would add back part of the missing functionality.

I don't think ignoring breaking core parts of the open web like RSS is a great approach though.

nschonni commented 1 year ago

I'm strongly leaning towards not implementing this as I'm not sure how worth it is, and so far no one has been complaining about missing "description" for the RSS feeds.

How would anyone else complain? There is an open ticket, and others piled on when the route changes broke the feed entirely. Normally people look for open tickets and don't open duplicates

ovflowd commented 1 year ago

The built-in release feed contains the information except the download links that were only available on the website feed.

What built-in release feed? What sort of information was only available on the website feed? Could you please point out what is being done differently? Because this is definitely before my time here and I might be missing context.

I don't think ignoring breaking core parts of the open web like RSS is a great approach though.

Sorry, what are we breaking here? Could you give examples? The description is not a required field of the RSS spec (https://validator.w3.org/feed/docs/rss2.html#hrelementsOfLtitemgt) either a title or description should be provided.

The release blog posts are too long to include as a description, for example. And what I'm debating here is how valuable it is to add a description here, given the technicalities we would have to face.

ovflowd commented 1 year ago

How would anyone else complain? There is an open ticket, and others piled on when the route changes broke the feed entirely. Normally people look for open tickets and don't open duplicates

People give +1 if they agree that something is missing, even tho if they don't open more issues. I haven't seen any interest in the addition of description besides yours.

I might be wrong, but again, since our blog posts were never designed in a way starting with a short description (introduction) of sorts, the description field even if containing one or two paragraphs would still not bring that much value, in my opinion. I'm more than happy to debate over this.

I haven't even seen interest on the @nodejs/releasers about this blog feed. Again, I'm debating here how much value it is to add a description field, knowing that they can simply click on the link to the blog posts on their RSS readers.

ovflowd commented 1 year ago

I don't think ignoring breaking core parts of the open web like RSS is a great approach though.

I would also, if possible, ask you to avoid adding confrontational comments that are borderline passive-aggressive. I'm not your enemy, I'm trying to understand your case here.

nschonni commented 1 year ago

What built-in release feed? What sort of information was only available on the website feed? Could you please point what is being done differently?

The link from the description https://github.com/nodejs/node/releases.atom

nschonni commented 1 year ago

The release blog posts are too long to include as a description, for example. And what I'm debating here is how valuable it is to add a description here, given the technicalities we would have to face.

The feeds previously did include everything (most of the time), and that made the posts readable and actionable (Eg: downloading a release from the links directly from an RSS client).

ovflowd commented 1 year ago

The link from the description nodejs/node/releases.atom

I have genuinely never seen this file before. Could I get some context on what this releases.atom file is used for, and who even uses them? Was this being used before here?

I think we could definitely source that releases.atom file and incorporate it here. But that ATOM feed seems to be independent of the website.

ovflowd commented 1 year ago

The feeds previously did include everything (most of the time), and that made the posts readable and actionable (Eg: downloading a release from the links directly from an RSS client).

From what I saw in the source code, what you're referring to was never done. Could you please point me to where we sourced the releases.atom file? I'm looking all over the old build of the Metalsmith Website, and I see no reference.

nschonni commented 1 year ago

The GitHub feed was never sourced. It's an automatic feed created/updated when releases are created on the nodejs/node repo. What I meant was that previously, even with the truncation, most release posts were fully published since their size was usually inside the character count. People following the feeds could read the notes and get the download link without clicking through to the page/website

ovflowd commented 1 year ago

The GitHub feed was never sourced. It's an automatic feed created/updated when releases are created on the nodejs/node repo. What I meant was that previously, even with the truncation, most release posts were fully published since their size was usually inside the character count. People following the feeds could read the notes and get the download link without clicking through to the page/website

I see, then I misunderstood you.

Going back to my original point, I would still like to debate the value of adding the description field. I'd argue that we can add it again because this was done on the old Website. It got omitted from the Next.js transition precisely because of some technical challenges (that back then) we would have had by adding this.

The Website RSS feeds containing entire blog posts are way bigger than your average RSS feed.

Most RSS feeds, I know, don't put the entire page on the description field. The RSS spec itself even states that the description is meant to be a synopsis. I argue that the old behaviour was in relaxed accordance with the RSS spec.

I am still leaning against implementing this, but that doesn't mean we can't. But we should consider stripping down the number of entries on the generated feed to 10.

It's also important to mention we would need to parse the Markdown into HTML and append it as CDATA, if we're going the easy route. But I'm against attaching the whole blog page on a description field.

RSS readers can embed the actual blog post within the RSS entry. For example, we could add support to AMP over here to generate minimalistic versions of the feeds.

I'm open to suggestions.

richardlau commented 1 year ago

I haven't even seen interest on the @nodejs/releasers about this blog feed. Again, I'm debating here how much value it is to add a description field, knowing that they can simply click on the link to the blog posts on their RSS readers.

I suspect most @nodejs/releasers are not even aware the nodejs.org blog feed existed. I knew of it, but until I looked just now had no idea how it was generated -- for reference, it used to be done via metalsmith-feed, https://github.com/nodejs/nodejs.org/blob/f34b913a1d084d6ad52904b994978acfa29502d5/build.js#L167-L189.

But we should consider stripping down the number of entries on the generated feed to 10.

The old feed had 20 entries, e.g. https://web.archive.org/web/20230103023604/https://nodejs.org/en/feed/releases.xml from the beginning of the year.

nschonni commented 1 year ago

You're probably on to something with the sourcing/delegating of the GitHub release feed. It would just need buy-in to harmonize the missing content from the template https://github.com/nodejs/nodejs.org/blob/main/scripts/release-post/template.hbs (download links, sha, etc...)

RSS readers can embed the actual blog post within the RSS entry. For example, we could add support to AMP over here to generate minimalistic versions of the feeds.

I'm not sure i'd bet on AMP. Since Google removed the search incentive, I believe larger platforms have been dropping support, so I don't think it's future looks healthy

ovflowd commented 1 year ago

The old feed had 20 entries, e.g. web.archive.org/web/20230103023604/https://nodejs.org/en/feed/releases.xml from the beginning of the year.

That is fair. Apparently, the metalsmith-feed plugin has a default limit of 20 entries. As stated in their readme. I believe we could also adopt this limit.

ovflowd commented 1 year ago

I'm not sure i'd bet on AMP. Since Google removed the search incentive, I believe larger platforms have been dropping support, so I don't think it's future looks healthy

AMP is probably dead. I'm trying to find alternatives that wouldn't require us to build markdown again. But if we limit the results to 20 entries, I feel more compelled to do this. Yet, I still need to figure out how to embed full blog posts. I'm also not sure about generating tailored description fields for the releases blog posts 🤔

ovflowd commented 1 year ago

Anyhow, let's sulk this in for a while, and get some more opinions from @nodejs/releasers and @nodejs/website folks and see if we find an agreement, something that we feel like it's feasible and meets popular demand.

I just wanted to clarify that I'm fine by adding a description field, but I'm arguing over what should be there and how complex that could be. 😅

nschonni commented 1 year ago

Yup, you've raised some points that may simplify things:

Delegate the Release feeds/pages to the nodejs/node releases. That would skip @nodejs/releasers needing to do the PR to this repo, but would required the few extra bits that are currently added in the template.
I see from a quick scan, a few instruction posts mention a post.exerpt, so maybe there is a built-in piece that might handle the old truncation. That might be a good solution for the non-release posts

nschonni commented 1 year ago

I'm now remembering that the GitHub release page does have a character limit that was hit in the past. Not sure if that is still an issue or something that GitHub addressed

ovflowd commented 1 year ago

I see from a quick scan, a few instruction posts mention a post.exerpt, so maybe there is a built-in piece that might handle the old truncation. That might be a good solution for the non-release posts

Well, the excerpt on this tutorial is a frontmatter field, not an actual "feature".

Our RSS feeds are generated by a Serverless-Function (that also is statically generated during build-time for all the RSS feeds we support).

At that point, I'd recall, we read the generated blog-post-data.json contents. This JSON file was introduced to allow Dynamically Generated pages to peek into Blog Data at any point without requiring the file system to be accessed.

So I would argue that adding the contents to this JSON isn't a good approach, but, that doesn't mean our serverless function couldn't peek at the source Markdown file (all source pages are uploaded and part of the Vercel deployment and also available statically on our DigitalOcean server).

So now the complication goes to -> read each Markdown file (through either node:fs) and then again compile the Markdown to HTML through remark.

This seems quite an exhaustive process to go through each file, even tho this is a one-time process... Computational-wise, it shouldn't be bad to access only the last 20 blog post entries for the aforementioned categories.

But it's still something that requires FS access for each one of these files. Hence my concerns here 😅

velu-murugesan commented 2 weeks ago

I am new to open source. Can you please give me a roadmap for open source

AugustinMauroy commented 2 weeks ago

Here where RSS feed is generate :

https://github.com/nodejs/nodejs.org/blob/main/apps/site/app/%5Blocale%5D/feed/%5Bfeed%5D/route.ts

nodejs / nodejs.org