tmattio closed 3 years ago
@patricoferris the scrapers are extremely minimal; writing them was just faster than copying a bunch of articles manually. What do you think of the approach in general?
I added some basic scraping on the blog posts from the RSS feed to extract the preview image and description. The preview looks like this:
This can certainly be improved: for instance, we don't look in the HTML body for an image if we didn't find one in the head, and we don't get the authors' names. But I guess this is enough for now.
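To make the head-only extraction concrete, here is a minimal sketch in Python using only the standard library. It is not the code from this PR. The tag names it reads (`og:image`, `twitter:image`, `og:description`, `description`) and the names `HeadMetaParser` and `scrape_preview` are assumptions made for illustration. Like the scrapers described above, it does not fall back to images in the body and does not extract authors.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect a preview image and description from <meta> tags inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.image = None
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        # Ignore everything that is not a <meta> tag inside <head>.
        if tag != "meta" or not self.in_head:
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if not content:
            return
        # Keep the first match only (assumed tag names, see the note above).
        if key in ("og:image", "twitter:image") and self.image is None:
            self.image = content
        elif key in ("og:description", "description") and self.description is None:
            self.description = content

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def scrape_preview(html):
    """Return the preview image URL and description, or None when missing."""
    parser = HeadMetaParser()
    parser.feed(html)
    return {"image": parser.image, "description": parser.description}
```

A post whose head carries no image tag yields `image: None` here, even if the body contains an `<img>` element, which is the limitation mentioned above.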
Some other things to improve on this PR:
But I'd suggest we do this in a follow-up PR.
The manual scraping here is great, but we should also clearly demarcate the "original content" on ocaml.org that originates from here too. The primary sources for that content so far are:
Thanks for the feedback @avsm!
I need to merge this for the migration of v3, so I'll create an issue for the import of the original content. I'll open a follow-up PR for this 🙂
Some initial work to add news articles/blog posts.
As discussed with @patricoferris, we don't want to port the old planet syndication because of the maintenance cost associated with it. Instead, we manually select the articles to use from a given source (only the Tarides, Ahrefs, and Jane Street blogs in this PR), and we scrape them.
TO DO