ocaml / ood

OCaml.org v3 data repository

News #42

Closed tmattio closed 3 years ago

tmattio commented 3 years ago

Some initial work to add news articles/blog posts.

As discussed with @patricoferris, we don't want to port the old planet syndication because of the maintenance cost associated with it. Instead, we manually select the articles to use from a given source (only the Tarides, Ahrefs, and Jane Street blogs in this PR), and we scrape them.

TO DO

tmattio commented 3 years ago

@patricoferris the scrapers are extremely minimal; writing them was just faster than copying a bunch of articles manually. What do you think of the approach in general?

tmattio commented 3 years ago

I added some basic scraping on the blog posts from the RSS feed to extract the preview image and description. The preview looks like this:

[image: news article preview]

This can certainly be improved. For instance, we don't look into the HTML body for an image if we didn't find one in the head, and we don't get the authors' names. But I guess this is enough for now.
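To make the head-only behaviour concrete, here is a minimal sketch of that kind of extraction. It is an illustration, not the code from this PR: it uses Python's standard library rather than the repository's own scrapers, and the choice of the `og:image`, `og:description` and `description` meta tags, as well as the names `HeadPreviewParser` and `extract_preview`, are assumptions.

```python
from html.parser import HTMLParser


class HeadPreviewParser(HTMLParser):
    """Collect preview metadata from <meta> tags found inside <head>.

    Hypothetical helper: the tag names checked here are an assumption,
    not something confirmed by the PR.
    """

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.image = None
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        # Anything outside <head> is ignored, so a body <img> is never used.
        if tag != "meta" or not self.in_head:
            return
        attributes = dict(attrs)
        # Open Graph tags use `property`; the plain description uses `name`.
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if not content:
            return
        if key == "og:image" and self.image is None:
            self.image = content
        elif key in ("og:description", "description") and self.description is None:
            self.description = content

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_preview(html):
    """Return the first preview image and description found in <head>."""
    parser = HeadPreviewParser()
    parser.feed(html)
    return {"image": parser.image, "description": parser.description}


SAMPLE = """<html><head>
<meta property="og:image" content="https://example.com/cover.png">
<meta name="description" content="A short summary.">
</head><body><img src="https://example.com/inline.png"></body></html>"""

preview = extract_preview(SAMPLE)
print(preview["image"], preview["description"])
```

With a page that has no image meta tag in its head, `extract_preview` returns `None` for the image even when the body contains an `<img>`, which is the limitation described above. Author names are not extracted either.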

tmattio commented 3 years ago

Some other things to improve on this PR:

But I'd suggest we do this in a follow-up PR.

avsm commented 3 years ago

The manual scraping here is great, but we should also clearly demarcate the "original content" on ocaml.org that originates from here too. The primary sources for that content so far are:

tmattio commented 3 years ago

Thanks for the feedback @avsm!

I need to merge this for the migration of v3, so I'll create an issue for the import of the original content. I'll open a follow-up PR for this 🙂