
ocaml / ood

OCaml.org v3 data repository

Other

13 stars 8 forks

News #42

Closed tmattio closed 3 years ago

tmattio commented 3 years ago

Some initial work to add news articles/blog posts.

As discussed with @patricoferris, we don't want to port the old planet syndication because of the maintenance cost associated with it. Instead, we manually select the articles to use from a given source (only Tarides, Ahrefs and JaneStreet blogs in this PR), and we scrape them.

TO DO

[x] Handle parsing/download errors
[x] Generate metadata in markdown files
[x] News static module generation
[x] Add Ahrefs and JaneStreet articles
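
To illustrate the "Generate metadata in markdown files" item, a scraped article could be written out as markdown preceded by a front matter block. This is a minimal Python sketch and not the code in this PR. The `render_article` helper and the field names (`title`, `url`, `date`) are hypothetical.

```python
import json


def render_article(title: str, url: str, date: str, body: str) -> str:
    """Render an article as markdown preceded by a front matter block.

    The field names are assumptions chosen for illustration only.
    """
    # json.dumps produces a double-quoted, escaped string, which is also a
    # valid YAML scalar, so titles containing ':' or '"' stay well-formed.
    lines = [
        "---",
        f"title: {json.dumps(title)}",
        f"url: {json.dumps(url)}",
        f"date: {json.dumps(date)}",
        "---",
        "",
        body.strip(),
        "",
    ]
    return "\n".join(lines)
```

Quoting every value keeps the generated metadata parseable even when a scraped title contains characters that have a special meaning in YAML.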

tmattio commented 3 years ago

@patricoferris the scrapers are extremely minimal; writing them was just faster than copying a bunch of articles manually. What do you think of the approach in general?

tmattio commented 3 years ago

I added some basic scraping on the blog posts from the RSS feed to extract the preview image and description. The preview looks like this:

This can certainly be improved. For instance, we don't look into the HTML body for an image if we didn't find one in the head, and we don't get the authors' names. But I guess this is enough for now.
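
As a rough illustration of the head-only extraction described above, the following Python sketch reads a preview image and a description from `<meta>` tags inside `<head>`. It is not the scraper from this PR. The choice of the `og:image`, `og:description` and `description` keys is an assumption, and the `HeadMetaParser` and `extract_preview` names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class HeadMetaParser(HTMLParser):
    """Collect a preview image and a description from <meta> tags in <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.image: Optional[str] = None
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            # Open Graph tags use "property"; the plain description uses "name".
            key = attributes.get("property") or attributes.get("name")
            content = attributes.get("content")
            if not content:
                return
            if key == "og:image" and self.image is None:
                self.image = content
            elif key in ("og:description", "description") and self.description is None:
                self.description = content

    def handle_endtag(self, tag):
        if tag == "head":
            # Anything after </head> is ignored: no fallback to the body.
            self.in_head = False


def extract_preview(html: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (image_url, description); either may be None if not found."""
    parser = HeadMetaParser()
    parser.feed(html)
    return parser.image, parser.description
```

Like the approach described in the comment, this sketch returns no image when the head has none, even if the body contains an `<img>` element.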

tmattio commented 3 years ago

Some other things to improve in this PR:

[ ] Add the tags and source in the metadata
[ ] Sort by date

But I'd suggest we do this in a follow-up PR.
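
For reference, the two open items could look roughly like the Python sketch below, which adds a source and tags to each article record and sorts the records newest first. The `Article` type, its field names and the `newest_first` helper are hypothetical and do not come from this repository.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Article:
    """Hypothetical article record carrying its source and tags."""

    title: str
    published: date
    source: str
    tags: List[str] = field(default_factory=list)


def newest_first(articles: List[Article]) -> List[Article]:
    """Return a new list sorted by publication date, most recent first."""
    return sorted(articles, key=lambda article: article.published, reverse=True)
```

Storing the date as a `date` value rather than a string avoids relying on the textual format of the scraped date when sorting.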

avsm commented 3 years ago

The manual scraping here is great, but we should also clearly demarcate the "original content" on ocaml.org that originates from here too. The primary sources for that content so far are:

https://github.com/ocaml/platform-blog, which feeds the opam.ocaml.org blog (and should be integrated into v3.ocaml.org)
Various posts on discuss.ocaml.org, such as the new compiler development monthly or the multicore monthlies. Perhaps just a link to the discuss post is sufficient here? I'd class this as "original content" too; it's just that discuss is a convenient publishing mechanism for that at the moment.

tmattio commented 3 years ago

Thanks for the feedback @avsm!

I need to merge this for the migration of v3, so I'll create an issue for the import of the original content. I'll open a follow-up PR for this 🙂