owid / owid-grapher

A platform for creating interactive data visualizations
https://ourworldindata.org
MIT License

Automate more breadcrumbs #3406

Open larsyencken opened 5 months ago

larsyencken commented 5 months ago

Problem

We see it as our role to put work in context; when it comes to articles, that context is often the broader topic or area that the article sits in.

Currently, one way we provide context for an article is with breadcrumbs (posts_gdocs.breadcrumbs), but they need to be manually specified every time. This is a waste of human effort, given that every article is tagged with a topic.

We would instead like to do this automatically for each article.

Technical notes

If https://github.com/owid/owid-grapher/pull/3695/ is merged, we will soon be able to construct a tag graph.

A simplified example: [image: tag graph example; ⭐️ = a topic]

The top-level tags are currently all assumed to be areas and don't have an associated page. The only reason we have areas at all is the header nav (though there is no reason why we couldn't make these pages in the future).

As long as we ensure every article is tagged with at least one topic tag, we'll be able to use this graph to construct breadcrumbs for every article.

We'll do this by constructing a subgraph that only features topic tags and displaying the path to any given topic.

So an article about indoor air pollution would be: Home > Air Pollution > Indoor Air Pollution

An article about cancer: Home > Cancer

An article about nuclear energy: Home > Energy > Nuclear Energy
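A minimal sketch of the subgraph-and-path idea, assuming the tag graph can be read as a flat list of weighted parent/child edges. The `TagEdge` shape, the function names, and the area names "Health" and "Energy and Environment" are all made up for illustration; the real schema may look different.

```typescript
// Hypothetical edge shape; the real tag graph schema may differ.
interface TagEdge {
    parent: string
    child: string
    weight: number
}

// Keep only edges whose two endpoints are both topics. Areas (top-level
// tags without a page) drop out, so they never appear in a breadcrumb.
function topicSubgraph(edges: TagEdge[], topics: Set<string>): TagEdge[] {
    return edges.filter((e) => topics.has(e.parent) && topics.has(e.child))
}

// Walk from a topic up through its topic parents. This sketch takes the
// first parent it finds, so it only suits tags with a single topic parent.
function breadcrumbsFor(tag: string, topicEdges: TagEdge[]): string[] {
    const path: string[] = [tag]
    let parentEdge = topicEdges.find((e) => e.child === tag)
    // The includes() check guards against cycles in the graph.
    while (parentEdge !== undefined && !path.includes(parentEdge.parent)) {
        path.unshift(parentEdge.parent)
        const parentTag: string = parentEdge.parent
        parentEdge = topicEdges.find((e) => e.child === parentTag)
    }
    return ["Home", ...path]
}

// Made-up sample data mirroring the examples above.
const sampleEdges: TagEdge[] = [
    { parent: "Health", child: "Cancer", weight: 1 },
    { parent: "Health", child: "Air Pollution", weight: 1 },
    { parent: "Air Pollution", child: "Indoor Air Pollution", weight: 1 },
    { parent: "Energy and Environment", child: "Energy", weight: 1 },
    { parent: "Energy", child: "Nuclear Energy", weight: 1 },
]
const sampleTopics = new Set([
    "Cancer",
    "Air Pollution",
    "Indoor Air Pollution",
    "Energy",
    "Nuclear Energy",
])
const sampleTopicEdges = topicSubgraph(sampleEdges, sampleTopics)

console.log(breadcrumbsFor("Indoor Air Pollution", sampleTopicEdges).join(" > "))
// Home > Air Pollution > Indoor Air Pollution
console.log(breadcrumbsFor("Cancer", sampleTopicEdges).join(" > "))
// Home > Cancer
```

Filtering the edges first keeps the walk itself simple: once areas are gone, a topic with no remaining parent sits directly under Home.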

An article about fossil fuels has two possible breadcrumb paths, because the Fossil Fuels tag has two parents that are topics.

It could be either Home > Greenhouse Gas Emissions > Fossil Fuels or Home > Energy > Fossil Fuels.

The tag graph has weighted edges, so that we can set one edge to always be preferred over the others, but there will likely be cases where we have an article that would prefer to have the other edge highlighted.

For example, if we set the Energy-Fossil Fuels edge to have a higher weight, this will be fine for articles that are about energy and fossil fuels: we can show the default breadcrumbs of Home > Energy > Fossil Fuels. But if we have an article that's about fossil fuels' contribution to GHG emissions, we'll want the breadcrumbs to be Home > Greenhouse Gas Emissions > Fossil Fuels.

We need to decide whether or not parent tags must be set manually ("parent implicitness"). Ideally, parent tags can be implicit the majority of the time and we can derive the breadcrumbs from the tag graph weights. In cases where we need to deviate from the default, authors can explicitly set, on the gdoc, the parent tag that they want to take priority.
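One way the implicit default and the explicit override could fit together, sketched under the assumption that edges carry a numeric weight and that the gdoc exposes the author's chosen parent as a plain tag name. `pickParent` and `explicitParent` are hypothetical names, not existing code.

```typescript
// Hypothetical edge shape; the real tag graph schema may differ.
interface TagEdge {
    parent: string
    child: string
    weight: number
}

// Pick the parent to show for `tag`. An explicit choice made on the gdoc
// wins if that edge exists; otherwise the heaviest edge is the implicit default.
function pickParent(
    tag: string,
    topicEdges: TagEdge[],
    explicitParent?: string
): string | undefined {
    const candidates = topicEdges.filter((e) => e.child === tag)
    if (explicitParent !== undefined) {
        const match = candidates.find((e) => e.parent === explicitParent)
        if (match !== undefined) return match.parent
    }
    // Copy before sorting so the caller's array is left untouched.
    const byWeight = [...candidates].sort((a, b) => b.weight - a.weight)
    return byWeight[0]?.parent
}

// Made-up weights for the Fossil Fuels example.
const fossilFuelEdges: TagEdge[] = [
    { parent: "Energy", child: "Fossil Fuels", weight: 2 },
    { parent: "Greenhouse Gas Emissions", child: "Fossil Fuels", weight: 1 },
]

console.log(pickParent("Fossil Fuels", fossilFuelEdges))
// Energy
console.log(pickParent("Fossil Fuels", fossilFuelEdges, "Greenhouse Gas Emissions"))
// Greenhouse Gas Emissions
```

In this sketch an explicit parent that doesn't match any edge silently falls back to the weighted default. Whether that should instead be surfaced as an error in the admin is an open design choice.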

We can render this in the admin UI to ensure that authors always understand what the breadcrumbs for a given article will be:

No possible breadcrumb ambiguity: [image]

Breadcrumb ambiguity, but implicit tag is correct: [image]

Breadcrumb ambiguity, implicit tag overridden: [image]

An issue with implicit tag overrides is that the tag graph may get updated such that the overrides no longer make sense (e.g. if we delete the Greenhouse Gas Emissions-Fossil Fuels edge). This will probably be a rare occurrence, but ideally we'd have a way to migrate articles that would be affected by any update to the tag graph.
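A rough sketch of how affected articles could be found before (or after) a tag graph update, assuming overrides are stored as article/tag/parent triples. The `ParentOverride` shape and the example slug are hypothetical.

```typescript
// Hypothetical edge shape; the real tag graph schema may differ.
interface TagEdge {
    parent: string
    child: string
    weight: number
}

// Hypothetical record of an author's explicit parent choice for one article.
interface ParentOverride {
    articleSlug: string
    tag: string
    explicitParent: string
}

// Return the overrides whose parent-to-tag edge no longer exists in the
// updated graph; these are the articles that would need migrating.
function findStaleOverrides(
    overrides: ParentOverride[],
    updatedEdges: TagEdge[]
): ParentOverride[] {
    return overrides.filter(
        (o) =>
            !updatedEdges.some(
                (e) => e.parent === o.explicitParent && e.child === o.tag
            )
    )
}

// The Greenhouse Gas Emissions-Fossil Fuels edge has been deleted here.
const edgesAfterUpdate: TagEdge[] = [
    { parent: "Energy", child: "Fossil Fuels", weight: 2 },
]
const storedOverrides: ParentOverride[] = [
    {
        articleSlug: "example-fossil-fuel-emissions-article",
        tag: "Fossil Fuels",
        explicitParent: "Greenhouse Gas Emissions",
    },
]

console.log(findStaleOverrides(storedOverrides, edgesAfterUpdate).map((o) => o.articleSlug))
```

Running a check like this against the proposed graph would give a list of articles to review before the edge is removed.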

One more case to consider is that we may have articles that don't neatly fall into a single path of the graph, e.g. an article about nuclear energy and cancer.

In such cases, i.e. when an article is tagged with two leaf nodes, we may want to not render breadcrumbs at all and instead show the tags in a list: [image]

Another option could be to show multiple lanes of breadcrumbs: [image: two leaves example 1]

We could try and merge them somehow 😬: [image: two leaves example 3] [image: two leaves example 2]
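Detecting this case could look roughly like the following, assuming "leaf" means an article tag that is not an ancestor of any other tag on the same article. That definition, and the function names, are assumptions made for this sketch.

```typescript
// Hypothetical edge shape; the real tag graph schema may differ.
interface TagEdge {
    parent: string
    child: string
    weight: number
}

// Collect every ancestor of `tag` by following parent edges upwards.
function ancestorsOf(tag: string, edges: TagEdge[]): Set<string> {
    const found = new Set<string>()
    const stack: string[] = [tag]
    while (stack.length > 0) {
        const current = stack.pop() as string
        for (const e of edges) {
            // The has() check also stops the walk from looping on cycles.
            if (e.child === current && !found.has(e.parent)) {
                found.add(e.parent)
                stack.push(e.parent)
            }
        }
    }
    return found
}

// An article tag counts as a leaf here if it is not an ancestor of any
// other tag on the same article.
function leafTags(articleTags: string[], edges: TagEdge[]): string[] {
    return articleTags.filter(
        (tag) =>
            !articleTags.some(
                (other) => other !== tag && ancestorsOf(other, edges).has(tag)
            )
    )
}

const energyEdges: TagEdge[] = [
    { parent: "Energy", child: "Nuclear Energy", weight: 1 },
]

console.log(leafTags(["Energy", "Nuclear Energy"], energyEdges))
// one leaf: a single breadcrumb path exists
console.log(leafTags(["Nuclear Energy", "Cancer"], energyEdges))
// two leaves: fall back to a tag list, or whichever rendering is chosen
```

An article tagged with both Energy and Nuclear Energy still has a single path, so only articles with more than one leaf would need the alternative rendering.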

Until we make a decision here, it's not clear how we should render this in the admin.

One final consideration is citations. Ideally, we're always happy to have the closest topic page for an article be its citation page. So an article about nuclear energy would be cited at https://ourworldindata.org/nuclear-energy

If this won't always be the case, we'll need a way to choose which parent tag should be cited instead: [image: citation override]

A column to track this would have to be added in the gdocs_posts_x_tags table.
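To make the citation rule concrete, here is a small sketch that models the proposed column as a boolean flag on each article-tag row. The row shape and the field names `isClosestTopic` and `citeInstead` are invented for illustration, as is the assumption that a topic page lives at its tag slug.

```typescript
// Hypothetical row of the article-to-tag join table.
interface ArticleTagRow {
    tagName: string
    tagSlug: string
    isClosestTopic: boolean // the topic the article sits directly under
    citeInstead: boolean // stand-in for the proposed override column
}

// Prefer a row flagged as the citation override; otherwise fall back to
// the closest topic page. Returns undefined if neither exists.
function citationUrl(rows: ArticleTagRow[]): string | undefined {
    const chosen =
        rows.find((r) => r.citeInstead) ?? rows.find((r) => r.isClosestTopic)
    return chosen === undefined
        ? undefined
        : `https://ourworldindata.org/${chosen.tagSlug}`
}

const nuclearArticleRows: ArticleTagRow[] = [
    { tagName: "Energy", tagSlug: "energy", isClosestTopic: false, citeInstead: false },
    { tagName: "Nuclear Energy", tagSlug: "nuclear-energy", isClosestTopic: true, citeInstead: false },
]

console.log(citationUrl(nuclearArticleRows))
// https://ourworldindata.org/nuclear-energy
```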

danyx23 commented 4 months ago

@JoeHasell we talked about this a bit today but it seems a bit tricky. Maybe a good one to chat about with you in the next site meeting on May 14.

danyx23 commented 3 months ago

@ikesau can you sketch out the body of this issue since Joe and Lars are busy this week?

ikesau commented 3 months ago

@danyx23 done!

I think we need a decision on how to handle multi-leaf node articles. Once we've got that we should be good to go.

ikesau commented 2 months ago

It seems we're okay with picking a single leaf node, tiebreaking with the tag graph weights.

We could also continue to support a way to manually override these breadcrumbs, in the cases where we want to defy the tag graph weights. We can continue to use the breadcrumbs column in posts_gdocs for this.
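A sketch of how that resolution order might look: manual breadcrumbs first, then a single leaf chosen by weight. How weights are compared across leaves isn't settled above, so this assumes each leaf is scored by its heaviest incoming edge. The `ArticleInput` and `BreadcrumbItem` shapes are assumptions about what the breadcrumbs column holds.

```typescript
// Hypothetical edge shape; the real tag graph schema may differ.
interface TagEdge {
    parent: string
    child: string
    weight: number
}

// Assumed shape of a manually specified breadcrumb.
interface BreadcrumbItem {
    label: string
    href?: string
}

// Hypothetical minimal view of an article for this sketch.
interface ArticleInput {
    manualBreadcrumbs: BreadcrumbItem[] | null
    leafTags: string[]
}

type BreadcrumbSource =
    | { kind: "manual"; items: BreadcrumbItem[] }
    | { kind: "auto"; leaf: string }
    | { kind: "none" }

// Score a leaf by its heaviest incoming edge; a leaf with no parent scores 0.
function leafScore(leaf: string, topicEdges: TagEdge[]): number {
    return topicEdges
        .filter((e) => e.child === leaf)
        .reduce((best, e) => Math.max(best, e.weight), 0)
}

function chooseBreadcrumbSource(
    article: ArticleInput,
    topicEdges: TagEdge[]
): BreadcrumbSource {
    // A manual override always wins.
    if (article.manualBreadcrumbs !== null && article.manualBreadcrumbs.length > 0) {
        return { kind: "manual", items: article.manualBreadcrumbs }
    }
    if (article.leafTags.length === 0) return { kind: "none" }
    // Highest score wins; on equal scores the earlier tag is kept.
    const leaf = article.leafTags.reduce((best, candidate) =>
        leafScore(candidate, topicEdges) > leafScore(best, topicEdges)
            ? candidate
            : best
    )
    return { kind: "auto", leaf }
}

const weightedEdges: TagEdge[] = [
    { parent: "Energy", child: "Nuclear Energy", weight: 1 },
]
console.log(
    chooseBreadcrumbSource(
        { manualBreadcrumbs: null, leafTags: ["Cancer", "Nuclear Energy"] },
        weightedEdges
    )
)
```

Returning a tagged result rather than a finished breadcrumb list keeps the decision (manual, automatic, or none) separate from path construction, which the admin UI could also use to show authors which case applies.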