Closed danyx23 closed 9 months ago
I've restructured the list, making it easier to work on my datasets. I'd suggest you do the same, @pabloarosado. With the ones remaining, we can either distribute them among us both or let others know. Can decide once we're finished with ours.
I've added tags to all datasets I've worked on: https://github.com/owid/etl/pull/1870

I've realised some of the datasets listed above are still using old metadata (e.g. Energy mix). I haven't added topic tags to those.
We can just split the rest of the list half and half between us, and then add the others as reviewers.
I see that grapher steps are failing (also in my local grapher). It's related to `grapher_model`, something connected to the `displayOrder` of the topic tags. Maybe some migration needs to be done in the database. @danyx23 were you expecting this or is it a bug?
@pabloarosado when are you getting this error?
I was getting this error when using Jinja on `topic_tags` (see my comment).
Could be related to this update https://github.com/owid/etl/pull/1863? @Marigold
@pabloarosado sorry, my bad! Could you please rebase on top of master?
@pabloarosado I've re-structured the list again to clarify which datasets each of us is tackling.
I've finished my part, you can review my changes here:
I've created an additional PR with the changes affecting datasets from others (no need to review this):
It's not clear to me what to do with auxiliary indicators, like population or GDP, that we sometimes have within a dataset. For example, natural disasters has those indicators. These indicators only exist because it's convenient (for us and for superusers downloading the dataset) to have population and GDP next to the per capita and per GDP variables. They're useful for sanity checks.

- They shouldn't be tagged as `Population` or `Economic Growth`, because they shouldn't appear anywhere on those topics.
- They shouldn't be tagged as `Natural Disasters` either, since we don't want them to appear on those searches.

So, I'll tag them as "Uncategorized". Do you agree, @lucasrodes?
Yeah, that sounds reasonable, @pabloarosado!
I've checked most of the datasets in the list, since they were using the old metadata (including WDI). I understand that they don't need to be tagged.
@paarriagadap among the tags in this list, which ones would you assign (the first tag being the most important one) to the indicators of:
@spoonerf among the tags in this list, which ones would you assign (the first tag being the most important one) to the indicators of:
Thanks both!
Hey @pabloarosado!
I would go with `Child & Infant Mortality` for the United Nations Inter-agency Group for Child Mortality Estimation dataset.
Just be aware that the Datasette link you shared is showing only the first 101 topic tags ordered by id, but there are 128 topic tags. Maybe this link is better.
@pabloarosado I'd use 'State Capacity' for 'State Capacity Dataset' and 'Colonial Dates Dataset'. And possibly 'Human Development Index (AHDI)' for the 'Augmented Human Development Index'.
Hi! I was away last Friday. Yes, the ones described by @lucasrodes are the ones to use. It's `Human Development Index (HDI)`, by the way.
Hey @lucasrodes I think the two PRs are ready to be merged (assuming that there are no other new CI surprises with random steps). If everything goes well, shall we merge by the end of the day?
This issue was completed in late October, but I suppose we forgot to close it.
For the new datapages it would be extremely useful if we could guarantee that every indicator has at least one `topic_tag` assigned. This will ensure that we can link our topic pages with our data pages and help in showing the most relevant content in various places.

The easiest way to ensure this is to set one (or more) tags at the dataset level. This is not always appropriate but works in many cases. Tags have to match a topic tag in the database (case sensitive). The schema has been updated to include the current topic tags, so you should get autocomplete in the yaml file for this field. The only non-topic tag that can be used is "Uncategorized", which is meant for cases where you explicitly don't want to assign a topic (e.g. useful in cases like the WDI, where every indicator needs a human- or AI-assigned tag and you don't have the time to do it yet).
The easiest way to assign a tag to all indicators in a dataset is to set it via the common section in the yaml file:
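As a rough sketch of what such a `common` section could look like (the `definitions.common.presentation.topic_tags` nesting is an assumption about the metadata layout, and `Natural Disasters` is only a placeholder tag):

```yaml
# Hypothetical sketch; the key nesting below is assumed, not confirmed in this issue.
definitions:
  common:
    presentation:
      topic_tags:
        # Must match a topic tag in the database exactly (case sensitive).
        - Natural Disasters
```

When several tags are listed, the first is taken as the most important one. Use `Uncategorized` instead when you explicitly don't want to assign a topic.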
As of today we have about 100 datasets that have at least one indicator with new metadata. Each entry in the list below has a checkmark indicating whether a tag has been added, followed by the number of indicators in the dataset and the dataset name. The name links to the probable location of the metadata yaml file that you need to edit with the above snippet.
@lucasrodes
From Lucas
(working on #1869)
From Veronika
(https://github.com/owid/etl/pull/1872)
From Fiona
(https://github.com/owid/etl/pull/1872)
@pabloarosado
From Pablo R
From Mojmir
From Pablo A
Others
Datasette query for the above list
The following datasette query was used to arrive at the above list:

```sql
with datasets_with_counts as (
  select
    d.name as name,
    count(*) as variableCount,
    json_group_array(distinct v.catalogPath) as paths
  from datasets d
  join variables v on v.datasetId = d.id
  where v.schemaVersion = 2
  group by d.id
  order by count(*) desc
)
select name, variableCount, json_each.value as path
from datasets_with_counts t, json_each(t.paths)
```