owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License

📊 guardian: decadal averages #2834

Closed: lucasrodes closed this 1 week ago

lucasrodes commented 2 weeks ago

Tracking issue

Add decadal averages.
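The PR does not define "decadal". The data-diff only shows a new `avg_10y` table with `<indicator>_10y_avg` columns. The sketch below assumes a trailing 10-year window ending at a chosen year, and a plain mean that skips missing (NaN) years. The function name `ten_year_average` and the `(country, year, value)` row layout are hypothetical and are not taken from the ETL step. The sample values are toy numbers, not dataset values.

```python
import math
from collections import defaultdict


def ten_year_average(rows, end_year, window=10):
    """Average each country's values over the `window` years ending at `end_year`.

    Hypothetical helper. rows is an iterable of (country, year, value) tuples.
    NaN values are skipped. Countries with no valid value in the window are omitted.
    """
    start_year = end_year - window + 1
    sums = defaultdict(float)
    counts = defaultdict(int)
    for country, year, value in rows:
        if start_year <= year <= end_year and not math.isnan(value):
            sums[country] += value
            counts[country] += 1
    return {country: sums[country] / counts[country] for country in counts}


if __name__ == "__main__":
    rows = [
        ("Spain", 2014, 10.0),
        ("Spain", 2023, 30.0),
        ("Spain", 2012, 1000.0),  # outside the 2014-2023 window, ignored
        ("Chad", 2020, float("nan")),  # missing value, skipped
        ("Chad", 2021, 4.0),
    ]
    print(ten_year_average(rows, end_year=2023))  # {'Spain': 20.0, 'Chad': 4.0}
```

A calendar-decade variant (for example 2010-2019) would differ only in how `start_year` and `end_year` are chosen.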

owidbot commented 2 weeks ago
Quick links (staging server): Site, Admin, Wizard

Login: `ssh owid@staging-site-news`

chart-diff: ✅
  • 4/4 reviewed charts
    • Modified: 0/0
    • New: 4/4
data-diff: ❌ Found differences
```diff
= Dataset garden/news/2024-05-08/guardian_mentions
  = Table guardian_mentions
    ~ Dim country
+     + New values: 14 / 2721 (0.51%)
          year country
          2018 Saint Martin (French part)
          2020 Saint Martin (French part)
          2022 Saint Martin (French part)
          2017 Sint Maarten (Dutch part)
          2020 Sint Maarten (Dutch part)
-     - Removed values: 32 / 2721 (1.18%)
          year country
          2018 Saint Martin
          2018 Sint Maarten
          2019 Sint Maarten
          2016 Timor-Lest
          2022 Timor-Lest
    ~ Dim year
+     + New values: 14 / 2721 (0.51%)
          country year
          Saint Martin (French part) 2018
          Saint Martin (French part) 2020
          Saint Martin (French part) 2022
          Sint Maarten (Dutch part) 2017
          Sint Maarten (Dutch part) 2020
-     - Removed values: 32 / 2721 (1.18%)
          country year
          Saint Martin 2018
          Sint Maarten 2018
          Sint Maarten 2019
          Timor-Lest 2016
          Timor-Lest 2022
    ~ Column num_pages_mentions (new data, changed data)
+     + New values: 14 / 2721 (0.51%)
          country year num_pages_mentions
          Saint Martin (French part) 2018 20
          Saint Martin (French part) 2020 43
          Saint Martin (French part) 2022 29
          Sint Maarten (Dutch part) 2017 21
          Sint Maarten (Dutch part) 2020 5
-     - Removed values: 32 / 2721 (1.18%)
          country year num_pages_mentions
          Saint Martin 2018 20
          Sint Maarten 2018 6
          Sint Maarten 2019 1
          Timor-Lest 2016 49
          Timor-Lest 2022 114
      ~ Changed values: 18 / 2721 (0.66%)
          country year num_pages_mentions - num_pages_mentions +
          East Timor 2017 43
          East Timor 2022 114
          Saint Martin (French part) 2023 35
          United States Virgin Islands 2016 27
          United States Virgin Islands 2020 21
    ~ Column num_pages_mentions_per_million (changed metadata, new data, changed data)
-     - {}
+     + title: Number of pages in the Guardian that mention a country (per million people)
+     + description_short: Number of pages in the Guardian that mention a particular country, normalised by the population of the
+     +   country.
+     + origins:
+     +   - producer: The Guardian
+     +     title: Attention to each country in The Guardian's articles (raw mentions)
+     +     description: |-
+     +       Aggregate estimates on the number of entries that talk about each country and year.
+     +
+     +       The data was obtained by querying The Guardian's Open Platform.
+     +
+     +       An entry or page in The Guardian is considered to be about a certain country if that particular country is mentioned in the text. To this end, we have used a set of country name variations to ensure that we capture all the entries. Nonetheless, this is not a perfect method and some entries might be missed.
+     +     citation_full: The Guardian, Open Platform
+     +     url_main: https://open-platform.theguardian.com/access/
+     +     date_accessed: '2024-05-07'
+     +     date_published: '2024'
+     +     license:
+     +       name: The Guardian terms of service
+     +       url: https://www.theguardian.com/help/terms-of-service
+     +   - producer: Various sources
+     +     title: Population
+     +     description: |-
+     +       Our World in Data builds and maintains a long-run dataset on population by country, region, and for the world, based on various sources.
+     +
+     +       You can find more information on these sources and how our time series is constructed on this page: https://ourworldindata.org/population-sources
+     +     citation_full: |-
+     +       The long-run data on population is based on various sources, described on this page: https://ourworldindata.org/population-sources
+     +     attribution: Population based on various sources (2023)
+     +     attribution_short: Population
+     +     url_main: https://ourworldindata.org/population-sources
+     +     date_accessed: '2023-03-31'
+     +     date_published: '2023-03-31'
+     +     license:
+     +       name: CC BY 4.0
+     + licenses:
+     +   - name: Creative Commons BY 4.0
+     +     url: https://docs.google.com/document/d/1-RmthhS2EPMK_HIpnPctcXpB0n7ADSWnXa5Hb3PxNq4/edit?usp=sharing
+     +   - name: CC BY 3.0
+     +     url: https://dataportaal.pbl.nl/downloads/HYDE/HYDE3.2/readme_release_HYDE3.2.1.txt
+     +   - name: CC BY 3.0 IGO
+     +     url: http://creativecommons.org/licenses/by/3.0/igo/
+     + unit: pages per million people
+     + processing_level: major
+     + presentation:
+     +   topic_tags:
+     +     - Uncategorized
+     + description_processing: |-
+     +   Getting the number of articles/entries talking about a certain country has no straightforward answer, since there can be different strategies. The strategy for this indicator is based on first defining a set of country name variations for each country, and then look for content on The Guardian with an explicit mention to these names.
+     +
+     +
+     +   1. Get all country name variations:
+     +     - Obtain all the country name variations using our standard name list.
+     +     - Our list may not cover all cases, and may contain some names that are not valid on The Guardian API (e.g. names with symbols like ';' are not supported). Therefore, we clean this list.
+     +
+     +   2. For each country, obtain the number of pages using each set of name variations. Steps:
+     +     - For each country and year we get all content metadata: a query like "https://content.guardianapis.com/search?q=...&from-date=2020-01-01&to-date=2020-12-31" for year 2020. The count of pages is in the property `response.total`.
+     +
+     +   For mor details, please refer to the snapshot script.
+     + New values: 14 / 2721 (0.51%)
          country year num_pages_mentions_per_million
          Saint Martin (French part) 2018 590.336182
          Saint Martin (French part) 2020 1319.949707
          Saint Martin (French part) 2022 911.491089
          Sint Maarten (Dutch part) 2017 501.181366
          Sint Maarten (Dutch part) 2020 114.579033
-     - Removed values: 32 / 2721 (1.18%)
          country year num_pages_mentions_per_million
          Saint Martin 2018 NaN
          Sint Maarten 2018 NaN
          Sint Maarten 2019 NaN
          Timor-Lest 2016 NaN
          Timor-Lest 2022 NaN
      ~ Changed values: 2551 / 2721 (93.75%)
          country year num_pages_mentions_per_million - num_pages_mentions_per_million +
          Cameroon 2022 NaN 10.173908
          Curacao 2014 NaN 17.853529
          Ecuador 2023 NaN 11.379571
          Latvia 2017 NaN 56.269985
          Tokelau 2016 NaN 1382.170044
    ~ Column num_pages_mentions_relative (changed metadata, new data, changed data)
+     + {}
-     - title: Share of pages in The Guardian that mention a country
-     - description_short: Share of pages in The Guardian that that mention a particular country.
-     - origins:
-     -   - producer: The Guardian
-     -     title: Attention to each country in The Guardian's articles (raw mentions)
-     -     description: |-
-     -       Aggregate estimates on the number of entries that talk about each country and year.
-     -
-     -       The data was obtained by querying The Guardian's Open Platform.
-     -
-     -       An entry or page in The Guardian is considered to be about a certain country if that particular country is mentioned in the text. To this end, we have used a set of country name variations to ensure that we capture all the entries. Nonetheless, this is not a perfect method and some entries might be missed.
-     -     citation_full: The Guardian, Open Platform
-     -     url_main: https://open-platform.theguardian.com/access/
-     -     date_accessed: '2024-05-07'
-     -     date_published: '2024'
-     -     license:
-     -       name: The Guardian terms of service
-     -       url: https://www.theguardian.com/help/terms-of-service
-     - unit: pages per 100,000 pages
-     - presentation:
-     -   topic_tags:
-     -     - Uncategorized
-     - description_processing: |-
-     -   Getting the number of articles/entries talking about a certain country has no straightforward answer, since there can be different strategies. The strategy for this indicator is based on first defining a set of country name variations for each country, and then look for content on The Guardian with an explicit mention to these names.
-     -
-     -
-     -   1. Get all country name variations:
-     -     - Obtain all the country name variations using our standard name list.
-     -     - Our list may not cover all cases, and may contain some names that are not valid on The Guardian API (e.g. names with symbols like ';' are not supported). Therefore, we clean this list.
-     -
-     -   2. For each country, obtain the number of pages using each set of name variations. Steps:
-     -     - For each country and year we get all content metadata: a query like "https://content.guardianapis.com/search?q=...&from-date=2020-01-01&to-date=2020-12-31" for year 2020. The count of pages is in the property `response.total`.
-     -
-     -   For mor details, please refer to the snapshot script.
+     + New values: 14 / 2721 (0.51%)
          country year num_pages_mentions_relative
          Saint Martin (French part) 2018 NaN
          Saint Martin (French part) 2020 NaN
          Saint Martin (French part) 2022 NaN
          Sint Maarten (Dutch part) 2017 NaN
          Sint Maarten (Dutch part) 2020 NaN
-     - Removed values: 32 / 2721 (1.18%)
          country year num_pages_mentions_relative
          Saint Martin 2018 11.948002
          Sint Maarten 2018 3.584401
          Sint Maarten 2019 0.607371
          Timor-Lest 2016 22.733494
          Timor-Lest 2022 58.200977
      ~ Changed values: 2643 / 2721 (97.13%)
          country year num_pages_mentions_relative - num_pages_mentions_relative +
          Eswatini 2020 11.738843 NaN
          Libya 2017 382.171539 NaN
          Mayotte 2014 3.108569 NaN
          Saudi Arabia 2018 632.049316 NaN
          Yemen 2022 118.444092 NaN
    ~ Column num_pages_tags (new data)
+     + New values: 14 / 2721 (0.51%)
          country year num_pages_tags
          Saint Martin (French part) 2018
          Saint Martin (French part) 2020
          Saint Martin (French part) 2022
          Sint Maarten (Dutch part) 2017
          Sint Maarten (Dutch part) 2020
-     - Removed values: 32 / 2721 (1.18%)
          country year num_pages_tags
          Saint Martin 2018
          Sint Maarten 2018
          Sint Maarten 2019
          Timor-Lest 2016
          Timor-Lest 2022
    ~ Column num_pages_tags_per_million (changed metadata, new data, changed data)
-     - {}
+     + title: Number of pages in the Guardian with a country tag (per million people)
+     + description_short: |-
+     +   Number of pages in the Guardian that are tagged with a country-related label, normalised by the population of the country.
+     + origins:
+     +   - producer: The Guardian
+     +     title: Attention to each country in The Guardian's articles (tags)
+     +     description: |-
+     +       Aggregate estimates on the number of entries that talk about each country and year.
+     +
+     +       The data was obtained by querying The Guardian's Open Platform.
+     +
+     +       An entry or page in The Guardian is considered to be about a certain country if that particular country if it is tagged with a country-related label. To this end, we have used a set of tags for each country. Nonetheless, this is not a perfect method and some entries might be missed.
+     +     citation_full: The Guardian, Open Platform
+     +     url_main: https://open-platform.theguardian.com/access/
+     +     date_accessed: '2024-05-07'
+     +     date_published: '2024'
+     +     license:
+     +       name: The Guardian terms of service
+     +       url: https://www.theguardian.com/help/terms-of-service
+     +   - producer: Various sources
+     +     title: Population
+     +     description: |-
+     +       Our World in Data builds and maintains a long-run dataset on population by country, region, and for the world, based on various sources.
+     +
+     +       You can find more information on these sources and how our time series is constructed on this page: https://ourworldindata.org/population-sources
+     +     citation_full: |-
+     +       The long-run data on population is based on various sources, described on this page: https://ourworldindata.org/population-sources
+     +     attribution: Population based on various sources (2023)
+     +     attribution_short: Population
+     +     url_main: https://ourworldindata.org/population-sources
+     +     date_accessed: '2023-03-31'
+     +     date_published: '2023-03-31'
+     +     license:
+     +       name: CC BY 4.0
+     + licenses:
+     +   - name: Creative Commons BY 4.0
+     +     url: https://docs.google.com/document/d/1-RmthhS2EPMK_HIpnPctcXpB0n7ADSWnXa5Hb3PxNq4/edit?usp=sharing
+     +   - name: CC BY 3.0
+     +     url: https://dataportaal.pbl.nl/downloads/HYDE/HYDE3.2/readme_release_HYDE3.2.1.txt
+     +   - name: CC BY 3.0 IGO
+     +     url: http://creativecommons.org/licenses/by/3.0/igo/
+     + unit: pages per million people
+     + processing_level: major
+     + presentation:
+     +   topic_tags:
+     +     - Uncategorized
+     + description_processing: |-
+     +   Getting the number of articles/entries talking about a certain country has no straightforward answer, since there can be different strategies. The strategy for this indicator is based on first getting all the tags for a country, and then getting the number of articles that have those tags.
+     +
+     +
+     +   1. Obtain all tags that concern a country:
+     +     - Obtain all the tag pages that have a title starting with a country name: a query like "https://content.guardianapis.com/tags?web-title=spain", for Spain. As a result we obtain a mapping that tells us for each country the list of tags (e.g. "Spain: ") in use.
+     +     - We work with a list of ~240 countries.
+     +     - Getting the right country names has been an iterative process, trying to align our standard country names with the Guardian's.
+     +
+     +   2. For each country, obtain the number of pages using each set of tags. Steps:
+     +     - For each country and year we get all content metadata: a query like "https://content.guardianapis.com/search?tags=...&from-date=2020-01-01&to-date=2020-12-31" for year 2020. The count of pages is in the property `response.total`.
+     +
+     +   For mor details, please refer to the snapshot script.
+     + New values: 14 / 2721 (0.51%)
          country year num_pages_tags_per_million
          Saint Martin (French part) 2018 NaN
          Saint Martin (French part) 2020 NaN
          Saint Martin (French part) 2022 NaN
          Sint Maarten (Dutch part) 2017 NaN
          Sint Maarten (Dutch part) 2020 NaN
-     - Removed values: 32 / 2721 (1.18%)
          country year num_pages_tags_per_million
          Saint Martin 2018 NaN
          Sint Maarten 2018 NaN
          Sint Maarten 2019 NaN
          Timor-Lest 2016 NaN
          Timor-Lest 2022 NaN
      ~ Changed values: 2547 / 2721 (93.61%)
          country year num_pages_tags_per_million - num_pages_tags_per_million +
          Canada 2014 NaN 8.840655
          Gibraltar 2018 NaN 826.269226
          Japan 2016 NaN 3.078889
          Latvia 2016 NaN 6.080638
          Tokelau 2013 NaN 0.000000
    ~ Column num_pages_tags_relative (changed metadata, new data, changed data)
+     + {}
-     - title: Share of pages in the Guardian with a country tag
-     - description_short: Share of pages in The Guardian that are tagged with a country-related label.
-     - origins:
-     -   - producer: The Guardian
-     -     title: Attention to each country in The Guardian's articles (tags)
-     -     description: |-
-     -       Aggregate estimates on the number of entries that talk about each country and year.
-     -
-     -       The data was obtained by querying The Guardian's Open Platform.
-     -
-     -       An entry or page in The Guardian is considered to be about a certain country if that particular country if it is tagged with a country-related label. To this end, we have used a set of tags for each country. Nonetheless, this is not a perfect method and some entries might be missed.
-     -     citation_full: The Guardian, Open Platform
-     -     url_main: https://open-platform.theguardian.com/access/
-     -     date_accessed: '2024-05-07'
-     -     date_published: '2024'
-     -     license:
-     -       name: The Guardian terms of service
-     -       url: https://www.theguardian.com/help/terms-of-service
-     - unit: pages per 100,000 pages
-     - presentation:
-     -   topic_tags:
-     -     - Uncategorized
-     - description_processing: |-
-     -   Getting the number of articles/entries talking about a certain country has no straightforward answer, since there can be different strategies. The strategy for this indicator is based on first getting all the tags for a country, and then getting the number of articles that have those tags.
-     -
-     -
-     -   1. Obtain all tags that concern a country:
-     -     - Obtain all the tag pages that have a title starting with a country name: a query like "https://content.guardianapis.com/tags?web-title=spain", for Spain. As a result we obtain a mapping that tells us for each country the list of tags (e.g. "Spain: ") in use.
-     -     - We work with a list of ~240 countries.
-     -     - Getting the right country names has been an iterative process, trying to align our standard country names with the Guardian's.
-     -
-     -   2. For each country, obtain the number of pages using each set of tags. Steps:
-     -     - For each country and year we get all content metadata: a query like "https://content.guardianapis.com/search?tags=...&from-date=2020-01-01&to-date=2020-12-31" for year 2020. The count of pages is in the property `response.total`.
-     -
-     -   For mor details, please refer to the snapshot script.
+     + New values: 14 / 2721 (0.51%)
          country year num_pages_tags_relative
          Saint Martin (French part) 2018 NaN
          Saint Martin (French part) 2020 NaN
          Saint Martin (French part) 2022 NaN
          Sint Maarten (Dutch part) 2017 NaN
          Sint Maarten (Dutch part) 2020 NaN
-     - Removed values: 32 / 2721 (1.18%)
          country year num_pages_tags_relative
          Saint Martin 2018 NaN
          Sint Maarten 2018 NaN
          Sint Maarten 2019 NaN
          Timor-Lest 2016 NaN
          Timor-Lest 2022 NaN
      ~ Changed values: 2634 / 2721 (96.80%)
          country year num_pages_tags_relative - num_pages_tags_relative +
          Argentina 2013 226.496490 NaN
          Aruba 2016 0.619191 NaN
          Central African Republic 2017 10.622900 NaN
          Congo 2018 57.331131 NaN
          Taiwan 2020 50.219105 NaN
    ~ Column relative_pages_mentions (changed metadata, new data, changed data)
-     - {}
+     + title: Share of pages in The Guardian that mention a country
+     + description_short: Share of pages in The Guardian that that mention a particular country.
+     + origins:
+     +   - producer: The Guardian
+     +     title: Attention to each country in The Guardian's articles (raw mentions)
+     +     description: |-
+     +       Aggregate estimates on the number of entries that talk about each country and year.
+     +
+     +       The data was obtained by querying The Guardian's Open Platform.
+     +
+     +       An entry or page in The Guardian is considered to be about a certain country if that particular country is mentioned in the text. To this end, we have used a set of country name variations to ensure that we capture all the entries. Nonetheless, this is not a perfect method and some entries might be missed.
+     +     citation_full: The Guardian, Open Platform
+     +     url_main: https://open-platform.theguardian.com/access/
+     +     date_accessed: '2024-05-07'
+     +     date_published: '2024'
+     +     license:
+     +       name: The Guardian terms of service
+     +       url: https://www.theguardian.com/help/terms-of-service
+     + unit: pages per 100,000 pages
+     + presentation:
+     +   topic_tags:
+     +     - Uncategorized
+     + description_processing: |-
+     +   Getting the number of articles/entries talking about a certain country has no straightforward answer, since there can be different strategies. The strategy for this indicator is based on first defining a set of country name variations for each country, and then look for content on The Guardian with an explicit mention to these names.
+     +
+     +
+     +   1. Get all country name variations:
+     +     - Obtain all the country name variations using our standard name list.
+     +     - Our list may not cover all cases, and may contain some names that are not valid on The Guardian API (e.g. names with symbols like ';' are not supported). Therefore, we clean this list.
+     +
+     +   2. For each country, obtain the number of pages using each set of name variations. Steps:
+     +     - For each country and year we get all content metadata: a query like "https://content.guardianapis.com/search?q=...&from-date=2020-01-01&to-date=2020-12-31" for year 2020. The count of pages is in the property `response.total`.
+     +
+     +   For mor details, please refer to the snapshot script.
+     + New values: 14 / 2721 (0.51%)
          country year relative_pages_mentions
          Saint Martin (French part) 2018 11.948002
          Saint Martin (French part) 2020 22.944101
          Saint Martin (French part) 2022 14.805511
          Sint Maarten (Dutch part) 2017 12.196963
          Sint Maarten (Dutch part) 2020 2.667919
-     - Removed values: 32 / 2721 (1.18%)
          country year relative_pages_mentions
          Saint Martin 2018 NaN
          Sint Maarten 2018 NaN
          Sint Maarten 2019 NaN
          Timor-Lest 2016 NaN
          Timor-Lest 2022 NaN
      ~ Changed values: 2661 / 2721 (97.79%)
          country year relative_pages_mentions - relative_pages_mentions +
          Bahamas 2015 NaN 45.872166
          El Salvador 2017 NaN 56.919163
          Guernsey 2022 NaN 28.079418
          Liechtenstein 2021 NaN 40.499123
          Saudi Arabia 2023 NaN 686.867859
    ~ Column relative_pages_mentions_excluded (changed metadata, new data, changed data)
-     - {}
+     + title: Share of pages in The Guardian that mention a country (excludes UK, US, Australia)
+     + description_short: Share of pages in The Guardian that are tagged with a country-related label. Excludes US, UK and Australia.
+     + origins:
+     +   - producer: The Guardian
+     +     title: Attention to each country in The Guardian's articles (raw mentions)
+     +     description: |-
+     +       Aggregate estimates on the number of entries that talk about each country and year.
+     +
+     +       The data was obtained by querying The Guardian's Open Platform.
+     +
+     +       An entry or page in The Guardian is considered to be about a certain country if that particular country is mentioned in the text. To this end, we have used a set of country name variations to ensure that we capture all the entries. Nonetheless, this is not a perfect method and some entries might be missed.
+     +     citation_full: The Guardian, Open Platform
+     +     url_main: https://open-platform.theguardian.com/access/
+     +     date_accessed: '2024-05-07'
+     +     date_published: '2024'
+     +     license:
+     +       name: The Guardian terms of service
+     +       url: https://www.theguardian.com/help/terms-of-service
+     + unit: pages per 100,000 pages
+     + presentation:
+     +   topic_tags:
+     +     - Uncategorized
+     + description_processing: |-
+     +   Getting the number of articles/entries talking about a certain country has no straightforward answer, since there can be different strategies. The strategy for this indicator is based on first defining a set of country name variations for each country, and then look for content on The Guardian with an explicit mention to these names.
+     +
+     +
+     +   1. Get all country name variations:
+     +     - Obtain all the country name variations using our standard name list.
+     +     - Our list may not cover all cases, and may contain some names that are not valid on The Guardian API (e.g. names with symbols like ';' are not supported). Therefore, we clean this list.
+     +
+     +   2. For each country, obtain the number of pages using each set of name variations. Steps:
+     +     - For each country and year we get all content metadata: a query like "https://content.guardianapis.com/search?q=...&from-date=2020-01-01&to-date=2020-12-31" for year 2020. The count of pages is in the property `response.total`.
+     +
+     +   For mor details, please refer to the snapshot script.
+     +
+     +   This estimates exclude the UK, US, and Australia from the total number of pages. The reason for this is because the Guardian is a UK-based newspaper, and it is expected to have a higher number of articles about the UK, US, and Australia.
+     + New values: 14 / 2721 (0.51%)
          country year relative_pages_mentions_excluded
          Saint Martin (French part) 2018 20.384451
          Saint Martin (French part) 2020 39.068180
          Saint Martin (French part) 2022 24.335392
          Sint Maarten (Dutch part) 2017 21.586283
          Sint Maarten (Dutch part) 2020 4.542811
-     - Removed values: 32 / 2721 (1.18%)
          country year relative_pages_mentions_excluded
          Saint Martin 2018 NaN
          Sint Maarten 2018 NaN
          Sint Maarten 2019 NaN
          Timor-Lest 2016 NaN
          Timor-Lest 2022 NaN
      ~ Changed values: 2628 / 2721 (96.58%)
          country year relative_pages_mentions_excluded - relative_pages_mentions_excluded +
          Bouvet Island 2019 NaN 0.000000
          Jamaica 2017 NaN 328.933838
          Rwanda 2019 NaN 151.658157
          Saudi Arabia 2023 NaN 1138.211426
          Vatican 2013 NaN 450.137909
    ~ Column relative_pages_tags (changed metadata, new data, changed data)
-     - {}
+     + title: Share of pages in the Guardian with a country tag
+     + description_short: Share of pages in The Guardian that are tagged with a country-related label.
+     + origins:
+     +   - producer: The Guardian
+     +     title: Attention to each country in The Guardian's articles (tags)
+     +     description: |-
+     +       Aggregate estimates on the number of entries that talk about each country and year.
+     +
+     +       The data was obtained by querying The Guardian's Open Platform.
+     +
+     +       An entry or page in The Guardian is considered to be about a certain country if that particular country if it is tagged with a country-related label. To this end, we have used a set of tags for each country. Nonetheless, this is not a perfect method and some entries might be missed.
+     +     citation_full: The Guardian, Open Platform
+     +     url_main: https://open-platform.theguardian.com/access/
+     +     date_accessed: '2024-05-07'
+     +     date_published: '2024'
+     +     license:
+     +       name: The Guardian terms of service
+     +       url: https://www.theguardian.com/help/terms-of-service
+     + unit: pages per 100,000 pages
+     + presentation:
+     +   topic_tags:
+     +     - Uncategorized
+     + description_processing: |-
+     +   Getting the number of articles/entries talking about a certain country has no straightforward answer, since there can be different strategies. The strategy for this indicator is based on first getting all the tags for a country, and then getting the number of articles that have those tags.
+     +
+     +
+     +   1. Obtain all tags that concern a country:
+     +     - Obtain all the tag pages that have a title starting with a country name: a query like "https://content.guardianapis.com/tags?web-title=spain", for Spain. As a result we obtain a mapping that tells us for each country the list of tags (e.g. "Spain: ") in use.
+     +     - We work with a list of ~240 countries.
+     +     - Getting the right country names has been an iterative process, trying to align our standard country names with the Guardian's.
+     +
+     +   2. For each country, obtain the number of pages using each set of tags. Steps:
+     +     - For each country and year we get all content metadata: a query like "https://content.guardianapis.com/search?tags=...&from-date=2020-01-01&to-date=2020-12-31" for year 2020. The count of pages is in the property `response.total`.
+     +
+     +   For mor details, please refer to the snapshot script.
+     + New values: 14 / 2721 (0.51%)
          country year relative_pages_tags
          Saint Martin (French part) 2018 NaN
          Saint Martin (French part) 2020 NaN
          Saint Martin (French part) 2022 NaN
          Sint Maarten (Dutch part) 2017 NaN
          Sint Maarten (Dutch part) 2020 NaN
-     - Removed values: 32 / 2721 (1.18%)
          country year relative_pages_tags
          Saint Martin 2018 NaN
          Sint Maarten 2018 NaN
          Sint Maarten 2019 NaN
          Timor-Lest 2016 NaN
          Timor-Lest 2022 NaN
      ~ Changed values: 2634 / 2721 (96.80%)
          country year relative_pages_tags - relative_pages_tags +
          Argentina 2013 NaN 226.496490
          Aruba 2016 NaN 0.619191
          Central African Republic 2017 NaN 10.622900
          Congo 2018 NaN 57.331131
          Taiwan 2020 NaN 50.219105
    ~ Column relative_pages_tags_excluded (changed metadata, new data, changed data)
-     - {}
+     + title: Share of pages in the Guardian with a country tag (excludes UK, US, Australia)
+     + description_short: Share of pages in The Guardian that are tagged with a country-related label. Excludes US, UK and Australia.
+     + origins:
+     +   - producer: The Guardian
+     +     title: Attention to each country in The Guardian's articles (tags)
+     +     description: |-
+     +       Aggregate estimates on the number of entries that talk about each country and year.
+     +
+     +       The data was obtained by querying The Guardian's Open Platform.
+     +
+     +       An entry or page in The Guardian is considered to be about a certain country if that particular country if it is tagged with a country-related label. To this end, we have used a set of tags for each country. Nonetheless, this is not a perfect method and some entries might be missed.
+     +     citation_full: The Guardian, Open Platform
+     +     url_main: https://open-platform.theguardian.com/access/
+     +     date_accessed: '2024-05-07'
+     +     date_published: '2024'
+     +     license:
+     +       name: The Guardian terms of service
+     +       url: https://www.theguardian.com/help/terms-of-service
+     + unit: pages per 100,000 pages
+     + presentation:
+     +   topic_tags:
+     +     - Uncategorized
+     + description_processing: |-
+     +   Getting the number of articles/entries talking about a certain country has no straightforward answer, since there can be different strategies. The strategy for this indicator is based on first getting all the tags for a country, and then getting the number of articles that have those tags.
+     +
+     +
+     +   1. Obtain all tags that concern a country:
+     +     - Obtain all the tag pages that have a title starting with a country name: a query like "https://content.guardianapis.com/tags?web-title=spain", for Spain. As a result we obtain a mapping that tells us for each country the list of tags (e.g. "Spain: ") in use.
+     +     - We work with a list of ~240 countries.
+     +     - Getting the right country names has been an iterative process, trying to align our standard country names with the Guardian's.
+     +
+     +   2. For each country, obtain the number of pages using each set of tags. Steps:
+     +     - For each country and year we get all content metadata: a query like "https://content.guardianapis.com/search?tags=...&from-date=2020-01-01&to-date=2020-12-31" for year 2020. The count of pages is in the property `response.total`.
+     +
+     +   For mor details, please refer to the snapshot script.
+     +
+     +   This estimates exclude the UK, US, and Australia from the total number of pages. The reason for this is because the Guardian is a UK-based newspaper, and it is expected to have a higher number of articles about the UK, US, and Australia.
+     + New values: 14 / 2721 (0.51%)
          country year relative_pages_tags_excluded
          Saint Martin (French part) 2018 NaN
          Saint Martin (French part) 2020 NaN
          Saint Martin (French part) 2022 NaN
          Sint Maarten (Dutch part) 2017 NaN
          Sint Maarten (Dutch part) 2020 NaN
-     - Removed values: 32 / 2721 (1.18%)
          country year relative_pages_tags_excluded
          Saint Martin 2018 NaN
          Sint Maarten 2018 NaN
          Sint Maarten 2019 NaN
          Timor-Lest 2016 NaN
          Timor-Lest 2022 NaN
      ~ Changed values: 2601 / 2721 (95.59%)
          country year relative_pages_tags_excluded - relative_pages_tags_excluded +
          India 2023 NaN 2470.757324
          Moldova 2014 NaN 94.517960
          Singapore 2020 NaN 164.049683
          Trinidad and Tobago 2023 NaN 82.084961
          Turkey 2019 NaN 1597.967285
+ + Table avg_10y
+   + Column num_pages_tags_10y_avg
+   + Column num_pages_mentions_10y_avg
+   + Column relative_pages_tags_10y_avg
+   + Column relative_pages_tags_excluded_10y_avg
+   + Column relative_pages_mentions_10y_avg
+   + Column relative_pages_mentions_excluded_10y_avg
+   + Column num_pages_tags_per_million_10y_avg
+   + Column num_pages_mentions_per_million_10y_avg

Legend: +New ~Modified -Removed =Identical Details
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet
```

Automatically updated datasets matching `weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk` are not included.
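The `description_processing` metadata in the diff describes two counting strategies, both of which read the page count from `response.total`:

- searching for country-name variations (`search?q=...`);
- searching by country tags found through `tags?web-title=...`.

The sketch below only builds those yearly query URLs and parses a response body. It makes no network calls. The helper names are hypothetical. The `api-key` parameter and its placeholder value are assumptions about the Open Platform. The `tags` filter name and the lowercase `web-title` value are copied from the metadata text, so check both against the Open Platform documentation before relying on them.

```python
import json
from urllib.parse import urlencode

# Host taken from the queries quoted in description_processing.
BASE_URL = "https://content.guardianapis.com"


def clean_name_variations(names):
    """Drop name variations the API does not support, e.g. names containing ';'."""
    return [name for name in names if ";" not in name]


def tags_lookup_url(country, api_key="YOUR_API_KEY"):
    """Tag lookup by title, as in /tags?web-title=spain for Spain (lowercased, assumed)."""
    params = {"web-title": country.lower(), "api-key": api_key}
    return f"{BASE_URL}/tags?{urlencode(params)}"


def yearly_search_url(filter_param, value, year, api_key="YOUR_API_KEY"):
    """Search restricted to one calendar year.

    filter_param is "q" (mentions) or "tags" (tags), spelled as in the metadata text.
    """
    params = {
        filter_param: value,
        "from-date": f"{year}-01-01",
        "to-date": f"{year}-12-31",
        "api-key": api_key,  # assumption: placeholder API key parameter
    }
    return f"{BASE_URL}/search?{urlencode(params)}"


def page_count(response_text):
    """Return the page count, which the metadata says is in `response.total`."""
    return json.loads(response_text)["response"]["total"]
```

For example, fetching `yearly_search_url("q", "Spain", 2020)` with any HTTP client and passing the body to `page_count` would give the 2020 count for that query.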

Edited: 2024-06-19 12:50:13 UTC
Execution time: 3.63 seconds
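In the data-diff, the `*_per_million` and relative indicators are ratios. Their metadata gives the units as `pages per million people` and `pages per 100,000 pages`. The sketch below shows that arithmetic. It assumes a yearly page count, a population figure, and a total page count for the same year are already available.

For the `*_excluded` variants, the metadata says the UK, US and Australia are left out of the total number of pages. How that reduced total is computed is not shown here, so the sketch takes it as the `total_pages` argument. Both function names are hypothetical.

```python
import math


def per_million(num_pages, population):
    """Pages per million people; NaN when the population is missing or zero."""
    if not population:
        return math.nan
    # Multiply first so round inputs give exact results.
    return num_pages * 1_000_000 / population


def per_100k_pages(num_pages, total_pages):
    """Pages per 100,000 pages; total_pages is the (possibly reduced) yearly total."""
    if not total_pages:
        return math.nan
    return num_pages * 100_000 / total_pages
```

A smaller denominator, such as a total that excludes UK, US and Australia pages, yields a larger share for the same count. This fits with the `*_excluded` values in the diff being higher than their non-excluded counterparts.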