owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License
58 stars 18 forks source link

📊 Fix FAOSTAT issues #2916

Closed pabloarosado closed 3 days ago

pabloarosado commented 3 days ago

Fix a few data issues found in FAOSTAT data. More details in https://github.com/owid/owid-issues/issues/1596

owidbot commented 3 days ago
Quick links (staging server): Site Admin Wizard

Login: ssh owid@staging-site-fix-faostat-issues

chart-diff: ✅ No charts for review.
data-diff: ❌ Found differences ```diff = Dataset garden/agriculture/2024-03-26/attainable_yields = Table attainable_yields = Dataset garden/agriculture/2024-03-26/long_term_crop_yields = Table long_term_crop_yields = Dataset garden/agriculture/2024-03-26/long_term_wheat_yields = Table long_term_wheat_yields = Dataset garden/agriculture/2024-03-26/uk_long_term_yields = Table uk_long_term_yields = Dataset garden/agriculture/2024-05-23/daily_calories_per_person = Table daily_calories_per_person = Dataset garden/animal_welfare/2024-05-20/animals_used_for_food = Table animals_used_for_food = Dataset garden/faostat/2024-03-14/additional_variables = Table fertilizer_exports = Table fertilizers = Table share_of_sustainable_and_overexploited_fish = Table vegetable_oil_yields = Table cereal_allocation = Table macronutrient_compositions = Table area_used_per_crop_type = Table arable_land_per_crop_output = Table food_available_for_consumption = Table hypothetical_meat_consumption = Table land_spared_by_increased_crop_yields = Table agriculture_land_use_evolution = Table maize_and_wheat ~ Dataset garden/faostat/2024-03-14/faostat_fbsc + + + + +Malaysia's milk consumption from 2010 onwards is unreasonably high. = Table faostat_fbsc_flat ~ Column milk__excluding_butter__00002848__food_available_for_consumption__000645__kilograms_per_year (changed data) ~ Changed values: 12 / 12751 (0.09%) country year milk__excluding_butter__00002848__food_available_for_consumption__000645__kilograms_per_year - milk__excluding_butter__00002848__food_available_for_consumption__000645__kilograms_per_year + Malaysia 2012 1.483663e+10 NaN Malaysia 2014 1.867453e+10 NaN Malaysia 2016 2.753707e+10 NaN Malaysia 2020 3.448084e+10 NaN Malaysia 2021 4.019866e+10 NaN ~ Column milk__excluding_butter__00002848__food_available_for_consumption__0645pc__kilograms_per_year_per_capita (changed data) ~ Changed values: 12 / 12751 (0.09%) country year milk__excluding_butter__00002848__food_available_for_consumption__0645pc__kilograms_per_year_per_capita - milk__excluding_butter__00002848__food_available_for_consumption__0645pc__kilograms_per_year_per_capita + Malaysia 2012 500.219910 NaN Malaysia 2014 610.150146 NaN Malaysia 2016 873.460205 NaN Malaysia 2020 1038.579956 NaN Malaysia 2021 1197.319946 NaN = Table faostat_fbsc ~ Dim area_code - - Removed values: 24 / 27091454 (0.00%) year item_code element_code area_code 2015 00002848 000645 131 2015 00002848 0645pc 131 2017 00002848 000645 131 2020 00002848 000645 131 2021 00002848 000645 131 ~ Dim year - - Removed values: 24 / 27091454 (0.00%) area_code item_code element_code year 131 00002848 000645 2015 131 00002848 0645pc 2015 131 00002848 000645 2017 131 00002848 000645 2020 131 00002848 000645 2021 ~ Dim item_code - - Removed values: 24 / 27091454 (0.00%) area_code year element_code item_code 131 2015 000645 00002848 131 2015 0645pc 00002848 131 2017 000645 00002848 131 2020 000645 00002848 131 2021 000645 00002848 ~ Dim element_code - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code 131 2015 00002848 000645 131 2015 00002848 0645pc 131 2017 00002848 000645 131 2020 00002848 000645 131 2021 00002848 000645 ~ Column country (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code country 131 2015 00002848 000645 Malaysia 131 2015 00002848 0645pc Malaysia 131 2017 00002848 000645 Malaysia 131 2020 00002848 000645 Malaysia 131 2021 00002848 000645 Malaysia ~ Column element (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code element 131 2015 00002848 000645 Food available for consumption 131 2015 00002848 0645pc Food available for consumption 131 2017 00002848 000645 Food available for consumption 131 2020 00002848 000645 Food available for consumption 131 2021 00002848 000645 Food available for consumption ~ Column element_description (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code element_description 131 2015 00002848 000645 Originally given per-capita, and converted into total figures by multiplying by population (given by FAO). 131 2015 00002848 0645pc Originally given per-capita, and converted into total figures by multiplying by population (given by FAO). Per-capita values are obtained by dividing the original values by the population (either provided by FAO or by OWID). 131 2017 00002848 000645 Originally given per-capita, and converted into total figures by multiplying by population (given by FAO). 131 2020 00002848 000645 Originally given per-capita, and converted into total figures by multiplying by population (given by FAO). 131 2021 00002848 000645 Originally given per-capita, and converted into total figures by multiplying by population (given by FAO). ~ Column fao_country (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code fao_country 131 2015 00002848 000645 Malaysia 131 2015 00002848 0645pc Malaysia 131 2017 00002848 000645 Malaysia 131 2020 00002848 000645 Malaysia 131 2021 00002848 000645 Malaysia ~ Column fao_element (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code fao_element 131 2015 00002848 000645 Food supply quantity (kg/capita/yr) 131 2015 00002848 0645pc Food supply quantity (kg/capita/yr) 131 2017 00002848 000645 Food supply quantity (kg/capita/yr) 131 2020 00002848 000645 Food supply quantity (kg/capita/yr) 131 2021 00002848 000645 Food supply quantity (kg/capita/yr) ~ Column fao_item (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code fao_item 131 2015 00002848 000645 Milk - Excluding Butter 131 2015 00002848 0645pc Milk - Excluding Butter 131 2017 00002848 000645 Milk - Excluding Butter 131 2020 00002848 000645 Milk - Excluding Butter 131 2021 00002848 000645 Milk - Excluding Butter ~ Column fao_population (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code fao_population 131 2015 00002848 000645 31068830.0 131 2015 00002848 0645pc 31068830.0 131 2017 00002848 000645 31975810.0 131 2020 00002848 000645 33199988.0 131 2021 00002848 000645 33573872.0 ~ Column fao_unit_short_name (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code fao_unit_short_name 131 2015 00002848 000645 kg 131 2015 00002848 0645pc kg 131 2017 00002848 000645 kg 131 2020 00002848 000645 kg 131 2021 00002848 000645 kg ~ Column flag (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code flag 131 2015 00002848 000645 E 131 2015 00002848 0645pc E 131 2017 00002848 000645 E 131 2020 00002848 000645 E 131 2021 00002848 000645 E ~ Column item (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code item 131 2015 00002848 000645 Milk - Excluding Butter 131 2015 00002848 0645pc Milk - Excluding Butter 131 2017 00002848 000645 Milk - Excluding Butter 131 2020 00002848 000645 Milk - Excluding Butter 131 2021 00002848 000645 Milk - Excluding Butter ~ Column item_description (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code item_description 131 2015 00002848 000645 Default composition: 882 Milk, whole fresh cow, 888 Milk, skimmed cow, 889 Milk, whole condensed, 890 Whey, condensed, 891 Yoghurt, 892 Yoghurt, concentrated or not, 893 Buttermilk, curdled, acidified milk, 894 Milk, whole evaporated, 895 Milk, skimmed evaporated, 896 Milk, skimmed condensed, 897 Milk, whole dried, 898 Milk, skimmed dried, 899 Milk, dry buttermilk, 900 Whey, dry, 901 Cheese, whole cow milk, 903 Whey, fresh, 904 Cheese, skimmed cow milk, 905 Whey, cheese, 907 Cheese, processed, 908 Milk, reconstituted, 909 Milk, products of natural constituents nes, 910 Ice cream and edible ice, 917 Casein, 951 Milk, whole fresh buffalo, 954 Milk, skimmed buffalo, 955 Cheese, buffalo milk, 982 Milk, whole fresh sheep, 984 Cheese, sheep milk, 985 Milk, skimmed sheep, 1020 Milk, whole fresh goat, 1021 Cheese of goat mlk, 1023 Milk, skimmed goat, 1130 Milk, whole fresh camel 131 2015 00002848 0645pc Default composition: 882 Milk, whole fresh cow, 888 Milk, skimmed cow, 889 Milk, whole condensed, 890 Whey, condensed, 891 Yoghurt, 892 Yoghurt, concentrated or not, 893 Buttermilk, curdled, acidified milk, 894 Milk, whole evaporated, 895 Milk, skimmed evaporated, 896 Milk, skimmed condensed, 897 Milk, whole dried, 898 Milk, skimmed dried, 899 Milk, dry buttermilk, 900 Whey, dry, 901 Cheese, whole cow milk, 903 Whey, fresh, 904 Cheese, skimmed cow milk, 905 Whey, cheese, 907 Cheese, processed, 908 Milk, reconstituted, 909 Milk, products of natural constituents nes, 910 Ice cream and edible ice, 917 Casein, 951 Milk, whole fresh buffalo, 954 Milk, skimmed buffalo, 955 Cheese, buffalo milk, 982 Milk, whole fresh sheep, 984 Cheese, sheep milk, 985 Milk, skimmed sheep, 1020 Milk, whole fresh goat, 1021 Cheese of goat mlk, 1023 Milk, skimmed goat, 1130 Milk, whole fresh camel 131 2017 00002848 000645 Default composition: 882 Milk, whole fresh cow, 888 Milk, skimmed cow, 889 Milk, whole condensed, 890 Whey, condensed, 891 Yoghurt, 892 Yoghurt, concentrated or not, 893 Buttermilk, curdled, acidified milk, 894 Milk, whole evaporated, 895 Milk, skimmed evaporated, 896 Milk, skimmed condensed, 897 Milk, whole dried, 898 Milk, skimmed dried, 899 Milk, dry buttermilk, 900 Whey, dry, 901 Cheese, whole cow milk, 903 Whey, fresh, 904 Cheese, skimmed cow milk, 905 Whey, cheese, 907 Cheese, processed, 908 Milk, reconstituted, 909 Milk, products of natural constituents nes, 910 Ice cream and edible ice, 917 Casein, 951 Milk, whole fresh buffalo, 954 Milk, skimmed buffalo, 955 Cheese, buffalo milk, 982 Milk, whole fresh sheep, 984 Cheese, sheep milk, 985 Milk, skimmed sheep, 1020 Milk, whole fresh goat, 1021 Cheese of goat mlk, 1023 Milk, skimmed goat, 1130 Milk, whole fresh camel 131 2020 00002848 000645 Default composition: 882 Milk, whole fresh cow, 888 Milk, skimmed cow, 889 Milk, whole condensed, 890 Whey, condensed, 891 Yoghurt, 892 Yoghurt, concentrated or not, 893 Buttermilk, curdled, acidified milk, 894 Milk, whole evaporated, 895 Milk, skimmed evaporated, 896 Milk, skimmed condensed, 897 Milk, whole dried, 898 Milk, skimmed dried, 899 Milk, dry buttermilk, 900 Whey, dry, 901 Cheese, whole cow milk, 903 Whey, fresh, 904 Cheese, skimmed cow milk, 905 Whey, cheese, 907 Cheese, processed, 908 Milk, reconstituted, 909 Milk, products of natural constituents nes, 910 Ice cream and edible ice, 917 Casein, 951 Milk, whole fresh buffalo, 954 Milk, skimmed buffalo, 955 Cheese, buffalo milk, 982 Milk, whole fresh sheep, 984 Cheese, sheep milk, 985 Milk, skimmed sheep, 1020 Milk, whole fresh goat, 1021 Cheese of goat mlk, 1023 Milk, skimmed goat, 1130 Milk, whole fresh camel 131 2021 00002848 000645 Default composition: 882 Milk, whole fresh cow, 888 Milk, skimmed cow, 889 Milk, whole condensed, 890 Whey, condensed, 891 Yoghurt, 892 Yoghurt, concentrated or not, 893 Buttermilk, curdled, acidified milk, 894 Milk, whole evaporated, 895 Milk, skimmed evaporated, 896 Milk, skimmed condensed, 897 Milk, whole dried, 898 Milk, skimmed dried, 899 Milk, dry buttermilk, 900 Whey, dry, 901 Cheese, whole cow milk, 903 Whey, fresh, 904 Cheese, skimmed cow milk, 905 Whey, cheese, 907 Cheese, processed, 908 Milk, reconstituted, 909 Milk, products of natural constituents nes, 910 Ice cream and edible ice, 917 Casein, 951 Milk, whole fresh buffalo, 954 Milk, skimmed buffalo, 955 Cheese, buffalo milk, 982 Milk, whole fresh sheep, 984 Cheese, sheep milk, 985 Milk, skimmed sheep, 1020 Milk, whole fresh goat, 1021 Cheese of goat mlk, 1023 Milk, skimmed goat, 1130 Milk, whole fresh camel ~ Column population_with_data (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code population_with_data 131 2015 00002848 000645 31068834 131 2015 00002848 0645pc 31068834 131 2017 00002848 000645 31975812 131 2020 00002848 000645 33199988 131 2021 00002848 000645 33573872 ~ Column unit (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code unit 131 2015 00002848 000645 kilograms per year 131 2015 00002848 0645pc kilograms per year per capita 131 2017 00002848 000645 kilograms per year 131 2020 00002848 000645 kilograms per year 131 2021 00002848 000645 kilograms per year ~ Column unit_short_name (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code unit_short_name 131 2015 00002848 000645 kg 131 2015 00002848 0645pc kg 131 2017 00002848 000645 kg 131 2020 00002848 000645 kg 131 2021 00002848 000645 kg ~ Column value (changed data) - - Removed values: 24 / 27091454 (0.00%) area_code year item_code element_code value 131 2015 00002848 000645 2.308538e+10 131 2015 00002848 0645pc 7.430399e+02 131 2017 00002848 000645 3.057847e+10 131 2020 00002848 000645 3.448084e+10 131 2021 00002848 000645 4.019866e+10 = Dataset garden/faostat/2024-03-14/faostat_food_explorer = Table faostat_food_explorer ~ Dataset garden/faostat/2024-03-14/faostat_qcl + + +The number of poultry birds in Europe, the EU, and High-income countries, has spurious jumps on recent years. The reason is that some European countries (a least Germany, Italy and Spain) lack data on those years. The odd thing is that Europe (FAO) and European Union (27) (FAO) do have data for those years, even though the member countries don't. So, the ideal solution is to replace those years with the (FAO) data. + + = Table faostat_qcl ~ Dim area_code - - Removed values: 4 / 6907590 (0.00%) year item_code element_code area_code 2018 00002029 005112 OWID_HIC 2019 00002029 005112 OWID_HIC 2021 00002029 005112 OWID_HIC 2022 00002029 005112 OWID_HIC ~ Dim year - - Removed values: 4 / 6907590 (0.00%) area_code item_code element_code year OWID_HIC 00002029 005112 2018 OWID_HIC 00002029 005112 2019 OWID_HIC 00002029 005112 2021 OWID_HIC 00002029 005112 2022 ~ Dim item_code - - Removed values: 4 / 6907590 (0.00%) area_code year element_code item_code OWID_HIC 2018 005112 00002029 OWID_HIC 2019 005112 00002029 OWID_HIC 2021 005112 00002029 OWID_HIC 2022 005112 00002029 ~ Dim element_code - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code OWID_HIC 2018 00002029 005112 OWID_HIC 2019 00002029 005112 OWID_HIC 2021 00002029 005112 OWID_HIC 2022 00002029 005112 ~ Column country (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code country OWID_HIC 2018 00002029 005112 High-income countries OWID_HIC 2019 00002029 005112 High-income countries OWID_HIC 2021 00002029 005112 High-income countries OWID_HIC 2022 00002029 005112 High-income countries ~ Column element (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code element OWID_HIC 2018 00002029 005112 Stocks OWID_HIC 2019 00002029 005112 Stocks OWID_HIC 2021 00002029 005112 Stocks OWID_HIC 2022 00002029 005112 Stocks ~ Column element_description (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code element_description OWID_HIC 2018 00002029 005112 This variable indicates the number of animals of the species present in the country at the time of enumeration. It includes animals raised either for draft purposes or for meat, eggs and dairy production or kept for breeding. Live animals in captivity for fur or skin such as foxes, minks etc. are not included in the system although furskin trade is reported. The enumeration to be chosen, when more than one survey is taken, is the closest to the beginning of the calendar year. Livestock data are reported in number of heads (units) except for poultry, rabbits and other rodents which are reported in thousand units. Source: FAO Statistics Division OWID_HIC 2019 00002029 005112 This variable indicates the number of animals of the species present in the country at the time of enumeration. It includes animals raised either for draft purposes or for meat, eggs and dairy production or kept for breeding. Live animals in captivity for fur or skin such as foxes, minks etc. are not included in the system although furskin trade is reported. The enumeration to be chosen, when more than one survey is taken, is the closest to the beginning of the calendar year. Livestock data are reported in number of heads (units) except for poultry, rabbits and other rodents which are reported in thousand units. Source: FAO Statistics Division OWID_HIC 2021 00002029 005112 This variable indicates the number of animals of the species present in the country at the time of enumeration. It includes animals raised either for draft purposes or for meat, eggs and dairy production or kept for breeding. Live animals in captivity for fur or skin such as foxes, minks etc. are not included in the system although furskin trade is reported. The enumeration to be chosen, when more than one survey is taken, is the closest to the beginning of the calendar year. Livestock data are reported in number of heads (units) except for poultry, rabbits and other rodents which are reported in thousand units. Source: FAO Statistics Division OWID_HIC 2022 00002029 005112 This variable indicates the number of animals of the species present in the country at the time of enumeration. It includes animals raised either for draft purposes or for meat, eggs and dairy production or kept for breeding. Live animals in captivity for fur or skin such as foxes, minks etc. are not included in the system although furskin trade is reported. The enumeration to be chosen, when more than one survey is taken, is the closest to the beginning of the calendar year. Livestock data are reported in number of heads (units) except for poultry, rabbits and other rodents which are reported in thousand units. Source: FAO Statistics Division ~ Column fao_element (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code fao_element OWID_HIC 2018 00002029 005112 Stocks OWID_HIC 2019 00002029 005112 Stocks OWID_HIC 2021 00002029 005112 Stocks OWID_HIC 2022 00002029 005112 Stocks ~ Column fao_item (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code fao_item OWID_HIC 2018 00002029 005112 Poultry Birds OWID_HIC 2019 00002029 005112 Poultry Birds OWID_HIC 2021 00002029 005112 Poultry Birds OWID_HIC 2022 00002029 005112 Poultry Birds ~ Column fao_unit_short_name (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code fao_unit_short_name OWID_HIC 2018 00002029 005112 1000 An OWID_HIC 2019 00002029 005112 1000 An OWID_HIC 2021 00002029 005112 1000 An OWID_HIC 2022 00002029 005112 1000 An ~ Column flag (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code flag OWID_HIC 2018 00002029 005112 multiple_flags OWID_HIC 2019 00002029 005112 multiple_flags OWID_HIC 2021 00002029 005112 multiple_flags OWID_HIC 2022 00002029 005112 multiple_flags ~ Column item (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code item OWID_HIC 2018 00002029 005112 Poultry OWID_HIC 2019 00002029 005112 Poultry OWID_HIC 2021 00002029 005112 Poultry OWID_HIC 2022 00002029 005112 Poultry ~ Column item_description (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code item_description OWID_HIC 2018 00002029 005112 OWID_HIC 2019 00002029 005112 OWID_HIC 2021 00002029 005112 OWID_HIC 2022 00002029 005112 ~ Column population_with_data (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code population_with_data OWID_HIC 2018 00002029 005112 972364102 OWID_HIC 2019 00002029 005112 966890483 OWID_HIC 2021 00002029 005112 960849042 OWID_HIC 2022 00002029 005112 947328234 ~ Column unit (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code unit OWID_HIC 2018 00002029 005112 animals OWID_HIC 2019 00002029 005112 animals OWID_HIC 2021 00002029 005112 animals OWID_HIC 2022 00002029 005112 animals ~ Column unit_short_name (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code unit_short_name OWID_HIC 2018 00002029 005112 animals OWID_HIC 2019 00002029 005112 animals OWID_HIC 2021 00002029 005112 animals OWID_HIC 2022 00002029 005112 animals ~ Column value (changed data) - - Removed values: 4 / 6907590 (0.00%) area_code year item_code element_code value OWID_HIC 2018 00002029 005112 4092026112.0 OWID_HIC 2019 00002029 005112 4126217984.0 OWID_HIC 2021 00002029 005112 4048410112.0 OWID_HIC 2022 00002029 005112 3949990912.0 ~ Changed values: 8 / 6907590 (0.00%) area_code year item_code element_code value - value + OWID_EU27 2019 00002029 005112 757353024.0 1579394048.0 OWID_EU27 2021 00002029 005112 682350976.0 1568039936.0 OWID_EU27 2022 00002029 005112 583499008.0 1574957952.0 OWID_EUR 2021 00002029 005112 1790632960.0 2671702016.0 OWID_EUR 2022 00002029 005112 1712675968.0 2699513088.0 = Table faostat_qcl_flat ~ Column poultry__00002029__stocks__005112__animals (changed data) ~ Changed values: 12 / 14464 (0.08%) country year poultry__00002029__stocks__005112__animals - poultry__00002029__stocks__005112__animals + Europe 2021 1790633000 2671702000 European Union (27) 2018 789960000 1563728000 European Union (27) 2021 682351000 1568040000 High-income countries 2021 4048410000 High-income countries 2022 3949991000 Legend: +New ~Modified -Removed =Identical Details Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet ``` Automatically updated datasets matching _weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk_ are not included

Edited: 2024-07-02 12:31:46 UTC Execution time: 253.64 seconds