primap-community / unfccc_di_api

Python wrapper around the Flexible Query API of the UNFCCC.
https://unfccc-di-api.readthedocs.io

Properly deal with duplicate variableIds #28

Closed. JGuetschow closed this issue 3 years ago.

JGuetschow commented 3 years ago

Description

When using the annex_one_reader of unfccc_di_api.UNFCCCApiReader with a category filter for 9559 (Enteric Fermentation), the query result also contains data for Manure Management (9608). Querying for Manure Management only returns data for Manure Management.

What I Did

Here's a minimal code example to reproduce the problem:

import unfccc_di_api

reader = unfccc_di_api.UNFCCCApiReader()
data_new = reader.annex_one_reader.query(party_codes=['DEU'], category_ids=[9559])
print(data_new["category"].unique())

Actual output is

['3.A  Enteric Fermentation' '3.B  Manure Management']

Expected output is

['3.A  Enteric Fermentation']
mikapfl commented 3 years ago

The problem is that some variables exist twice in the API, with different categories. An example is:

[{'variableId': 957732, 'categoryId': 9559, 'classificationId': 10650, 'measureId': 10596, 'gasId': 10637, 'unitId': 175},
 {'variableId': 957732, 'categoryId': 9608, 'classificationId': 10650, 'measureId': 10596, 'gasId': 10637, 'unitId': 175}]

That is of course very confusing. What we do at the moment is discard all but the first variable with a given variableId. That's probably not correct, but it is impossible to decide which one is correct. Additionally, there is a bug where we sometimes use the last variable instead of the first, which is why you noticed this.

I am not sure how to solve this. Strictly speaking, we should probably discard all data with ambiguous variables, because it is simply impossible to know which of the two variables is correct: the data points only specify a variableId, and in the example above it is impossible to say whether data with variableId 957732 belongs to category 3.A or 3.B. Or do you see another way to deal with this?
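A minimal sketch of that strict option, assuming the variable list and the data points are available as plain dicts keyed like the example above. The helper name `drop_ambiguous`, the ID 111111 and the values are hypothetical placeholders, not the library's code or real data:

```python
from collections import Counter

# Hypothetical variable list: 957732 is ambiguous (two categories, as in the
# example above); 111111 is a made-up unambiguous ID.
variables = [
    {"variableId": 957732, "categoryId": 9559},
    {"variableId": 957732, "categoryId": 9608},
    {"variableId": 111111, "categoryId": 9559},
]
# Placeholder data points; the values are illustrative only.
data_points = [
    {"variableId": 957732, "year": 1990, "value": 1.0},
    {"variableId": 111111, "year": 1990, "value": 2.0},
]


def drop_ambiguous(data, variables):
    """Drop every data point whose variableId maps to more than one variable."""
    counts = Counter(v["variableId"] for v in variables)
    ambiguous = {vid for vid, n in counts.items() if n > 1}
    return [d for d in data if d["variableId"] not in ambiguous]


kept = drop_ambiguous(data_points, variables)
print([d["variableId"] for d in kept])  # [111111]
```

Note that this discards every data point for such an ID, including cases where both categories are legitimate.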

mikapfl commented 3 years ago

We could also try to find out what the official web query interface does with these duplicate variables, and be bug-compatible with it.

JGuetschow commented 3 years ago

In my case the duplicate data is actually fine. It's livestock population data, and it's only logical that the same number of cows is reported under both enteric fermentation and manure management. So I do get the correct data, just for the wrong category, and I think it's fine if the data belongs to both categories. For my query, the preferred solution would be to filter the duplicate variables by category_id and keep the data that matches the filter (likewise for measure and classification).
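A sketch of this proposal under the assumption that the variable list and data points look like the dicts in the example above: filter the variables first, then attach metadata only from the variables that matched, so a duplicated variableId resolves to the category that was requested. `filter_and_annotate` is a hypothetical helper, not part of unfccc_di_api, and the data value is a placeholder:

```python
# Two variables sharing one variableId, as in the example earlier in the thread.
variables = [
    {"variableId": 957732, "categoryId": 9559, "classificationId": 10650, "measureId": 10596},
    {"variableId": 957732, "categoryId": 9608, "classificationId": 10650, "measureId": 10596},
]
data_points = [{"variableId": 957732, "year": 1990, "value": 1.0}]  # placeholder value


def filter_and_annotate(data, variables, category_ids, classification_ids=None, measure_ids=None):
    """Annotate data points using only the variables that match the filter.

    A data point whose variableId matches several filtered variables is
    returned once per match, so it may appear under two categories.
    """

    def matches(v):
        return (
            v["categoryId"] in category_ids
            and (classification_ids is None or v["classificationId"] in classification_ids)
            and (measure_ids is None or v["measureId"] in measure_ids)
        )

    # Build the lookup from the filtered variables only, so a duplicated
    # variableId resolves to the category that was actually requested.
    lookup = {}
    for v in variables:
        if matches(v):
            lookup.setdefault(v["variableId"], []).append(v)

    return [
        {**d, "categoryId": v["categoryId"]}
        for d in data
        for v in lookup.get(d["variableId"], [])
    ]


print(filter_and_annotate(data_points, variables, category_ids=[9559]))
# [{'variableId': 957732, 'year': 1990, 'value': 1.0, 'categoryId': 9559}]
```

Requesting both categories returns the data point twice, once per category, which matches the view that the livestock numbers belong to both.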

JGuetschow commented 3 years ago

The web interface just gives you the category you filtered for. No data for the same variable for other categories.

mikapfl commented 3 years ago


But does it return the same information, or is it dropping the duplicate variables one way or another? I can fix the bug that I found, but then each data point will get one category or the other, never both, and possibly the incorrect one.

JGuetschow commented 3 years ago

I don't think I understand the problem. When I set a category in the query, why can't you use the data point for the category given in the query?

mikapfl commented 3 years ago

The data query in the API works like this:

The root of the problem is that the list of variables contains different variables (with different categories) that share the same variableId. In these cases it is impossible to know whether one category, the other, or both apply to a given data point.

A secondary problem is that the step which filters the variables by category uses a different information source than the last step, which fills in the metadata for each data point. That is what causes the issue here: all the data you are getting has at least one variable with category 3.A, some of it also has a variable with category 3.B, and for some of those with two categories the parsing code picks 3.B. Conversely, if you request category 3.B, the filtering code likely already discards all the variables with two categories, because it only considers the first category and 3.B comes after 3.A. That result will be missing some data points, but will never contain 3.A.
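Assuming the two steps behave as described (this is a hypothetical model of the pipeline, not the library's actual code), the inconsistency can be reproduced in a few lines: the filter step keeps only the first variable per variableId, while the annotation step builds a plain dict, which silently keeps the last one. The helpers `filter_ids` and `annotate` and the data value are illustrative placeholders:

```python
# The two records from the example above, trimmed to the relevant fields.
variables = [
    {"variableId": 957732, "categoryId": 9559},  # 3.A Enteric Fermentation
    {"variableId": 957732, "categoryId": 9608},  # 3.B Manure Management
]
data_points = [{"variableId": 957732, "value": 1.0}]  # placeholder value


def filter_ids(variables, category_ids):
    """Filter step: only the FIRST variable per variableId is considered."""
    first = {}
    for v in variables:
        first.setdefault(v["variableId"], v)
    return {vid for vid, v in first.items() if v["categoryId"] in category_ids}


def annotate(data, variables, wanted_ids):
    """Annotation step: a dict comprehension keeps the LAST variable per ID."""
    meta = {v["variableId"]: v for v in variables}
    return [
        {**d, "categoryId": meta[d["variableId"]]["categoryId"]}
        for d in data
        if d["variableId"] in wanted_ids
    ]


# Asked for 3.A, labelled 3.B:
print(annotate(data_points, variables, filter_ids(variables, [9559])))
# [{'variableId': 957732, 'value': 1.0, 'categoryId': 9608}]

# Asked for 3.B, the data point is silently dropped:
print(annotate(data_points, variables, filter_ids(variables, [9608])))
# []
```

Using the same source for both steps would fix the mismatch, but not the underlying ambiguity.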

I would like to solve the first problem (finding out what it actually means when two variables with identical IDs and different categories exist) and always return correct data, rather than only solving the secondary problem (consistently discarding all but the first variable with a given ID).

I'll compile a list of duplicate variableIds and what they mean. Perhaps all of these cases are like the 3.A/3.B case, where the data is legitimately the same quantity (likely activity data) that can and should be reported under multiple categories.
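One way such a list could be compiled, assuming the variable list is available as dicts with the fields shown in the example above (`describe_duplicates` and the ID 111111 are hypothetical): group the variables by variableId and report which fields differ within each group. This separates 3.A/3.B-style cases (differing categoryId) from cases that differ only in classification.

```python
from collections import defaultdict

# Fields that, together with variableId, describe a variable in the API.
FIELDS = ("categoryId", "classificationId", "measureId", "gasId", "unitId")


def describe_duplicates(variables):
    """Map each duplicated variableId to the fields whose values differ."""
    by_id = defaultdict(list)
    for v in variables:
        by_id[v["variableId"]].append(v)
    return {
        vid: [f for f in FIELDS if len({v[f] for v in group}) > 1]
        for vid, group in by_id.items()
        if len(group) > 1
    }


variables = [
    {"variableId": 957732, "categoryId": 9559, "classificationId": 10650,
     "measureId": 10596, "gasId": 10637, "unitId": 175},
    {"variableId": 957732, "categoryId": 9608, "classificationId": 10650,
     "measureId": 10596, "gasId": 10637, "unitId": 175},
    # Hypothetical unique variable, included to show it is ignored.
    {"variableId": 111111, "categoryId": 9559, "classificationId": 10650,
     "measureId": 10596, "gasId": 10637, "unitId": 175},
]

print(describe_duplicates(variables))  # {957732: ['categoryId']}
```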

mikapfl commented 3 years ago

These are all duplicate variables (each block lists the variables that share one variableId; numeric IDs are in parentheses):

category classification measure gas unit
4.E.2.d Wetlands Converted to Settlements (8465) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.E.2.d Wetlands Converted to Settlements (8465) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Net emissions/removals (10460) Aggregate F-gases (10466) t CO₂ equivalent (96)
2. Industrial Processes and Product Use (10482) Total for category (10510) Net emissions/removals (10460) Aggregate F-gases (10466) t CO₂ equivalent (96)
category classification measure gas unit
4.F.2.c Grassland Converted to Other Land (8574) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.F.2.c Grassland Converted to Other Land (8574) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
3. Agriculture (10483) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
3. Agriculture (10096) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.B.2.d Settlements Converted to Cropland (8709) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.B.2.d Settlements Converted to Cropland (8709) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
6. Other (10476) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
6. Other (10485) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.A.2.b Grassland Converted to Forest Land (9851) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.A.2.b Grassland Converted to Forest Land (9851) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.A.2.e Other Land Converted to Forest Land (8735) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.A.2.e Other Land Converted to Forest Land (8735) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
6. Other (10476) Total for category (10510) Net emissions/removals (10460) PFCs (10473) kt CO₂ equivalent (140)
6. Other (10485) Total for category (10510) Net emissions/removals (10460) PFCs (10473) kt CO₂ equivalent (140)
category classification measure gas unit
4.A.2.d Settlements Converted to Forest Land (10297) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.A.2.d Settlements Converted to Forest Land (10297) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.E.2.a Forest Land Converted to Settlements (10183) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.E.2.a Forest Land Converted to Settlements (10183) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.E.2.b Cropland Converted to Settlements (9914) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.E.2.b Cropland Converted to Settlements (9914) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.B.2.e Other Land Converted to Cropland (9560) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.B.2.e Other Land Converted to Cropland (9560) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.C.2.a Forest Land Converted to Grassland (9405) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.C.2.a Forest Land Converted to Grassland (9405) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.C.2.c Wetlands Converted to Grassland (10323) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.C.2.c Wetlands Converted to Grassland (10323) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
6. Other (10476) Total for category (10510) Net emissions/removals (10460) Aggregate GHGs (10467) kt CO₂ equivalent (140)
6. Other (10485) Total for category (10510) Net emissions/removals (10460) Aggregate GHGs (10467) kt CO₂ equivalent (140)
category classification measure gas unit
4.A.2.a Cropland Converted to Forest Land (9741) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.A.2.a Cropland Converted to Forest Land (9741) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Net emissions/removals (10460) PFCs (10473) t CO₂ equivalent (96)
2. Industrial Processes and Product Use (10482) Total for category (10510) Net emissions/removals (10460) PFCs (10473) t CO₂ equivalent (96)
category classification measure gas unit
6. Other (10485) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
6. Other (10476) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.C.2.d Settlements Converted to Grassland (9444) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.C.2.d Settlements Converted to Grassland (9444) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.C.2.e Other Land Converted to Grassland (10189) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.C.2.e Other Land Converted to Grassland (10189) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.A.2.c Wetlands Converted to Forest Land (9306) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.A.2.c Wetlands Converted to Forest Land (9306) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
5. Waste (10159) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
5. Waste (10484) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
1. Energy (8819) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
1. Energy (10481) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
5. Waste (10159) Total for category (10510) Net emissions/removals (10460) CH₄ (10468) kt (5)
5. Waste (10484) Total for category (10510) Net emissions/removals (10460) CH₄ (10468) kt (5)
category classification measure gas unit
4.B.2.b Grassland Converted to Cropland (9491) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.B.2.b Grassland Converted to Cropland (9491) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
CO₂ Emissions from Biomass (8270) Biomass (10513) Net emissions/removals (10460) CO₂ (10469) kt (5)
1.AA Fuel Combustion - Sectoral approach (9089) Biomass (10513) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
3. Agriculture (10483) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
3. Agriculture (10096) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.B.2.c Wetlands Converted to Cropland (10450) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.B.2.c Wetlands Converted to Cropland (10450) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Net emissions/removals (10460) HFCs (10470) t CO₂ equivalent (96)
2. Industrial Processes and Product Use (10482) Total for category (10510) Net emissions/removals (10460) HFCs (10470) t CO₂ equivalent (96)
category classification measure gas unit
4.A.2.a Cropland Converted to Forest Land (9741) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.A.2.a Cropland Converted to Forest Land (9741) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.D.1.a Peat Extraction Remaining Peat Extraction (10156) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.D.1.a Peat Extraction Remaining Peat Extraction (10156) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.F.2.a Forest Land Converted to Other Land (8488) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.F.2.a Forest Land Converted to Other Land (8488) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.A.2.b Grassland Converted to Forest Land (9851) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.A.2.b Grassland Converted to Forest Land (9851) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
6. Other (10476) Total for category (10510) Net emissions/removals (10460) Unspecified mix of HFCs and PFCs (10475) kt CO₂ equivalent (140)
6. Other (10485) Total for category (10510) Net emissions/removals (10460) Unspecified mix of HFCs and PFCs (10475) kt CO₂ equivalent (140)
category classification measure gas unit
4.E.2.c Grassland Converted to Settlements (10026) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.E.2.c Grassland Converted to Settlements (10026) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.F.2.d Wetlands Converted to Other Land (10371) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.F.2.d Wetlands Converted to Other Land (10371) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
6. Other (10476) Total for category (10510) Net emissions/removals (10460) HFCs (10470) kt CO₂ equivalent (140)
6. Other (10485) Total for category (10510) Net emissions/removals (10460) HFCs (10470) kt CO₂ equivalent (140)
category classification measure gas unit
4.E.2.c Grassland Converted to Settlements (10026) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.E.2.c Grassland Converted to Settlements (10026) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.F.2.e Settlements Converted to Other Land (9019) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.F.2.e Settlements Converted to Other Land (9019) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.B.2.a Forest Land Converted to Cropland (9799) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.B.2.a Forest Land Converted to Cropland (9799) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
1. Energy (8819) Total for category (10510) Net emissions/removals (10460) CH₄ (10468) kt (5)
1. Energy (10481) Total for category (10510) Net emissions/removals (10460) CH₄ (10468) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Net emissions/removals (10460) NF₃ (10472) kt (5)
2. Industrial Processes and Product Use (10482) Total for category (10510) Net emissions/removals (10460) NF₃ (10472) kt (5)
category classification measure gas unit
4.D.1.b Flooded Land Remaining Flooded Land (9151) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.D.1.b Flooded Land Remaining Flooded Land (9151) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
5. Waste (10159) Total for category (10510) Net emissions/removals (10460) Aggregate GHGs (10467) kt CO₂ equivalent (140)
5. Waste (10484) Total for category (10510) Net emissions/removals (10460) Aggregate GHGs (10467) kt CO₂ equivalent (140)
category classification measure gas unit
4.B.2.d Settlements Converted to Cropland (8709) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.B.2.d Settlements Converted to Cropland (8709) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
2. Industrial Processes and Product Use (10482) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
6. Other (10476) Total for category (10510) Net emissions/removals (10460) Aggregate F-gases (10466) t CO₂ equivalent (96)
6. Other (10485) Total for category (10510) Net emissions/removals (10460) Aggregate F-gases (10466) t CO₂ equivalent (96)
category classification measure gas unit
3. Agriculture (10483) Total for category (10510) Net emissions/removals (10460) Aggregate GHGs (10467) kt CO₂ equivalent (140)
3. Agriculture (10096) Total for category (10510) Net emissions/removals (10460) Aggregate GHGs (10467) kt CO₂ equivalent (140)
category classification measure gas unit
4.E.2.e Other Land Converted to Settlements (9857) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.E.2.e Other Land Converted to Settlements (9857) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.B.2.c Wetlands Converted to Cropland (10450) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.B.2.c Wetlands Converted to Cropland (10450) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.A.2.c Wetlands Converted to Forest Land (9306) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.A.2.c Wetlands Converted to Forest Land (9306) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.C.2.b Cropland Converted to Grassland (9966) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.C.2.b Cropland Converted to Grassland (9966) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
2. Industrial Processes and Product Use (10482) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
6. Other (10476) Total for category (10510) Net emissions/removals (10460) CH₄ (10468) kt (5)
6. Other (10485) Total for category (10510) Net emissions/removals (10460) CH₄ (10468) kt (5)
category classification measure gas unit
4.C.2.e Other Land Converted to Grassland (10189) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.C.2.e Other Land Converted to Grassland (10189) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.B.2.e Other Land Converted to Cropland (9560) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.B.2.e Other Land Converted to Cropland (9560) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Net emissions/removals (10460) SF₆ (10474) kt (5)
2. Industrial Processes and Product Use (10482) Total for category (10510) Net emissions/removals (10460) SF₆ (10474) kt (5)
category classification measure gas unit
4.A.2.d Settlements Converted to Forest Land (10297) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.A.2.d Settlements Converted to Forest Land (10297) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
1. Energy (8819) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
1. Energy (10481) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Net emissions/removals (10460) CH₄ (10468) kt (5)
2. Industrial Processes and Product Use (10482) Total for category (10510) Net emissions/removals (10460) CH₄ (10468) kt (5)
category classification measure gas unit
4.C.2.c Wetlands Converted to Grassland (10323) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.C.2.c Wetlands Converted to Grassland (10323) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.E.2.a Forest Land Converted to Settlements (10183) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.E.2.a Forest Land Converted to Settlements (10183) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Net emissions/removals (10460) Aggregate GHGs (10467) kt CO₂ equivalent (140)
2. Industrial Processes and Product Use (10482) Total for category (10510) Net emissions/removals (10460) Aggregate GHGs (10467) kt CO₂ equivalent (140)
category classification measure gas unit
4.A.2.e Other Land Converted to Forest Land (8735) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.A.2.e Other Land Converted to Forest Land (8735) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
3. Agriculture (10483) Total for category (10510) Net emissions/removals (10460) CH₄ (10468) kt (5)
3. Agriculture (10096) Total for category (10510) Net emissions/removals (10460) CH₄ (10468) kt (5)
category classification measure gas unit
4.E.2.b Cropland Converted to Settlements (9914) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.E.2.b Cropland Converted to Settlements (9914) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.D.1.c Other Wetlands Remaining Other Wetlands (8307) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.D.1.c Other Wetlands Remaining Other Wetlands (8307) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Net emissions/removals (10460) Unspecified mix of HFCs and PFCs (10475) t CO₂ equivalent (96)
2. Industrial Processes and Product Use (10482) Total for category (10510) Net emissions/removals (10460) Unspecified mix of HFCs and PFCs (10475) t CO₂ equivalent (96)
category classification measure gas unit
6. Other (10476) Total for category (10510) Net emissions/removals (10460) SF₆ (10474) kt (5)
6. Other (10485) Total for category (10510) Net emissions/removals (10460) SF₆ (10474) kt (5)
category classification measure gas unit
4.E.2.e Other Land Converted to Settlements (9857) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.E.2.e Other Land Converted to Settlements (9857) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.F.2.b Cropland Converted to Other Land (8234) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.F.2.b Cropland Converted to Other Land (8234) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.B.2.b Grassland Converted to Cropland (9491) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.B.2.b Grassland Converted to Cropland (9491) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
5. Waste (10159) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
5. Waste (10484) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.E.2.d Wetlands Converted to Settlements (8465) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.E.2.d Wetlands Converted to Settlements (8465) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
6. Other (10476) Total for category (10510) Net emissions/removals (10460) NF₃ (10472) kt (5)
6. Other (10485) Total for category (10510) Net emissions/removals (10460) NF₃ (10472) kt (5)
category classification measure gas unit
1. Energy (8819) Total for category (10510) Net emissions/removals (10460) Aggregate GHGs (10467) kt CO₂ equivalent (140)
1. Energy (10481) Total for category (10510) Net emissions/removals (10460) Aggregate GHGs (10467) kt CO₂ equivalent (140)
category classification measure gas unit
4.C.2.a Forest Land Converted to Grassland (9405) Total for category (10510) Net emissions/removals (10460) CO₂ (10469) kt (5)
4.C.2.a Forest Land Converted to Grassland (9405) Carbon stock change (10827) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
4.B.2.a Forest Land Converted to Cropland (9799) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.B.2.a Forest Land Converted to Cropland (9799) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
4.C.2.b Cropland Converted to Grassland (9966) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.C.2.b Cropland Converted to Grassland (9966) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
5. Waste (10159) Total for category (10510) Indirect emissions (10563) N₂O (10471) kt (5)
5. Waste (10484) Total for category (10510) Indirect emissions (10563) N₂O (10471) kt (5)
category classification measure gas unit
1. Energy (8819) Total for category (10510) Indirect emissions (10563) N₂O (10471) kt (5)
1. Energy (10481) Total for category (10510) Indirect emissions (10563) N₂O (10471) kt (5)
category classification measure gas unit
6. Other (10485) Total for category (10510) Indirect emissions (10563) N₂O (10471) kt (5)
6. Other (10476) Total for category (10510) Indirect emissions (10563) N₂O (10471) kt (5)
category classification measure gas unit
5.C.2 Open Burning of Waste (8943) Total for category (10510) Amount of wastes incinerated/open burned (10566) No gas (10637) kt (5)
5.C.2.b Non-biogenic (8910) Total for category (10510) Amount of wastes incinerated/open burned (10566) No gas (10637) kt (5)
category classification measure gas unit
2. Industrial Processes and Product Use (10393) Total for category (10510) Indirect emissions (10563) N₂O (10471) kt (5)
2. Industrial Processes and Product Use (10482) Total for category (10510) Indirect emissions (10563) N₂O (10471) kt (5)
category classification measure gas unit
3. Agriculture (10483) Total for category (10510) Indirect emissions (10563) N₂O (10471) kt (5)
3. Agriculture (10096) Total for category (10510) Indirect emissions (10563) N₂O (10471) kt (5)
category classification measure gas unit
3.A Enteric Fermentation (9559) Growing Cattle (10645) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Growing Cattle (10645) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Dairy Cattle (10641) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Dairy Cattle (10641) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Other Mature Cattle (10653) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Other Mature Cattle (10653) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Mature Dairy Cattle (10647) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Mature Dairy Cattle (10647) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Non-Dairy Cattle (10649) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Non-Dairy Cattle (10649) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
4.C.2.d Settlements Converted to Grassland (9444) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
4.C.2.d Settlements Converted to Grassland (9444) 4 (III) Direct N2O Emissions from N Mineralization/ Immobilization (10820) Net emissions/removals (10460) N₂O (10471) kt (5)
category classification measure gas unit
3.A Enteric Fermentation (9559) Horses (10646) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Horses (10646) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Ostrich (10650) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Ostrich (10650) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Poultry (10654) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Poultry (10654) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Other (10651) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Other (10651) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Mules and Asses (10648) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Mules and Asses (10648) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Camels (10639) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Camels (10639) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Deer (10642) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Deer (10642) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Buffalo (10638) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Buffalo (10638) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Rabbit (10655) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Rabbit (10655) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Reindeer (10656) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Reindeer (10656) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Goats (10644) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Goats (10644) Population (10596) No gas (10637) 1000s (175)
category classification measure gas unit
3.A Enteric Fermentation (9559) Fur-bearing Animals (10643) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Fur-bearing Animals (10643) Population (10596) No gas (10637) 1000s (175)
JGuetschow commented 3 years ago

OK, I think I understand the problem now. It seems that internally the database has independent tables for variables, categories, etc., and the table with the actual data just references the variable ID, so it can contain arbitrary combinations of variable and sector. For emissions data there is a many-to-one correspondence of variables to categories, so we can infer the category from the variable. That is not the case for, e.g., activity data, where the same activity data (e.g. the number of cows) might be used for several sectors. The UNFCCC interface seems to handle the duplicate variables fine and allows for further filtering of the results.
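A minimal sketch of how the ambiguous variables could be found, assuming the variables table has been loaded into a pandas DataFrame with the columns shown earlier in this thread. The two rows for variableId 957732 come from the example above; variableId 900001 and the helper ambiguous_variable_ids are hypothetical and not part of unfccc_di_api.

```python
import pandas as pd

# Excerpt of a variables table. The two rows for 957732 are the example from
# this thread; 900001 is a made-up, unambiguous variable for contrast.
variables = pd.DataFrame(
    {
        "variableId": [957732, 957732, 900001],
        "categoryId": [9559, 9608, 9559],
        "classificationId": [10650, 10650, 10510],
        "measureId": [10596, 10596, 10460],
        "gasId": [10637, 10637, 10471],
        "unitId": [175, 175, 5],
    }
)


def ambiguous_variable_ids(variables: pd.DataFrame) -> list[int]:
    """Return the variableIds that are listed with more than one categoryId (hypothetical helper)."""
    n_categories = variables.groupby("variableId")["categoryId"].nunique()
    return sorted(n_categories[n_categories > 1].index.tolist())


print(ambiguous_variable_ids(variables))  # [957732]
```

For emissions variables the count is always 1 (the many-to-one case); only variables shared between sectors, such as the livestock populations, show up here.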

mikapfl commented 3 years ago

Yeah, I looked through the list, and I think we can distinguish two cases, logically:

the same category with different IDs

Some categories have several IDs, for whatever reason. An example is:

category classification measure gas unit
6. Other (10476) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)
6. Other (10485) Total for category (10510) Net emissions/removals (10460) N₂O (10471) kt (5)

Here, there is no ambiguity.

the same data, different categories

Sometimes activity data or sub-categories have identical data in different categories (e.g. a total can be equal to a single sub-category if that is its only sub-category). Examples are:

category classification measure gas unit
CO₂ Emissions from Biomass (8270) Biomass (10513) Net emissions/removals (10460) CO₂ (10469) kt (5)
1.AA Fuel Combustion - Sectoral approach (9089) Biomass (10513) Net emissions/removals (10460) CO₂ (10469) kt (5)
category classification measure gas unit
5.C.2 Open Burning of Waste (8943) Total for category (10510) Amount of wastes incinerated/open burned (10566) No gas (10637) kt (5)
5.C.2.b Non-biogenic (8910) Total for category (10510) Amount of wastes incinerated/open burned (10566) No gas (10637) kt (5)
category classification measure gas unit
3.A Enteric Fermentation (9559) Fur-bearing Animals (10643) Population (10596) No gas (10637) 1000s (175)
3.B Manure Management (9608) Fur-bearing Animals (10643) Population (10596) No gas (10637) 1000s (175)

In each case, I think it is correct to put the data into both categories.
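The two cases can be told apart mechanically by comparing category names. The following is a sketch under the assumption that a mapping from categoryId to category name is available; the variableIds 1, 2 and 3 and the function classify_duplicates are hypothetical, while the category IDs and names are the ones from the examples above.

```python
import pandas as pd

# categoryId -> name for the categories used in the examples in this thread.
CATEGORY_NAMES = {
    10476: "6. Other",
    10485: "6. Other",
    9559: "3.A Enteric Fermentation",
    9608: "3.B Manure Management",
}

# Hypothetical variableIds; only the categoryIds are taken from the thread.
variables = pd.DataFrame(
    {"variableId": [1, 1, 2, 2, 3], "categoryId": [10476, 10485, 9559, 9608, 9559]}
)


def classify_duplicates(variables: pd.DataFrame, names: dict[int, str]) -> dict[int, str]:
    """Label every variableId that is listed with several categoryIds (hypothetical helper)."""
    with_names = variables.assign(categoryName=variables["categoryId"].map(names))
    labels = {}
    for variable_id, group in with_names.groupby("variableId"):
        if group["categoryId"].nunique() < 2:
            continue  # only one category: nothing ambiguous
        if group["categoryName"].nunique() == 1:
            labels[int(variable_id)] = "same category, different IDs"
        else:
            labels[int(variable_id)] = "same data, different categories"
    return labels


print(classify_duplicates(variables, CATEGORY_NAMES))
# {1: 'same category, different IDs', 2: 'same data, different categories'}
```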

mikapfl commented 3 years ago

Now, I see two things to correct in our parsing/querying.

unrestricted queries

If a user asks for all data, the "same category with different IDs" case is trivial because we don't distinguish categories by their ID: it is the same data and will be put into the same bucket, so all is fine.

For the "same data, different categories" case, we should make sure to put the data into both categories. This is not logically difficult, but it takes some work to change the parsing functions accordingly.
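One way to sketch the parsing change, assuming the data points and the variables table are both pandas DataFrames: merge the data onto every variable row with the same variableId, then drop the categoryId so that categories are identified by name. This is an illustration, not the library's current parsing code; variableId 111 and all data values are made up.

```python
import pandas as pd

# Hypothetical variables table: 957732 is the thread's example of
# "same data, different categories"; 111 stands for "same category, different IDs".
variables = pd.DataFrame(
    {
        "variableId": [957732, 957732, 111, 111],
        "categoryId": [9559, 9608, 10476, 10485],
        "category": [
            "3.A Enteric Fermentation",
            "3.B Manure Management",
            "6. Other",
            "6. Other",
        ],
    }
)
# Hypothetical data points: they reference only a variableId, never a category.
data = pd.DataFrame(
    {"variableId": [957732, 111], "year": ["1990", "1990"], "numberValue": [1.0, 5.0]}
)

# Each data point is paired with every variable row sharing its variableId,
# so it ends up in all of its categories instead of only the first or last one.
parsed = data.merge(variables, on="variableId", how="inner")

# Categories are identified by name, so the two "6. Other" rows become
# identical once categoryId is dropped and collapse into one.
parsed = parsed.drop(columns="categoryId").drop_duplicates(ignore_index=True)
print(parsed)
```

The result has three rows: the 957732 data point once under 3.A and once under 3.B, and the 111 data point once under "6. Other".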

restricted queries

Queries like yours, where the user asks only for a specific category, are more difficult. Currently, the user has to specify a categoryId in the query, which they probably get from the categories dataframe. However, asking for 10476 they might get different data than asking for 10485, even though both are "6. Other". Maybe the correct solution would be to change the API so that the user specifies a category name in the query and we handle the intricacies of multiple categories with the same name but different IDs ourselves. That's some work and requires a new major release because it is a breaking change, but it is much more useful for the user.
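Internally, a name-based query would first have to resolve the name to all matching IDs. A minimal sketch, assuming a categoryId-to-name mapping is available (here hard-coded with IDs from this thread); category_ids_for_names is a hypothetical helper, and only the commented query call at the end uses the existing API shown in this thread.

```python
# categoryId -> name; in practice this would come from the reader's category
# metadata (assumption), here it is hard-coded from the examples in this thread.
CATEGORY_NAMES = {
    10476: "6. Other",
    10485: "6. Other",
    9559: "3.A Enteric Fermentation",
    9608: "3.B Manure Management",
}


def category_ids_for_names(names: list[str], category_names: dict[int, str]) -> list[int]:
    """Return every categoryId whose name is in `names` (hypothetical helper)."""
    wanted = set(names)
    ids = sorted(cid for cid, name in category_names.items() if name in wanted)
    missing = wanted - {category_names[cid] for cid in ids}
    if missing:
        raise KeyError(f"unknown category names: {sorted(missing)}")
    return ids


ids = category_ids_for_names(["6. Other"], CATEGORY_NAMES)
print(ids)  # [10476, 10485]
# The resolved IDs can then be passed to the existing query, e.g.
# reader.annex_one_reader.query(party_codes=["DEU"], category_ids=ids)
```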

mikapfl commented 3 years ago

The second problem can be seen like this:

In [2]: import unfccc_di_api

In [3]: reader = unfccc_di_api.UNFCCCApiReader()

In [4]: reader.annex_one_reader.query(party_codes=['DEU'], category_ids=[10476])
Out[4]: 
    party   category      classification                      measure  gas     unit       year  numberValue stringValue
0     DEU  6.  Other  Total for category  Emission factor information  CH4  no unit       1990          NaN          NA
1     DEU  6.  Other  Total for category  Emission factor information  CH4  no unit       1991          NaN          NA
2     DEU  6.  Other  Total for category  Emission factor information  CH4  no unit       1992          NaN          NA
3     DEU  6.  Other  Total for category  Emission factor information  CH4  no unit       1993          NaN          NA
4     DEU  6.  Other  Total for category  Emission factor information  CH4  no unit       1994          NaN          NA
..    ...        ...                 ...                          ...  ...      ...        ...          ...         ...
770   DEU  6.  Other  Total for category       Net emissions/removals  SO2       kt       2016          NaN          NO
771   DEU  6.  Other  Total for category       Net emissions/removals  SO2       kt       2017          NaN          NO
772   DEU  6.  Other  Total for category       Net emissions/removals  SO2       kt       2018          NaN          NO
773   DEU  6.  Other  Total for category       Net emissions/removals  SO2       kt       2019          NaN          NO
774   DEU  6.  Other  Total for category       Net emissions/removals  SO2       kt  Base year          NaN          NO

[775 rows x 9 columns]

In [5]: reader.annex_one_reader.query(party_codes=['DEU'], category_ids=[10485])
Out[5]: 
    party   category      classification                 measure                               gas               unit       year numberValue stringValue
0     DEU  6.  Other  Total for category      Indirect emissions                               N2O                 kt       1990        None          NO
1     DEU  6.  Other  Total for category      Indirect emissions                               N2O                 kt       1991        None          NO
2     DEU  6.  Other  Total for category      Indirect emissions                               N2O                 kt       1992        None          NO
3     DEU  6.  Other  Total for category      Indirect emissions                               N2O                 kt       1993        None          NO
4     DEU  6.  Other  Total for category      Indirect emissions                               N2O                 kt       1994        None          NO
..    ...        ...                 ...                     ...                               ...                ...        ...         ...         ...
243   DEU  6.  Other  Total for category  Net emissions/removals  Unspecified mix of HFCs and PFCs  kt CO2 equivalent       2016        None          NO
244   DEU  6.  Other  Total for category  Net emissions/removals  Unspecified mix of HFCs and PFCs  kt CO2 equivalent       2017        None          NO
245   DEU  6.  Other  Total for category  Net emissions/removals  Unspecified mix of HFCs and PFCs  kt CO2 equivalent       2018        None          NO
246   DEU  6.  Other  Total for category  Net emissions/removals  Unspecified mix of HFCs and PFCs  kt CO2 equivalent       2019        None          NO
247   DEU  6.  Other  Total for category  Net emissions/removals  Unspecified mix of HFCs and PFCs  kt CO2 equivalent  Base year        None          NO

[248 rows x 9 columns]

Depending on which of the two IDs the user queries, either 775 or 248 rows are returned, and the user is left to figure this out themselves. Currently, the correct thing for the user to do is to query both IDs:

In [6]: reader.annex_one_reader.query(party_codes=['DEU'], category_ids=[10485, 10476])
Out[6]: 
     party   category      classification                      measure                               gas               unit       year  numberValue stringValue
0      DEU  6.  Other  Total for category  Emission factor information                               CH4            no unit       1990          NaN          NA
1      DEU  6.  Other  Total for category  Emission factor information                               CH4            no unit       1991          NaN          NA
2      DEU  6.  Other  Total for category  Emission factor information                               CH4            no unit       1992          NaN          NA
3      DEU  6.  Other  Total for category  Emission factor information                               CH4            no unit       1993          NaN          NA
4      DEU  6.  Other  Total for category  Emission factor information                               CH4            no unit       1994          NaN          NA
...    ...        ...                 ...                          ...                               ...                ...        ...          ...         ...
1018   DEU  6.  Other  Total for category       Net emissions/removals  Unspecified mix of HFCs and PFCs  kt CO2 equivalent       2016          NaN          NO
1019   DEU  6.  Other  Total for category       Net emissions/removals  Unspecified mix of HFCs and PFCs  kt CO2 equivalent       2017          NaN          NO
1020   DEU  6.  Other  Total for category       Net emissions/removals  Unspecified mix of HFCs and PFCs  kt CO2 equivalent       2018          NaN          NO
1021   DEU  6.  Other  Total for category       Net emissions/removals  Unspecified mix of HFCs and PFCs  kt CO2 equivalent       2019          NaN          NO
1022   DEU  6.  Other  Total for category       Net emissions/removals  Unspecified mix of HFCs and PFCs  kt CO2 equivalent  Base year          NaN          NO

[1023 rows x 9 columns]

The 1023 rows are exactly the 775 + 248 rows of the two separate queries. That both IDs are needed is not obvious from the category hierarchy (6. Other [10476] is a child of Total GHG emissions with LULUCF [8677], while 6. Other [10485] is a child of Total GHG emissions without LULUCF [10464]; the two should be the same anyway).

mikapfl commented 3 years ago

Why were you doing a restricted query? Do you need to filter by category_ids, or would category names be easier anyway?

JGuetschow commented 3 years ago

I did a restricted query because I'm only looking for certain combinations of category, classification and measure, to find out which of the needed data are available from the interface. My first approach was to use the category ID, but since I found that there are sometimes multiple IDs for the same category (name), I now specify the name, find the matching IDs and then retrieve the data for all of them.

mikapfl commented 3 years ago

Okay, so for you it would be easier if query accepted the category name directly, right? Then I'll probably go that route.

The changes necessary to handle all situations properly are not difficult then, only a bit tedious, because the internal data structures use the variableId as a key (e.g. in dictionaries), which obviously no longer works. I should also write some tests so I'll know when I have actually fixed the bug(s).
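Why a dictionary keyed by variableId breaks, in a minimal illustration (not the library's actual code): building a plain dict from duplicate keys silently keeps only the last entry, which matches the "last instead of first" symptom described earlier. Mapping each variableId to a list keeps every category. The (variableId, categoryId) pairs are the example from this thread.

```python
from collections import defaultdict

# (variableId, categoryId) pairs; 957732 appears under both 3.A and 3.B.
rows = [(957732, 9559), (957732, 9608)]

# A plain dict keyed by variableId keeps only the last row for each key.
by_id = {variable_id: category_id for variable_id, category_id in rows}
print(by_id)  # {957732: 9608}

# Mapping each variableId to a list of categoryIds keeps all of them.
categories_by_id = defaultdict(list)
for variable_id, category_id in rows:
    categories_by_id[variable_id].append(category_id)
print(dict(categories_by_id))  # {957732: [9559, 9608]}
```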

JGuetschow commented 3 years ago

It would be best if one could use either the ID or the name. On the other hand, I have so far not seen any meaningful difference between categories with the same name but different IDs.
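Supporting both could be sketched as a normalisation step that runs before the actual query: integers are taken as categoryIds, strings are resolved to all IDs sharing that name. This assumes a categoryId-to-name mapping is available; normalize_categories and the hard-coded mapping are hypothetical and not part of unfccc_di_api.

```python
from typing import Union

# categoryId -> name, hard-coded from the examples in this thread (assumption).
CATEGORY_NAMES = {
    10476: "6. Other",
    10485: "6. Other",
    9559: "3.A Enteric Fermentation",
    9608: "3.B Manure Management",
}


def normalize_categories(
    categories: list[Union[int, str]], category_names: dict[int, str]
) -> list[int]:
    """Turn a mix of categoryIds and category names into a sorted list of IDs (hypothetical)."""
    # Invert the mapping once: one name can belong to several IDs.
    ids_by_name: dict[str, list[int]] = {}
    for category_id, name in category_names.items():
        ids_by_name.setdefault(name, []).append(category_id)

    resolved: set[int] = set()
    for item in categories:
        if isinstance(item, str):
            if item not in ids_by_name:
                raise KeyError(f"unknown category name: {item!r}")
            resolved.update(ids_by_name[item])
        else:
            if item not in category_names:
                raise KeyError(f"unknown categoryId: {item}")
            resolved.add(item)
    return sorted(resolved)


print(normalize_categories([9559, "6. Other"], CATEGORY_NAMES))  # [9559, 10476, 10485]
```

Passing a name never silently picks just one of several IDs, which avoids the 775-versus-248-rows surprise shown above.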