rOpenGov / iotables

Importing and Manipulating Symmetric Input-Output Tables
https://iotables.dataobservatory.eu

ghg_get() #8

Closed antaldaniel closed 2 years ago

antaldaniel commented 3 years ago

The package could be sent today for a new CRAN release: I implemented the tidyverse changes, some good programming practices, etc., and it works without a hitch (last time I checked). But I would like to add something new, and I would also like to start blogging about it.

Most of the package implements the comprehensive manual on symmetric input-output tables (SIOTs) written for official statisticians. Jörg Beutel's excellent handbook, published by Eurostat, uses a simplified German SIOT from 1990 to explain how things work. All unit tests are written against it. This is introduced in the first vignette.

SIOTs give a very detailed picture of the structure of a national economy. What happens if a musician gets paid, or pays less tax, and starts to spend her money in restaurants, buys new strings for her guitar, and so on? The SIOT shows how this creates demand and GDP components in the restaurant sector, the instrument retail sector, etc.

Impact analysis asks, for example, how many jobs will be created if musicians have more money to spend. The Working with Eurostat Data vignette shows this with actual Eurostat data, comparing Czechia and Slovakia. There is a separate vignette that shows how to do similar calculations when the data is given in a different format, the United Kingdom Input-Output Analytical Tables, which was my other external point of validation. Richard Wild was kind enough to compare the UK Stats internal results with my package's results to see if all goes well.

What is currently not implemented is environmental impact analysis, which calculates how much more CO2 or methane will be emitted if the musician starts to spend. But it is a very easy thing to do. Both Working with Eurostat Data and United Kingdom Input-Output Analytical Tables have the templates for this where they calculate the employment effects and employment multipliers. Beutel's manual gives an example, on the simplified Germany 1990 SIOT, of how to do environmental and energy use impact calculations on pages 497-506. (Most of those pages are taken up by the tables themselves.)
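As a rough sketch of why the employment template carries over, the calculation can be written in standard textbook Leontief notation. The symbols below are assumptions introduced here for illustration (they do not come from the package or the vignettes), and the exact normalisation used by the package functions may differ.

```latex
% A : 64 x 64 input coefficient matrix derived from the SIOT
% x : output by industry (or product), y : final demand
% g : 1 x 64 auxiliary vector (employment, CO2, methane, ...)
\begin{align}
  L   &= (I - A)^{-1}            && \text{Leontief inverse} \\
  c_j &= \frac{g_j}{x_j}         && \text{direct coefficient: jobs or emissions per unit of output} \\
  m   &= c\,L                    && \text{multipliers, a } 1 \times 64 \text{ vector} \\
  \Delta g_{\text{total}} &= c\,L\,\Delta y && \text{effect of a change in final demand}
\end{align}
```

Only the vector g changes between an employment and a GHG calculation; A, L and the demand change stay the same.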

Because, from an analytical point of view, calculating employment, NO2, CO2 or other GHG effects uses the same equations, you can take the employment example as a starting point:

employment_get - gets the correct indicator via the eurostat package and arranges it into the 1x64 matrix form that conforms to the standard 64x64 SIOT.

Employment is not part of the original SIOT table; it is auxiliary data that we must bring into our matrix equation in a conforming vector format. In the case of employment, the challenge was that Eurostat releases employment in greater detail than the SIOT, i.e. more industry groups are present in the employment statistics than in the SIOT. So some of the columns of the employment data had to be added together and labelled to match the SIOT data exactly.
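The "add columns together and relabel" step can be sketched roughly as follows. This is a language-agnostic illustration in Python, not the package's R code; the function name `aggregate_to_siot`, the mapping, the codes and the numbers are all hypothetical placeholders, and a real vector would have 64 elements.

```python
from collections import defaultdict

# Hypothetical mapping from detailed employment codes to SIOT column labels.
# The codes are illustrative placeholders, not the real Eurostat vocabulary.
DETAILED_TO_SIOT = {
    "C10": "C10-12",
    "C11": "C10-12",
    "C12": "C10-12",
    "D35": "D35",
}


def aggregate_to_siot(detailed, mapping, siot_order):
    """Sum detailed values into SIOT columns, returned in SIOT column order."""
    totals = defaultdict(float)
    for code, value in detailed.items():
        if code not in mapping:
            # Fail loudly: an unmapped code would silently lose data.
            raise KeyError(f"no SIOT column defined for {code!r}")
        totals[mapping[code]] += value
    return [totals[label] for label in siot_order]


# Made-up employment figures for the detailed industries.
employment = {"C10": 120.0, "C11": 30.0, "C12": 5.0, "D35": 40.0}
siot_order = ["C10-12", "D35"]

vector = aggregate_to_siot(employment, DETAILED_TO_SIOT, siot_order)
print(dict(zip(siot_order, vector)))
```

Keeping the mapping as explicit data makes it easy to review which detailed industries end up in which SIOT column.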

For GHGs, you would need a similar function to obtain the relevant data from Greenhouse gas emissions by source sector (source: EEA), probably a similar ghg_get() function with a parameter for the pollutant you want to use, as this data folder has many pollutants and GHGs.

This particular task has to be done twice to work correctly, because in Europe SIOTs have two different formats, and member states can decide which to follow: industry x industry or product x product. The end result is the same, but different original data is used, and therefore Eurostat labels them slightly differently. In either case, you must get a vector of size 64, with one of two alternative labellings, depending on whether a country's statistical system uses the industry x industry approach (like NL) or the product x product approach (most countries). employment_get is basically a template for this.

What requires judgement is how to use the data source. In the case of employment, we had an easy problem: the industry (product) division of the data was richer in the auxiliary source, and I just had to add rows together. The problem with the Eurostat/EEA data is that it uses a very different division. Energy production is one column in the SIOT and in employment statistics, as not many people work in power plants, but in the EEA data, because energy accounts for 30% of greenhouse gases, it is broken up into many sub-columns, which you need to add together. Other industries, like performing arts, where music performances belong, are not even mentioned, because they are such a small source of GHG. They should either be zeroed out, or the 'other' emissions should somehow be spread out proportionally (maybe proportionally to employment) across these columns.

The work is done when the sum of all GHG values in the original format equals the sum of the new 1x64 vector.
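One possible reading of the allocation and the sum check, sketched in Python as a language-agnostic illustration: sub-sectors are summed into their SIOT column, and a residual "other" total is spread over the columns that have no source of their own, proportionally to employment. The function `build_ghg_vector`, the sector names and all numbers are hypothetical, and the proportional rule is just one of the options discussed above.

```python
import math


def build_ghg_vector(source, mapping, other_total, employment, siot_order):
    """Aggregate emission sub-sectors and spread a residual by employment."""
    vector = {label: 0.0 for label in siot_order}
    # Step 1: add sub-sectors (e.g. several energy sources) into one column.
    for sector, value in source.items():
        vector[mapping[sector]] += value
    # Step 2: find SIOT columns that no emission source maps to.
    covered = set(mapping.values())
    uncovered = [label for label in siot_order if label not in covered]
    weight_sum = sum(employment[label] for label in uncovered)
    # Step 3: spread the residual proportionally to employment (assumption).
    for label in uncovered:
        share = employment[label] / weight_sum if weight_sum else 0.0
        vector[label] += other_total * share
    return [vector[label] for label in siot_order]


# Made-up data: two energy sub-sectors and a residual 'other' category.
source = {"electricity_generation": 50.0, "heat_production": 20.0}
mapping = {"electricity_generation": "D35", "heat_production": "D35"}
other_total = 8.0
employment = {"D35": 40.0, "I": 30.0, "R90-92": 10.0}
siot_order = ["D35", "I", "R90-92"]

ghg_vector = build_ghg_vector(source, mapping, other_total, employment, siot_order)

# Acceptance check: nothing is lost or invented during the reshaping.
original_total = sum(source.values()) + other_total
assert math.isclose(sum(ghg_vector), original_total)
```

Zeroing out the unreported industries instead would mean dropping step 3 and excluding the residual from the expected total.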

What will be the end product? A very powerful tool that people currently only use in isolation, on a country level, because it is difficult to put together the EU-level data. It can answer questions like: if a country, like Germany, increases the VAT on meat, by how much will that reduce methane and CO2 output? The SIOTs take into consideration the demand for meat met by local and imported sources. Or, if the European Commission's forecasts on GDP growth are met in the EU countries, how much more CO2 will they emit next year? In which country or sector can you decrease CO2 output at the lowest cost? Where is the cost highest? If you put a CO2 tax on a product, how will it change CO2 output, and how many jobs will be lost? Etc.
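To make this kind of question concrete, here is a toy two-sector calculation in Python (a language-agnostic sketch, not the package's R interface). The coefficient matrix, the emission coefficients and the demand change are invented numbers, and `leontief_inverse_2x2` is a hypothetical helper; a real analysis would use the 64x64 SIOT and the emissions vector conformed to it.

```python
def leontief_inverse_2x2(a):
    """Return (I - A)^-1 for a 2x2 input coefficient matrix A."""
    m = [[1.0 - a[0][0], -a[0][1]],
         [-a[1][0], 1.0 - a[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("I - A is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Invented inputs: sector 0 is emission-intensive, sector 1 is not.
A = [[0.2, 0.1],
     [0.3, 0.4]]                 # input coefficients
emission_coeff = [0.5, 0.1]      # emissions per unit of output
delta_demand = [0.0, 10.0]       # extra final demand for sector 1 only

L = leontief_inverse_2x2(A)

# Multipliers: direct plus indirect emissions per unit of final demand.
multipliers = [
    sum(emission_coeff[i] * L[i][j] for i in range(2)) for j in range(2)
]

# Total extra emissions caused by the change in final demand.
extra_emissions = sum(m * d for m, d in zip(multipliers, delta_demand))
print(multipliers, extra_emissions)
```

Even though the extra demand falls only on the cleaner sector, its multiplier exceeds its direct coefficient because it buys inputs from the emission-intensive sector.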

antaldaniel commented 2 years ago

This is airpol_get() in 0.4.7.