Establish quality diagnostics

As suggested by Pierpaolo Cazzola on the 2019-11-15 call, we need diagnostic calculations that run every time the historical database is regenerated, giving statistics that are helpful in diagnosing data quality issues:

Crucial ones that I can think of are (all for consistent modes, vehicle, powertrain and fuel types…):

pkm/vkm: should be of the same magnitude as data on average loads.

tkm/vkm: ditto.

vkm/number of vehicles in the stock: should be of the same magnitude as data on average distances travelled.

number of years of new vehicle registrations needed to account for all vehicles in the stock: should be consistent with a reasonable average vehicle survival time, in years.

The last parameter is tricky: in some countries most new registrations are genuinely new vehicles, while in others a large share are second-hand vehicle imports. This calls for adding differentiated data inputs on new vehicle registrations, new vehicle sales, and second-hand vehicle imports.

energy use/vkm: this is a harder one to crack, since fuel use is typically not reported at the same level of disaggregation as other vehicle-centered parameters.

pkm/number of passengers: should be consistent with available data on average trip distances (or lengths of haul).

tkm/number of tonnes lifted: ditto.
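
A minimal sketch of how a few of the ratio checks above might be computed for a single record, assuming all quantities are already in base units (km, vehicles, vehicles per year). The field names (`pkm`, `vkm`, `stock`, `new_registrations`), the `diagnose` helper and especially the bounds in `BOUNDS` are hypothetical placeholders for illustration, not vetted values or existing code in the database.

```python
from typing import Optional

# Hypothetical plausibility bounds: illustrative placeholders only.
BOUNDS = {
    "pkm/vkm": (1.0, 5.0),                   # average load, passengers per vehicle
    "vkm/stock": (2_000.0, 40_000.0),        # average distance, km per vehicle per year
    "stock/new registrations": (5.0, 40.0),  # years of registrations explaining the stock
}


def ratio(dividend: Optional[float], divisor: Optional[float]) -> Optional[float]:
    """Return dividend / divisor, or None if either is missing or the divisor is zero."""
    if dividend is None or divisor is None or divisor == 0:
        return None
    return dividend / divisor


def diagnose(record: dict) -> dict:
    """Compute the ratios for one record and label each as ok, suspect or missing."""
    ratios = {
        "pkm/vkm": ratio(record.get("pkm"), record.get("vkm")),
        "vkm/stock": ratio(record.get("vkm"), record.get("stock")),
        "stock/new registrations": ratio(
            record.get("stock"), record.get("new_registrations")
        ),
    }
    report = {}
    for name, value in ratios.items():
        low, high = BOUNDS[name]
        if value is None:
            status = "missing"
        elif low <= value <= high:
            status = "ok"
        else:
            status = "suspect"
        report[name] = (value, status)
    return report


# Made-up numbers for one country, year and vehicle type.
example = {"pkm": 600e9, "vkm": 400e9, "stock": 30e6, "new_registrations": 2e6}
report = diagnose(example)
```

The stock/new registrations ratio is the one most affected by second-hand imports, so its bounds would probably need to differ by country group.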

These could be used to extract relationships/average values that could help fill gaps.
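
As one possible illustration of gap filling, the sketch below estimates missing vkm from pkm using the mean pkm/vkm ratio observed in the other records of the same series. The `fill_vkm` helper, the record layout and the `vkm_estimated` flag are assumptions made for this example; a simple mean is only one of many ways the extracted relationship could be applied.

```python
from statistics import mean


def fill_vkm(records):
    """Return copies of the records with missing vkm estimated as pkm / mean(pkm/vkm).

    Records where vkm is None or zero do not contribute to the mean load.
    Estimated values are flagged so they are never mistaken for reported data.
    """
    loads = [
        r["pkm"] / r["vkm"]
        for r in records
        if r.get("pkm") is not None and r.get("vkm")
    ]
    if not loads:
        # No observed ratio to rely on: leave the gaps as they are.
        return [dict(r) for r in records]

    avg_load = mean(loads)
    filled = []
    for r in records:
        r = dict(r)  # copy, so the input records stay untouched
        if r.get("vkm") is None and r.get("pkm") is not None:
            r["vkm"] = r["pkm"] / avg_load
            r["vkm_estimated"] = True
        filled.append(r)
    return filled


# Made-up series (10^9 pkm and 10^9 vkm) with one missing vkm value.
records = [
    {"year": 2015, "pkm": 300.0, "vkm": 200.0},
    {"year": 2016, "pkm": 330.0, "vkm": None},
    {"year": 2017, "pkm": 340.0, "vkm": 200.0},
]
filled = fill_vkm(records)
```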

As part of a Google Drive spreadsheet shared by @hlinero / co-developed with @soniayeh, the following checks were defined:

| ID | Dividend | Divisor | Resulting variable |
| --- | --- | --- | --- |
| A001 | T000 Passenger Activity \| 10^9 passenger-km / yr \| Passenger \| Road \| LDV | T000 Passenger Activity \| 10^9 passenger-km / yr \| Passenger \| ALL \| ALL | iTEM \| Passenger Activity \| % in total inland passenger-km / yr \| Passenger \| Road \| LDV |
| A002 | T000 Passenger Activity \| 10^9 passenger-km / yr \| Passenger \| Road \| LDV | T008 Stock \| 10^6 vehicle \| Passenger \| Road \| LDV | iTEM Passenger vehicle Activity \| 10^3 passenger-km / vehicle \| Passenger \| Road \| LDV |
| A003 | T003 Freight Activity \| 10^9 tonne-km / yr \| Freight \| Road \| All | T010 Stock \| 10^6 vehicle \| Freight \| Road \| All | iTEM Freight vehicle Activity \| 10^3 tonne-km / vehicle \| Freight \| Road \| All |
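
The unit arithmetic behind these checks can be sketched as follows: dividing a value in 10^9 passenger-km (or tonne-km) by a stock in 10^6 vehicles gives 10^3 passenger-km (or tonne-km) per vehicle directly, and A001 is a plain percentage share. The function names and all input numbers below are hypothetical, chosen only to show the calculation; they are not taken from the spreadsheet or the notebooks.

```python
def share_percent(part: float, total: float) -> float:
    """A001-style check: share of one series in a total, in percent."""
    return 100.0 * part / total


def per_vehicle_thousands(activity_1e9: float, stock_1e6: float) -> float:
    """A002/A003-style check.

    (10^9 unit-km / yr) / (10^6 vehicles) = 10^3 unit-km per vehicle,
    so no extra scaling factor is needed.
    """
    return activity_1e9 / stock_1e6


# Made-up inputs in the units of the table.
ldv_pkm = 450.0       # T000, 10^9 passenger-km / yr, Road LDV
all_pkm = 600.0       # T000, 10^9 passenger-km / yr, all modes
ldv_stock = 30.0      # T008, 10^6 vehicles, Road LDV
freight_tkm = 200.0   # T003, 10^9 tonne-km / yr, Road
freight_stock = 4.0   # T010, 10^6 vehicles, Road

a001 = share_percent(ldv_pkm, all_pkm)                   # % of total passenger-km
a002 = per_vehicle_thousands(ldv_pkm, ldv_stock)         # 10^3 passenger-km / vehicle
a003 = per_vehicle_thousands(freight_tkm, freight_stock) # 10^3 tonne-km / vehicle
```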

Draft IPython notebooks were also shared. I will open a new PR to introduce A003 into the code.

