transportenergy / database

Tools for accessing and maintaining the iTEM model & historical databases
https://transportenergy.rtfd.io
GNU General Public License v3.0

Establish quality diagnostics #15

Closed: khaeru closed 3 years ago

khaeru commented 4 years ago

As suggested by Pierpaolo Cazzola on the 2019-11-15 call, we need diagnostic calculations that run every time the historical database is regenerated, giving statistics that are helpful in diagnosing data quality issues:

Crucial ones that I can think of are (all for consistent modes, vehicle, powertrain and fuel types…):

  • pkm/vkm: should be of the same magnitude as data on average loads.
  • tkm/vkm: ditto.
  • vkm/number of vehicles in the stock: should be of the same magnitude as data on average distances travelled.
  • number of years of new vehicle registrations needed to account for the total number of vehicles in the stock: should be consistent with a reasonable average vehicle survival time, in years.
    • This last parameter is tricky, because in some countries most new registrations are genuinely new vehicles, while in others a large share are second-hand vehicle imports. This calls for differentiated data inputs on new vehicle registrations, new vehicle sales, and second-hand vehicle imports.
  • energy use/vkm: this is a harder one to crack, since fuel use is typically not reported at the same level of disaggregation as other vehicle-centered parameters.
  • pkm/number of passengers: should be consistent with available data on average trip distances (or lengths of haul).
  • tkm/number of tonnes lifted: ditto.
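
To make the ratio checks above concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the function names, the record layout, and especially the plausibility bounds are assumptions for illustration, and the numbers in the example record are invented.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None if a value is missing or the divisor is zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


# Hypothetical plausibility bounds (placeholders, not values agreed by iTEM).
PLAUSIBLE = {
    "pkm/vkm": (1.0, 5.0),             # average passengers per vehicle
    "vkm/stock": (2_000.0, 40_000.0),  # average km per vehicle per year
}


def check(name, value):
    """Classify a ratio as 'missing', 'ok', or 'suspect' against PLAUSIBLE."""
    if value is None:
        return "missing"
    low, high = PLAUSIBLE[name]
    return "ok" if low <= value <= high else "suspect"


def years_to_cover_stock(stock, registrations_newest_first):
    """Count how many most-recent years of registrations add up to the stock.

    Returns None if the available registrations never reach the stock.
    """
    total = 0.0
    for years, registered in enumerate(registrations_newest_first, start=1):
        total += registered
        if total >= stock:
            return years
    return None


# Invented example record: 600e9 pkm, 400e9 vkm, 30e6 vehicles.
record = {"pkm": 600e9, "vkm": 400e9, "stock": 30e6}
load = ratio(record["pkm"], record["vkm"])
distance = ratio(record["vkm"], record["stock"])
print("pkm/vkm", load, check("pkm/vkm", load))
print("vkm/stock", distance, check("vkm/stock", distance))
print("years of registrations:", years_to_cover_stock(100, [10] * 15))
```

A result from `years_to_cover_stock` that is far from a reasonable vehicle lifetime, or `None`, would be one sign that registrations mix new sales with second-hand imports, as noted in the sub-bullet above.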

These could be used to extract relationships or average values that could help fill data gaps.
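
One possible reading of this gap-filling idea, sketched under assumptions: take the median of a ratio (here pkm/vkm) over the rows where both values exist, and use it to estimate the numerator where only the denominator is known. The row layout, the `estimated` flag, and the choice of the median are all hypothetical; the thread does not specify a method.

```python
from statistics import median


def fill_gaps(rows, num="pkm", den="vkm"):
    """Estimate missing `num` values as `den` times the median observed num/den ratio.

    Rows are dicts; the input is not modified. Filled rows get "estimated": True.
    """
    ratios = [
        r[num] / r[den]
        for r in rows
        if r.get(num) is not None and r.get(den)  # skip missing values and zero divisors
    ]
    if not ratios:
        return [dict(r) for r in rows]  # nothing to learn a ratio from

    typical = median(ratios)
    filled = []
    for r in rows:
        r = dict(r)
        if r.get(num) is None and r.get(den) is not None:
            r[num] = r[den] * typical
            r["estimated"] = True  # keep estimates distinguishable from observed data
        filled.append(r)
    return filled


# Invented example rows; the third has no pkm observation.
rows = [
    {"pkm": 150.0, "vkm": 100.0},
    {"pkm": 400.0, "vkm": 200.0},
    {"pkm": None, "vkm": 40.0},
]
print(fill_gaps(rows))
```

Flagging estimated values keeps them from being fed back into later diagnostics as if they were observations.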

khaeru commented 4 years ago

In a Google Drive spreadsheet shared by @hlinero and co-developed with @soniayeh, the following checks were defined:

A001
  • Dividend: T000 Passenger Activity | 10^9 passenger-km / yr | Passenger | Road | LDV
  • Divisor: T000 Passenger Activity | 10^9 passenger-km / yr | Passenger | ALL | ALL
  • Resulting variable: iTEM | Passenger Activity | % in total inland passenger-km / yr | Passenger | Road | LDV

A002
  • Dividend: T000 Passenger Activity | 10^9 passenger-km / yr | Passenger | Road | LDV
  • Divisor: T008 Stock | 10^6 vehicle | Passenger | Road | LDV
  • Resulting variable: iTEM Passenger vehicle Activity | 10^3 passenger-km / vehicle | Passenger | Road | LDV

A003
  • Dividend: T003 Freight Activity | 10^9 tonne-km / yr | Freight | Road | All
  • Divisor: T010 Stock | 10^6 vehicle | Freight | Road | All
  • Resulting variable: iTEM Freight vehicle Activity | 10^3 tonne-km / vehicle | Freight | Road | All
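
As a rough illustration of how these three checks could be expressed, the sketch below encodes each one as a dividend key, a divisor key, and a factor. The key layout `(source ID, mode, vehicle type)`, the `CHECKS` table, and `compute_checks` are assumptions made for this example, not the project's code, and the input values are invented. The unit arithmetic follows from the definitions: 10^9 passenger-km or tonne-km divided by 10^6 vehicles gives 10^3 per vehicle, and A001 is scaled by 100 to give a percentage.

```python
# Each check: (dividend key, divisor key, factor applied to the quotient).
# Keys are hypothetical (source ID, mode, vehicle type) tuples.
CHECKS = {
    "A001": (("T000", "Road", "LDV"), ("T000", "ALL", "ALL"), 100.0),  # share, in %
    "A002": (("T000", "Road", "LDV"), ("T008", "Road", "LDV"), 1.0),   # 10^3 pkm / vehicle
    "A003": (("T003", "Road", "All"), ("T010", "Road", "All"), 1.0),   # 10^3 tkm / vehicle
}


def compute_checks(data):
    """Return {check ID: value}, with None where an input is missing or the divisor is zero."""
    results = {}
    for check_id, (dividend, divisor, factor) in CHECKS.items():
        num, den = data.get(dividend), data.get(divisor)
        if num is None or not den:
            results[check_id] = None
        else:
            results[check_id] = factor * num / den
    return results


# Invented inputs, in the units given in the check definitions.
data = {
    ("T000", "Road", "LDV"): 500.0,  # 10^9 passenger-km / yr
    ("T000", "ALL", "ALL"): 800.0,   # 10^9 passenger-km / yr
    ("T008", "Road", "LDV"): 25.0,   # 10^6 vehicles
    ("T003", "Road", "All"): 300.0,  # 10^9 tonne-km / yr
    ("T010", "Road", "All"): 6.0,    # 10^6 vehicles
}
print(compute_checks(data))
```

Keeping the checks in a table rather than in separate functions means a new row from the spreadsheet becomes one new entry.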

Draft IPython notebooks were also shared. I will open a new PR to introduce A003 into the code.