transportenergy / database

Tools for accessing and maintaining the iTEM model & historical databases
https://transportenergy.rtfd.io
GNU General Public License v3.0

Automate download of input data #21

Closed. khaeru closed this 4 years ago

khaeru commented 4 years ago

As of #18, the input data for the historical database are stored in the transportenergy/metadata repository. For the historical database to be reproducible, these files must be retrieved automatically from their original, public sources.

This PR adds code to download the input data for the processing scripts. It covers sources that can be read using either SDMX or the OpenKAPSARC API, i.e. datasets T001–T024, excluding T004. The number of lines in each file (equal to the number of observations, since all of these files are in long format) is:

$ wc -l historical/input/*
      289 historical/input/S012_input.csv
     9240 historical/input/T000.csv
     9227 historical/input/T000_input.csv
     1581 historical/input/T001.csv
     1580 historical/input/T001_input.csv
     6541 historical/input/T002.csv
     6482 historical/input/T002_input.csv
    13225 historical/input/T003.csv
    13213 historical/input/T003_input.csv
    16151 historical/input/T004_input.csv
     9317 historical/input/T005.csv
     3363 historical/input/T005_input.csv
     9190 historical/input/T006.csv
     1407 historical/input/T006_input.csv
     6597 historical/input/T007.csv
     2555 historical/input/T007_input.csv
    17118 historical/input/T008.csv
     9034 historical/input/T008_input.csv
    20801 historical/input/T009.csv
      141 historical/input/T009_input.csv
     1016 historical/input/T010.csv
      141 historical/input/T010_input.csv
     1407 historical/input/T011.csv
     1407 historical/input/T012.csv
      415 historical/input/T013.csv
    38361 historical/input/T014.csv
    81696 historical/input/T015.csv
  1154989 historical/input/T016.csv
  2473186 historical/input/T017.csv
    32356 historical/input/T018.csv
     1834 historical/input/T019.csv
     5908 historical/input/T020.csv
      867 historical/input/T021.csv
     3428 historical/input/T022.csv
     4081 historical/input/T023.csv
    87766 historical/input/T024.csv
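
For anyone who wants to reproduce the listing above without a shell, the following is a minimal sketch in Python. The function name `count_lines` is hypothetical and not part of the `item` package; it counts newline characters, as `wc -l` does. If a file starts with a header row, the number of observations would be one less than the line count; whether that applies to these files is an assumption to check.

```python
from pathlib import Path


def count_lines(directory, pattern="*.csv"):
    """Return {file name: newline count} for matching files, like `wc -l`.

    Hypothetical helper for illustration; not part of the `item` package.
    """
    counts = {}
    for path in sorted(Path(directory).glob(pattern)):
        total = 0
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large files (e.g. millions of lines)
            # are never loaded into memory in one piece.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                total += chunk.count(b"\n")
        counts[path.name] = total
    return counts
```

Calling `count_lines("historical/input")` would return a dictionary keyed by file name, which makes it easy to compare the downloaded `T0xx.csv` files against their `T0xx_input.csv` counterparts.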

This functionality must eventually be expanded to cover all data sources.
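
One way such an expansion could be structured is a registry that maps each kind of source to a fetch function, so that new kinds can be added without touching the download loop. The sketch below is illustrative only: `FETCHERS`, `register`, and `fetch_all` are hypothetical names, not the code in this PR, and the fetch functions for SDMX or OpenKAPSARC are deliberately left out because their actual implementation is not shown here.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: source kind -> function taking a remote identifier
# and returning the data set as CSV text.
FETCHERS: Dict[str, Callable[[str], str]] = {}


def register(kind):
    """Decorator that adds a fetch function to the registry under `kind`."""
    def decorator(func):
        FETCHERS[kind] = func
        return func
    return decorator


def fetch_all(datasets, output_dir):
    """Download every data set whose source kind has a registered fetcher.

    `datasets` maps a data set ID (e.g. "T001") to a (kind, remote_id) pair.
    Returns (written_paths, skipped_ids); IDs with no fetcher are skipped,
    so sources that are not yet automated stay visible to the caller.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written, skipped = [], []
    for dataset_id, (kind, remote_id) in sorted(datasets.items()):
        fetcher = FETCHERS.get(kind)
        if fetcher is None:
            skipped.append(dataset_id)
            continue
        path = output_dir / f"{dataset_id}.csv"
        path.write_text(fetcher(remote_id))
        written.append(path)
    return written, skipped
```

With this layout, covering a new source means registering one more function, and the list of skipped IDs doubles as a to-do list of data sets that still need manual retrieval.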

Supersedes #19.

codecov[bot] commented 4 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@e71e761). The diff coverage is 82.55%.


@@            Coverage Diff            @@
##             master      #21   +/-   ##
=========================================
  Coverage          ?   50.03%           
=========================================
  Files             ?       32           
  Lines             ?     1425           
  Branches          ?        0           
=========================================
  Hits              ?      713           
  Misses            ?      712           
  Partials          ?        0           
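
As a check on the figures above, the coverage percentage is hits divided by total lines, where lines are hits plus misses plus partials. The short sketch below redoes that arithmetic. The helper names are hypothetical, and the observation that the reported 50.03% matches truncation rather than rounding is an inference from these numbers, not a statement about how Codecov is documented to behave.

```python
import math


def coverage_percent(hits, lines):
    """Coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


def truncate(value, places=2):
    """Drop digits beyond `places` decimals without rounding up."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


hits, misses, partials = 713, 712, 0
lines = hits + misses + partials      # 1425, matching the "Lines" row
pct = coverage_percent(hits, lines)   # about 50.0351

truncated = truncate(pct)             # 50.03, the value shown in the report
rounded = round(pct, 2)               # 50.04, which the report does not show
```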
Impacted Files                   Coverage Δ
item/tests/test_cli.py           0.00% <ø> (ø)
item/tests/test_historical.py    0.00% <0.00%> (ø)
item/tests/test_openkapsarc.py   0.00% <0.00%> (ø)
item/historical/cli.py           61.90% <33.33%> (ø)
item/remote/openkapsarc.py       89.58% <80.00%> (ø)
item/historical/__init__.py      35.71% <95.65%> (ø)
item/cli.py                      86.11% <100.00%> (ø)
item/common.py                   85.71% <100.00%> (ø)
item/remote/__init__.py          100.00% <100.00%> (ø)
item/remote/cli.py               100.00% <100.00%> (ø)
... and 33 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update e71e761...352741f.