Set up the infrastructure for developing the code as an R package (DESCRIPTION, NAMESPACE, directory structure, inst/, etc.). This makes it easy to:
Use the functions in higher level scripts.
Test the functions to ensure that the data is being read correctly.
Share the code with other developers.
Communicate package dependencies to other developers.
Implemented four functions that read the first four Summary Tables (ST1, ST2, ST3, ST4) from the biomass data of Ruggerone and Irvine (2018). Each table is read in as a tibble. These functions can (and should) be generalized, but they work fine for now. For example, the sheet number and cell range are currently hard-coded in each function; those values should be moved to a configuration file.
read_biomass_st1
read_biomass_st2
read_biomass_st3
read_biomass_st4
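As a rough sketch of the generalization mentioned above, the sheet number and cell range for each table could live in a lookup structure that a single generic reader consults. Everything named here is an assumption for illustration: the `biomass_tables` list, the helper `biomass_table_spec`, the function `read_biomass_table`, and especially the sheet numbers and ranges, which are placeholders and not the real locations in the spreadsheet. It also assumes the spreadsheet is read with `readxl::read_excel()`, which returns a tibble.

```r
# Hypothetical lookup table. The sheet numbers and cell ranges below are
# placeholders, NOT the real locations in the spreadsheet.
biomass_tables <- list(
  st1 = list(sheet = 1, range = "A1:D10"),
  st2 = list(sheet = 2, range = "A1:D10")
)

# Return the sheet/range entry for a table id, failing loudly on unknown ids.
biomass_table_spec <- function(table_id, tables = biomass_tables) {
  spec <- tables[[table_id]]
  if (is.null(spec)) {
    stop("Unknown table id: ", table_id, call. = FALSE)
  }
  spec
}

# Generic reader: one function driven by the lookup table instead of one
# function per table. Assumes the readxl package is installed.
read_biomass_table <- function(path, table_id, tables = biomass_tables) {
  spec <- biomass_table_spec(table_id, tables)
  readxl::read_excel(path, sheet = spec$sheet, range = spec$range)
}
```

The existing `read_biomass_st1` to `read_biomass_st4` functions could then become thin wrappers around the generic reader, and the list could later be loaded from a configuration file without changing the callers.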
Placed the biomass spreadsheet in the inst/extdata directory, rather than reading it dynamically from the web source, because we want to work with a known version. We do not want the package to break if the spreadsheet is changed on the remote website.
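Files under inst/extdata are installed with the package and can be located at run time with `system.file()`. A minimal sketch of a path helper follows; the helper name `biomass_path`, the file name `biomass.xlsx`, and the package name `biomassreader` are all hypothetical stand-ins for the real names.

```r
# Locate a bundled file under inst/extdata.
# The default file and package names are hypothetical placeholders.
biomass_path <- function(file = "biomass.xlsx", package = "biomassreader") {
  path <- system.file("extdata", file, package = package)
  # system.file() returns "" when the file or package cannot be found.
  if (!nzchar(path)) {
    stop("Could not find ", file, " in package ", package, call. = FALSE)
  }
  path
}
```

Failing early with a clear message is a design choice here: the empty string that `system.file()` returns by default would otherwise surface later as a less obvious read error.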
This is just a start. There is still lots to do:
Read the other 20 data tables.
Read the metadata from the spreadsheet.
Transform the tables into CSVs that can be loaded easily into a graph database.
Write scripts to load the CSVs into a graph database.
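The CSV step on this list might look roughly like the sketch below, which writes each table in a named list to its own file using base R. The function name `write_tables_csv` and the output layout (one `<name>.csv` per table) are assumptions, and the column layout a particular graph database expects is not addressed here.

```r
# Write each table in a named list to <out_dir>/<name>.csv.
# Works for tibbles and plain data frames alike.
write_tables_csv <- function(tables, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  paths <- file.path(out_dir, paste0(names(tables), ".csv"))
  for (i in seq_along(tables)) {
    # row.names = FALSE avoids an extra unnamed index column in the CSV.
    utils::write.csv(tables[[i]], paths[i], row.names = FALSE)
  }
  invisible(paths)
}
```

A call such as `write_tables_csv(list(st1 = st1_table), "csv")`, where `st1_table` is a tibble returned by one of the readers, would produce `csv/st1.csv`.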