This repository brings together the shell scripts, configuration files, log file outputs, and other ephemera of data preparation actions. Keeping these in a repo has the following purposes and benefits:
IMPORTANT
Dangerous combination:
Therefore:
Repo contents are organized by actions. The term "action" is loosely defined, but roughly it means processing a specific group of files toward a specific goal.
Example action: Form climatological means of CMIP5 output files and index them in the Climate Explorer modelmeta database (ce_meta). This description would form part (or all) of the DESCRIPTION.md file for that action.
This action would be composed of the following sub-actions:
Actions are organized under the actions subdirectory as follows. Only subdirectories for sub-actions that were actually performed need to be included.
actions/
  <action-name>/
    DESCRIPTION.md
    convert-metadata/
      # updates.yaml file(s) for the update_metadata script
      # shell script(s) for invoking update_metadata
      # log files generated
    climo-means/
      # shell script(s) for adding files to jobqueue
      # log files generated
    index/
      # shell script(s) for invoking index_netcdf
      # log files generated
    modelmeta-fixup/
      # SQL query used to correct the filepath (or other fixup, heaven forbid)
      # log files generated
Further subdirectories can be added as needed (and ideally documented here) for additional sub-actions.
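The layout above is easy to create by hand, but a small helper can keep new actions consistent. The following is a minimal Python sketch, not a tool that exists in this repo; the function name scaffold_action and the action name in the usage example are hypothetical. It creates actions/<action-name>/ with a DESCRIPTION.md and only the sub-action directories you pass in.

```python
from pathlib import Path


def scaffold_action(repo_root, action_name, sub_actions, description=""):
    """Create actions/<action_name>/ with DESCRIPTION.md and the given sub-action dirs.

    Hypothetical helper: existing files are never overwritten, so it is safe to rerun.
    """
    action_dir = Path(repo_root) / "actions" / action_name
    action_dir.mkdir(parents=True, exist_ok=True)

    # Only write DESCRIPTION.md if it does not exist yet.
    desc_file = action_dir / "DESCRIPTION.md"
    if not desc_file.exists():
        desc_file.write_text(description or f"# {action_name}\n")

    # Create one directory per sub-action actually performed.
    for name in sub_actions:
        (action_dir / name).mkdir(exist_ok=True)

    return action_dir
```

For example, scaffold_action(".", "cmip5-climo-means", ["climo-means", "index"]) would create only those two sub-action directories. Note that Git does not track empty directories, so a sub-action directory only appears in the repo once a script or log file has been added to it.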
Log files can be bulky, so be careful about adding large log files to this repo.
GitHub recommends keeping repositories under 1 GB and enforces a strict limit of 100 MB per file. For large files, either keep the file on our local storage and commit only a pointer to it, or use GitHub Large File Storage.
Also note that any large file stored in this repo is copied to every clone of it, which may not be desirable.
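Before committing log files, it can help to check their sizes. The following is a minimal Python sketch (the function name find_large_files and the 50 MiB early-warning threshold are assumptions chosen for illustration, not repo conventions). It walks a directory tree, skips .git, and lists files above a size limit, largest first.

```python
import os
from pathlib import Path

# GitHub rejects pushes containing files larger than 100 MiB.
GITHUB_HARD_LIMIT = 100 * 1024 * 1024
# Assumed early-warning threshold; adjust to taste.
WARN_LIMIT = 50 * 1024 * 1024


def find_large_files(root, limit=WARN_LIMIT, skip_dirs=(".git",)):
    """Return a list of (path, size_in_bytes) for files under root larger than limit.

    Results are sorted largest first. Directories named in skip_dirs are not descended into.
    """
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = Path(dirpath) / name
            # lstat avoids errors on broken symlinks.
            size = path.lstat().st_size
            if size > limit:
                results.append((path, size))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

Running find_large_files("actions") from the repo root lists candidate files; any entry whose size exceeds GITHUB_HARD_LIMIT cannot be pushed to GitHub as a regular file and should go to local storage or Git LFS instead.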