This repository contains tools to fetch and load data from the Excellence in Research for Australia program (ERA) into a database.
The motivation is that the reference data for ERA 2010 and ERA 2012 came in slightly different formats, and that the data carried unusual copyright restrictions when the Australian Research Council (ARC) released it, in part due to Australian Crown copyright.
The ARC website copyright notice mentions Creative Commons (CC) BY 3.0; however, it also has an "Other Copyright" section referring to restrictive Crown copyright, and it does not state when the CC license applies and when Crown copyright applies.
In addition, the datasets for ERA 2010 and ERA 2012 can no longer be found on the ARC website, and the copyright notice did not mention CC licenses while these datasets were available there.
To avoid this copyright problem, this repository does not contain any ERA data. Instead, it provides tools to fetch the datasets and transform them into other formats, and will run analysis on the datasets in the cloud.
All files stored in this repository are available under the MIT license.
This repository has a script, fetch.sh, that downloads the following two files from the Australian Government Web Archive:
The datasets are processed automatically after each check-in on the following two services:
One way to use this repository is to fork it on GitHub and add your own transforms or analysis to the scripts in your fork. They can be run automatically using your own Travis CI and/or AppVeyor account.
Alternatively, you can recreate the steps performed by Travis CI or AppVeyor on your local machine, using a private database.
See .travis.yml
fetch.sh only requires either the Unix unzip or 7-Zip in the path, and either the Unix wget or the Python wget package, which can be installed using the pip requirements file win-requirements.txt.
Other necessary executables include bzip2, sed, perl, and python.
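As an illustration of the prerequisites listed above, a pre-flight check along the following lines could confirm that the needed commands are available before fetch.sh runs. This is a minimal sketch, not code taken from fetch.sh: the function names first_available, missing_tools, and preflight are hypothetical, and a POSIX shell is assumed.

```shell
#!/bin/sh
# Sketch only: first_available, missing_tools and preflight are hypothetical
# helper names, not functions taken from fetch.sh.

# Print the first of the given commands that is found on PATH.
# Returns non-zero if none of them is available.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Print every given command that is NOT found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Return non-zero, with messages on stderr, if a prerequisite is absent.
preflight() {
    status=0
    # An extractor: the Unix unzip or 7-Zip.
    first_available unzip 7z >/dev/null || {
        echo "need unzip or 7z in the path" >&2
        status=1
    }
    # A downloader: the Unix wget, or the Python wget package as a fallback.
    if ! command -v wget >/dev/null 2>&1 &&
       ! python -c 'import wget' >/dev/null 2>&1; then
        echo "need the Unix wget or the Python wget package" >&2
        status=1
    fi
    # The remaining executables named in this README.
    missing=$(missing_tools bzip2 sed perl python)
    if [ -n "$missing" ]; then
        # $missing is left unquoted on purpose so the names print on one line.
        echo "missing executables:" $missing >&2
        status=1
    fi
    return "$status"
}
```

Running preflight at the top of a script and exiting when it returns non-zero would report everything that is missing in one pass, rather than failing partway through a download.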
The Windows build depends on most of the same commands as the Unix build.
The most common way to get those commands on Windows is to set up Cygwin or MSYS.
The scripts in this repository have been tested on Cygwin, MSYS 1.0, and MSYS 2.0.
The 7z executable may be located in a directory containing a space.
All other executables must be in paths that do not include spaces.
See appveyor.yml
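Because the 7z executable may live in a directory whose name contains a space, any script that invokes it has to quote the path. The sketch below shows one way to do that, and one way to find other tools that resolve to a path containing a space. The SEVENZIP variable, its default location, and the function names are hypothetical and are not taken from this repository's scripts.

```shell
#!/bin/sh
# Hypothetical default location; override SEVENZIP to match your installation.
SEVENZIP=${SEVENZIP:-"/c/Program Files/7-Zip/7z"}

# Extract archive $1 into directory $2.
# Quoting "$SEVENZIP" keeps a path with spaces as a single word.
extract_archive() {
    "$SEVENZIP" x -y -o"$2" "$1"
}

# Succeed if the given path contains a space character.
has_space() {
    case "$1" in
        *" "*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Print each named tool whose resolved path contains a space.
# Tools that are not found on PATH are skipped.
tools_in_spaced_paths() {
    for tool in "$@"; do
        tool_path=$(command -v "$tool" 2>/dev/null) || continue
        if has_space "$tool_path"; then
            printf '%s\n' "$tool"
        fi
    done
    return 0
}
```

For example, tools_in_spaced_paths bzip2 sed perl python would print any of those commands that needs to be moved or re-installed into a path without spaces.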
In order to run SQL against an Oracle database, this repository downloads and installs Oracle XE.
To download Oracle XE, the Travis CI settings must include environment variables ORACLE_LOGIN_ssousername and ORACLE_LOGIN_password.
See travis-oracle for more information on how this works.
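A build script could check that both variables are present before attempting the download. The following is a minimal sketch under that assumption; the function name unset_oracle_vars is hypothetical and is not part of travis-oracle or of this repository.

```shell
#!/bin/sh
# Print the name of each required variable that is unset or empty.
# Only the names are printed, never the values, so the password
# cannot leak into a build log through this check.
unset_oracle_vars() {
    [ -n "${ORACLE_LOGIN_ssousername:-}" ] || echo ORACLE_LOGIN_ssousername
    [ -n "${ORACLE_LOGIN_password:-}" ] || echo ORACLE_LOGIN_password
    return 0
}
```

If the function prints anything, the script can report those names and stop before the Oracle XE download is attempted.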
Feel free to submit pull requests for additional transforms or analysis. Don't worry if you have only tested the changes against the databases that you have access to; this repository has an open-door policy, and the more the merrier. Submit pull requests early. It is my job to figure out how to incorporate any new code into the repository.