iangow / se_core

Core code for StreetEvents data

StreetEvents data

The code here transforms XML files for conference calls supplied by Thomson Reuters into structured tables in a PostgreSQL database.

1. Requirements

To use this code you will need a few things.

  1. A directory containing the .xml files.
  2. A PostgreSQL database to point to.
    • The database should have a schema streetevents and the roles streetevents and streetevents_access. The following SQL creates them:
      CREATE SCHEMA streetevents;
      CREATE ROLE streetevents;
      CREATE ROLE streetevents_access;
  3. The following environment variables set:
    • PGHOST: The host address of the PostgreSQL database server.
    • PGDATABASE: The name of the PostgreSQL database.
    • SE_DIR: The path to the directory containing the .xml files.
    • PGUSER (optional): The default is your log-in ID; if that's not correct, set this variable.
    • PGPASSWORD (optional): This is not the recommended way to set your password, but is one approach.
  4. R and the following packages: xml2, stringr, dplyr, parallel, RPostgreSQL, digest
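
The environment-variable requirements above can be checked before running anything. The following is a minimal sketch in shell and is not part of the repository: the function names missing_vars and count_xml_files are hypothetical, and it assumes the .xml files sit directly in SE_DIR rather than in subdirectories.

```shell
#!/bin/sh
# Pre-flight checks for the requirements listed above (illustrative sketch only).

# Print the name of each required variable that is unset or empty.
# Returns 0 when all three are set, 1 otherwise.
# PGUSER and PGPASSWORD are optional, so they are not checked.
missing_vars() {
    mv_status=0
    for mv_name in PGHOST PGDATABASE SE_DIR; do
        # Look up the value of the variable whose name is in $mv_name.
        eval "mv_value=\${$mv_name:-}"
        if [ -z "$mv_value" ]; then
            echo "$mv_name"
            mv_status=1
        fi
    done
    return "$mv_status"
}

# Print the number of .xml files directly inside the directory given as $1.
# Assumption: files are not nested in subdirectories.
count_xml_files() {
    find "$1" -maxdepth 1 -type f -name '*.xml' | wc -l | tr -d ' '
}

# Example use:
#   missing_vars || echo "Set the variables listed above first."
#   echo "Found $(count_xml_files "$SE_DIR") .xml files in $SE_DIR"
```

Running missing_vars first gives a clearer message than a failed database connection later on, and the file count is a quick way to confirm that SE_DIR points at the right directory.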

2. Processing core tables

To process the core tables, the three code files below need to be run in the following order:

The script update_se.sh runs the three code files.

3. The tables used

4. Advanced Users

This section is for more advanced users wanting to download and process the data.