OHRM -> JSON-LD export tool

Code to export an OHRM database to JSON-LD (loosely RO-Crate format). Licensed under the GNU General Public License v3.0.

Developing the code

Get the data

Before you get started you will need the data from someone who has it. That is, you will need to get hold of the Postgres database dump of an OHRM (or of several OHRMs). Unpack those files into the folder ./data, which is mounted into the container at /srv/data.
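
For example, assuming the dump arrives as a gzipped tarball (the filename below is just a placeholder):

mkdir -p ./data
tar -xzf ohrm-dump.tar.gz -C ./data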

Start the postgres container and load an OHRM DB dump
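
The exact commands depend on the Docker setup in this repo, but the general flow is: start the Postgres container, then restore the dump from the mounted /srv/data directory. A rough sketch, assuming a compose service named postgres, a database user postgres, and a plain-SQL dump (the service, user, database and file names are all placeholders):

docker compose up -d postgres
docker compose exec postgres createdb -U postgres my_ohrm
docker compose exec postgres psql -U postgres -d my_ohrm -f /srv/data/ohrm-dump.sql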

The OHRM data is now in the database. Because the container is backed by a persistent Docker volume, you will not need to repeat these steps the next time you start the DB unless you remove that volume. Repeat these steps to load additional OHRM datasets into this database instance.
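
If you do want to reset the database and reload everything from scratch, removing the container's volume looks something like this (assuming the volume is managed by the repo's compose file):

docker compose down --volumes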

Working with an OHRM dataset

Repository Layout / Code overview

for (let row of await models.entity.findAll({ limit: pageSize, offset })) {
    // Each entry is either a column name (exported as-is) or a
    // [column, property] pair that renames the column in the JSON-LD output.
    const properties = [
        ...
        "elegalno",
        ["estartdate", "dateCreated"],
        "esdatemod",
        ...
    ];
    ...
}

In this example, the data in the estartdate column will be written out to a field called dateCreated in the JSON-LD output.
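
For instance, an entity whose estartdate column holds "1901-01-01" might contribute a snippet along these lines (the @id, @type and values here are invented purely for illustration; the real exporter's output will differ):

{
    "@id": "#entity-1",
    "@type": "Organization",
    "dateCreated": "1901-01-01"
}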

Automation using Python

The repo contains a Python script for automatically processing all the OHRMs in a directory and uploading them to figshare.

The script is designed to work with conda. Create a conda environment using the conda-env.yml file, like so:

conda env create --name ohrm-exporter --file=conda-env.yml
conda activate ohrm-exporter

Once you have configured the environment, you can run the script. It requires three pieces of information: the figshare API endpoint, the directory where the OHRMs are stored, and your figshare API token. You can either provide these on the command line, or create a file called .python-config.yml and store the information there using the following template:

figshare_token: your api token
figshare_endpoint: https://api.figshare.com/v2
ohrm_directory: path/to/ohrms/