usegalaxy-eu / infrastructure-playbook

Ansible playbook for managing UseGalaxy.eu infrastructure.
MIT License
16 stars 91 forks source link

Store interactive tool sessions in PostgreSQL database "gxitproxy" #1251

Open kysrpex opened 2 months ago

kysrpex commented 2 months ago

Use PostgreSQL instead of SQLite to store interactive tool sessions. Requires galaxyproject/galaxy#18481, galaxyproject/gx-it-proxy#21. Closes usegalaxy-eu/issues#422.

Before merging, pay attention to the following.

Migration procedure:

  1. On Postgres, create the database gxitproxy and set the galaxy user as its owner. Create also the table gxitproxy on beforehand (gx-it-proxy refuses to run without it). For greater details you may have a look at the updated GTN training.
  2. Copy the migration script below to sn06.
  3. Use inotifywait to watch "/opt/galaxy/mutable-config/interactivetools_map.sqlite" and run the migration script on every modification to the file. This will guarantee that all entries present in the SQLite database will be also present in Postgres. psql will complain about rows that already exist in the target table, but despite the complaint it will copy the rows which still are not in the target table.
  4. Run Jenkins so that the handlers are restarted. Ensure Gunicorn handlers are restarted as well.
  5. Once everything is restarted, Galaxy will not be using the SQLite mapping anymore. Deletions are not synced by the migration script, therefore, there may be leftover rows in Postgres. Optionally (they do not cause any issues), you may delete them as follows: query the PostgreSQL table "gxitproxy" to get all values of the "key" column. Decode them (for example using gxadmin). The resulting values are ids for interactive tool entry points (the id column on table "interactivetool_entry_point"). Filter this table by those values and do an inner join with the table "job" on interactivetool_entry_point.job_id=job.id. Filter the result by state different than "running". Encode all ids for interactive tool entry points and delete them from the gxitproxy table.

Migration script:

#!/bin/bash

set -e

# Set variables
SQLITE_DB="/opt/galaxy/mutable-config/interactivetools_map.sqlite"
SQLITE_TABLE="gxitproxy"
PG_DB="gxitproxy"
# PG_USER="your_pg_user"
# PG_PASSWORD="your_pg_password"
# PG_HOST="your_pg_host"
# PG_PORT="your_pg_port"
PG_TABLE="gxitproxy"

# Function to export SQLite data to a CSV file
export_sqlite_to_csv() {
    sqlite3 -header -csv "$SQLITE_DB" "SELECT * FROM $SQLITE_TABLE;" > /tmp/interactivetools_map.csv
    chmod 660 /tmp/interactivetools_map.csv;
}

# Function to import CSV data to PostgreSQL
import_csv_to_postgresql() {
    # export PGPASSWORD="$PG_PASSWORD"
    # psql -h "$PG_HOST" -p "$PG_PORT" -U "$PG_USER" -d "$PG_DB" -c "\copy $PG_TABLE FROM 'data.csv' CSV HEADER"
    psql -d "$PG_DB" -c "\copy $PG_TABLE FROM '/tmp/interactivetools_map.csv' CSV HEADER"
}

# Main script execution
export_sqlite_to_csv
set +e
import_csv_to_postgresql
set -e

# Clean up
rm /tmp/interactivetools_map.csv
kysrpex commented 2 months ago

This is a draft because none of the requirements have been merged nor deployed, that will still take a while. Otherwise it should work.