tattle-made / Uli

Software and Resources for Mitigating Online Gender Based Violence in India
https://uli.tattle.co.in
GNU General Public License v3.0
40 stars 29 forks source link

MySQL to Postgres Migration #613

Closed dennyabrain closed 2 months ago

dennyabrain commented 3 months ago

Because of the following reasons we want to migrate from MySQL to Postgres :

  1. Postgres being more modern and versatile tool
  2. Rest of our tech stack at Tattle has moved to Postgres
  3. Our current MySQL instance on AWS has reached end of service period

In scope for this task it to evaluate

  1. If there are standard tools that we can use to migrate existing mysql database to postgres
  2. any issues with the current database schema that prevents this migration
  3. a working script or command to demonstrate successful schema and data migration from MySQL to Postgres locally
eternaltyro commented 2 months ago

@maanasb01 re: #619 can you also parameterize the DB name to be picked up from envvar? Like process.env.DB_NAME,

aatmanvaidya commented 2 months ago

hi @eternaltyro - have made that small change here - https://github.com/tattle-made/Uli/pull/620, hope that is what you were referring too

eternaltyro commented 2 months ago

@aatmanvaidya yes, that's what I wanted, thanks.

eternaltyro commented 2 months ago

This task is now complete. Here's a summary of the migration.

  1. Database prep

    A new PostgreSQL ROLE was created and was given permissions to perform read/write on the new database.

    postgres=> CREATE ROLE uli_prod WITH LOGIN PASSWORD '<REDACTED>';
    CREATE ROLE
    postgres=> GRANT CONNECT ON DATABASE uli_prod to uli_prod;
    GRANT
    postgres=> GRANT ALL PRIVILEGES ON DATABASE uli_prod TO uli_prod;
    GRANT
    postgres=> \c uli_prod
    Password:
    uli_prod=> GRANT ALL ON ALL TABLES IN SCHEMA public TO uli_prod;
    GRANT
    uli_prod=> GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO uli_prod;
    GRANT
  2. Secrets and configmap

    Split the legacy configmap to (currently non-secrets-manager-backed) Opaque secrets for database credentials. There was an error in the database credentials which was fixed manually.

  3. Deployment and service

    Updated deployment with the latest image tag for the API-server (containing changes for postgres). Hitherto hard-coded DB_NAME was parameterized to be read from environment variable.

Challenges

  1. Database connection could not be verified

    Connection to the database could not be verified because the command and arguments override did not invoke database migrations - which would have raised connection exceptions. Pod spec had to be temporarily modified to fallback to entrypoint script which had database migration steps.

  2. Database connectivity issues

    Database connectivity failed with ENOTFOUND error where the seemingly exact endpoint string could not be resolved to an IP address. Test containers with hard-coded endpoint addresses were able to successfully connect and manual dns queries from the affected pods yielded good results which were very confusing. Initial suspicions were that the sql library was having trouble with SSL certs but containers running on EC2 test instances did not have the issue.

    Eventually, the problem was found to be trailing newline chars \n in secrets values that were base64 encoded. The ENOTFOUND errors were for <host>\n but since new-lines were printed, visually there was no obvious anomaly. It was solved by removing trailing slashes.

    $ echo <bad_base64> | base64 -d | tr -d \\n | base64
    <good_base64>

    This can be mitigated by some-kind of input sanitation programmatically.

  3. Logging issues

    System logs need to be improved. Logging is not verbose for certain actions and at the same time, database transactions are leaked in the logs including user session info.

  4. Issues with table schema

    During migration of the database from MySQL to PostgreSQL, there were some schema mismatches that could have been handled better. MySQL does not have a named boolean data-type. It is therefore aliased to tinyint(1) with truthy values being cast to int (0=false, 1=true). When migrating to PostgreSQL, these columns were translated to 2-byte smallint. When the application tried placing boolean values, data-type restrictions caused errors.

    PostgreSQL however supports a true boolean data-type. So the columns were manually altered and all the existing values were cast into boolean. Since values were simply integers, this was easy to do.

    ALTER TABLE slurs
    ALTER COLUMN casual TYPE boolean USING (casual::int)::boolean,
    ALTER COLUMN appropriated TYPE boolean USING (appropriated::int)::boolean,
    ALTER "appropriationContext" TYPE boolean USING ("appropriationContext"::int)::boolean;

Lessons & next steps

  1. Model reviews - All models to be reviewed in case of any database migrations and resulting schemas must be validated.
  2. Datatype differences must be carefully considered to optimise for storage, memory and indexing.
  3. Logging - logging must be streamlined and centralised with appropriate log-levels configurable for different environments.
  4. ADR - Architecture Decision Records can be maintained and reviewed together in order to hash-out architectural challenges before they occur. (example - https://github.com/thunderbird/thunderbird-android/tree/main/docs/architecture/adr)
eternaltyro commented 2 months ago

Related - https://github.com/tattle-made/infrastructure/issues/1