pmsoltani / elsametric

0 stars 0 forks source link

Make "elsametric" compatible with PostgreSQL #50

Closed pmsoltani closed 5 years ago

pmsoltani commented 5 years ago

Feature description

Regarding the text in elsametric.md, this issue is going to address how to make elsametric work with Postgres.

Suggested solution

pmsoltani commented 5 years ago

I've tested Postgres on SQLAlchemy using psycopg2. These are the problems I ran into before I was able to successfully execute the script db_populate.py:

  1. __init__.py: Instead of using two different engine URIs, I've created the following template:
ENGINE_URI = f'{DIALECT}+{DB_DRIVER}://{DB_USER}:{DB_PASS}@{DB_HOST}/{DB_NAME}'

So the database section of config.json has the following format:

{
  "startup": {
    "dialect": "postgresql",
    "mysql": {
      "driver": "mysqlconnector",
      "host": "localhost",
      "schema": "scopus_pg5",
      "user": "root",
      "pass": "Qwerty12#"
    },
    "postgresql": {
      "driver": "psycopg2",
      "host": "localhost",
      "schema": "scopus_pg5",
      "user": "postgres",
      "pass": "Qwerty12#"
    }
  }
}
  1. __inti__.py: sqlalchemy_utils could not create database using the parameter encoding='utf8mb4'. I've changed it to encoding='utf8' (which is also the default value for sqlalchemy_utils version 0.34.2).
  2. author.py: PostgreSQL ENUM type required a name. I've also noticed that MySQL does not respect the current ENUM type natively, so I've added the following check constraint for the sex column:
CheckConstraint('''sex IN ('m', 'f')''', name='gender_types')

I then added name='gender_types' to the ENUM column to make it Postgres compliant. It should be noted that all column types for all models have been imported from the sqlalchemy.dialects.mysql module, which provides MySQL specific column types (that may or may not work with other backends). This should change to multiple vendor types later on.

  1. author.py, department.py, institution.py, paper.py, source.py: Postgres does not support the clause ON UPDATE CURRENT_TIMESTAMP in the update_time column, and one should use a trigger to achieve the same result. So, I've added the following snippet after creating each of the affected classes:
update_time_trigger = DDL(
    '''
    CREATE OR REPLACE FUNCTION set_update_time()
    RETURNS TRIGGER AS $$
    BEGIN
        NEW.update_time = now();
        RETURN NEW;
    END;
    $$ language 'plpgsql';

    CREATE TRIGGER source_update_time
        BEFORE UPDATE ON source
        FOR EACH ROW
        EXECUTE PROCEDURE  set_update_time();
    '''
)

event.listen(Source.__table__, 'after_create', update_time_trigger)

The code above was inserted after Source class inside source.py. For other classes, I've replaced Source or source with the relevant name. It should be mentioned that DDL and event were imported using from sqlalchemy import DDL, event. Another thing to mention is that the snippet for the trigger was taken from here because I couldn't use the function moddatetime from the spi module, as I would get the following error: function moddatetime() does not exist.

  1. source_metric.py: Postgres does not have a column type year.
  2. All classes: Postgress does not support unsigned integers. It has to be implemented using a check constraint:
__table_args__ = (
    CheckConstraint('id >= 0', name='id_unsigned'),
)

I know that id columns will not / should not accept 0 as a value, but to be faithful to the definition of MySQL's unsigned integers, I've used >= instead of >.

pmsoltani commented 5 years ago

After solving the encountered problems, I was able to successfully populate a PostgreSQL database. However, with the same settings, I now am unable to do the same with MySQL; so additional fine-tuning is needed.

pmsoltani commented 5 years ago

After f1460dc, attempting to populate a MySQL database caused the following errors:

  1. Cannot use a check constraint on a column with autoincrement.
  2. MySQL does not recognize the triggers added to author.py, department.py, institution.py, paper.py, source.py for the updated timestamp.

To overcome the first issues, I've imported DIALECT and changed the check constraints on the id columns like so:

__table_args__ = (
    CheckConstraint(
        'id >= 0',
        name='author_profile_id_unsigned'
    ) if DIALECT == "postgresql" else None,
)

To deal with the second problem, I've created a new constant called UPDATE_TIME_DEFAULT in base.py that decides what should the server_default argument of the update_time column be:

UPDATE_TIME_DEFAULT = 'CURRENT_TIMESTAMP'
if DIALECT == "mysql":
    UPDATE_TIME_DEFAULT = 'CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP'

To check that the DIALECT constant, which is directly read from config.json has a correct value, I've added an assertion in __init__.py:

assert DIALECT in ('mysql', 'postgresql')
pmsoltani commented 5 years ago

The provided solutions worked. I've tested the model with both MySQL and PostgreSQL successfully. It seems that everything is working as before #50. Further tests may be needed, especially by using elsaserver.