informatics-isi-edu / pdb-ihm

Deriva Protein Database Project
3 stars 1 forks source link

Model extension for production release #58

Open brindakv opened 3 years ago

brindakv commented 3 years ago
  1. Extend the model based on the current set of guidelines; implement the new set of tables. 1.1 Deleting tables in PDB schema that were created recently. 1.1.1 Inbound fkey, e.g., the new table is a parent table. Clean up required before dropping the table.
    • Create a helper function to find the inbound fkeys, given a table name; drop these fkeys @svoinea 1.1.2 Drop table (its columns and outbound fkeys will be dropped automatically)
  2. Create access control script. Find out how to implement access control for the entire catalog based on PDB-Dev requirements and implement a script for the same. @svoinea
  3. Apply naming conventions to existing tables. 3.1 Keys 3.2 Optional denormalized foreign keys
    • If one or more columns in the composite foreign key has nullok=True, then it is an optional denormalized fkey
    • Verify that there are only 39 of such foreign keys (#37) @svoinea 3.3 Mandatory denormalized foreign keys
    • If all of the columns in the composite foreign key has nullok=False, then it is a mandatory denormalized fkey
  4. Make rules to come up with shorter table names so that keys and fkeys are within the 63 character limit. E.g., remove ihm_ or change geometry to geom or change transform to xform. 4.1 Create a look up table to change long mmCIF names to shorter Deriva names. Note: Tables with long names are in #42.
svoinea commented 3 years ago

@brindakv I have added in the 2021_01_model_updates.py script the function drop_table(model, schema_name, table_name) which drops the table together with the referenced keys. Example of call:

drop_table(model, 'PDB', 'ihm_pseudo_site')
svoinea commented 3 years ago

FYI, hereby is way my script was renaming the composite FK:

if 'structure_id' in foreign_key_columns:
    if 'RID' in referenced_columns:
        use the combo2
    else:
        use the combo1
else:
    use denorm if length > 63
brindakv commented 3 years ago

@svoinea In the drop_table function, what happens if the child table that references the foreign key has data? Can the foreign key be dropped?

svoinea commented 3 years ago

@brindakv Dropping a FK does not drop its columns. It drops only the condition the columns data have to meet. Therefore the data is preserved.

svoinea commented 3 years ago

Current algorithm used for renaming the composite FK:

if 'RID' in referenced_columns:
    if 'structure_id' in foreign_key_columns:
        use combo1
    else:
        use combo2
else:
    if length > 63:
        use denorm

Current results: PK to be renamed: pk_rename.txt FK to be renamed with combo1: fk_rename_to_combo1.txt FK to be renamed with combo2: fk_rename_to_combo2.txt FK with structure_id to be renamed: fk_rename_structure_id.txt FK to be renamed with denorm: fk_rename_denorm.txt FK to be renamed: fk_rename.txt FK duplicates: fk_rename_duplicates.txt FK too long names: fk_rename_too_long_names.txt

svoinea commented 3 years ago

Current PK grouped by the number of columns and the presence of the RID and/or structure_id columns pk.txt Current FK grouped by the number of columns and the presence of the RID and/or structure_id columns fk.txt No change was done in the DB. No suggestions for renaming any PK and or FK. The two files are just an image of the current keys and foreign keys of the PDB schema of catalog 99.

svoinea commented 3 years ago

I have identified the following sequence of operations to be done (updating the annotations and acls is not included):

  1. 32 columns to be renamed and 102 columns to be added (of the form <Parent_Table>_RID): columns.txt

  2. 102 SQL update statements to load values for the new added <Parent_Table>_RID columns: sql.txt

  3. 13 Primary Keys to be renamed and 29 Primary Keys to be added: primary_key.txt

  4. 54 Foreign Keys to be renamed, 147 Foreign Keys to be added, and 147 Foreign Keys to be dropped: foreign_key.txt

I have also identified 13 duplicates in naming the Foreign Keys: duplicates.txt

and 39 Foreign Keys that have names longer than 63 characters: long_names.txt

svoinea commented 3 years ago

Foreign keys *_fk relative to the *_fkey ones: fk_fkey.txt

Foreign keys used in the annotations: fk_annotations.txt

Foreign keys used in the acl bindings: fk_acl_bindings.txt