loculus-project / loculus

An open-source software package to power microbial genomic databases
https://loculus.org
GNU Affero General Public License v3.0
37 stars 2 forks source link

Improve db schema: foreign keys, enums, etc. #3201

Open corneliusroemer opened 1 week ago

corneliusroemer commented 1 week ago

Looking at our db schema, I've noticed room for improvement, e.g. foreign keys and use of enums. Here's a human review of what Claude 3.5 said about our schema. There's definitely some stuff that could be easy to add with integrity improvements that pay off in the long run:

I'll analyze the schema and suggest improvements. Here are the key areas that could be enhanced:

Missing Foreign Keys:

Missing ENUM Types:

Data Integrity:

Missing Indexes:

Performance:

From Claude:

Here's what I would implement first:

  1. Add missing foreign keys
ALTER TABLE metadata_upload_aux_table 
ADD CONSTRAINT fk_metadata_upload_group 
FOREIGN KEY (group_id) REFERENCES groups_table(group_id);

ALTER TABLE external_metadata 
ADD CONSTRAINT fk_external_metadata_sequence 
FOREIGN KEY (accession, version) REFERENCES sequence_entries(accession, version);

ALTER TABLE sequence_entries_preprocessed_data 
ADD CONSTRAINT fk_preprocessed_sequence 
FOREIGN KEY (accession, version) REFERENCES sequence_entries(accession, version);
  1. Create ENUMs

    CREATE TYPE processing_status_enum AS ENUM ('IN_PROCESSING', 'HAS_ERRORS', 'FINISHED');
    CREATE TYPE sequence_status_enum AS ENUM ('APPROVED_FOR_RELEASE', 'AWAITING_APPROVAL', 'IN_PROCESSING', 'HAS_ERRORS', 'RECEIVED');
  2. Add basic check constraints

ALTER TABLE sequence_entries_preprocessed_data 
ADD CONSTRAINT check_processing_timestamps 
CHECK (finished_processing_at IS NULL OR finished_processing_at > started_processing_at);

ALTER TABLE sequence_entries 
ADD CONSTRAINT check_version_positive 
CHECK (version > 0);
  1. Add missing indexes
    CREATE INDEX idx_sequence_entries_group ON sequence_entries(group_id);