Looking at our DB schema, I've noticed room for improvement, e.g. missing foreign keys and little use of enums. Below is a human-reviewed version of what Claude 3.5 said about our schema. Several of these integrity improvements would be easy to add and should pay off in the long run:
I'll analyze the schema and suggest improvements. Here are the key areas that could be enhanced:
Missing Foreign Keys:
metadata_upload_aux_table has no FK constraint for group_id
metadata_upload_aux_table and sequence_upload_aux_table share upload_id and submission_id but have no FK relationship
external_metadata has no FK to sequence_entries for accession and version
data_use_terms_table has no FK to sequence_entries for accession
sequence_entries_preprocessed_data has no FK to sequence_entries for accession and version
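As a rough sketch of what closing one of these gaps could look like, the statements below add the foreign key from sequence_entries_preprocessed_data to sequence_entries. This assumes sequence_entries has a primary key or unique constraint on (accession, version), which is not confirmed here; the constraint name is made up for illustration.

```sql
-- Assumption: sequence_entries has PRIMARY KEY or UNIQUE (accession, version).
-- Step 1: look for orphaned rows that would make the constraint fail.
SELECT p.accession, p.version
FROM sequence_entries_preprocessed_data AS p
LEFT JOIN sequence_entries AS s
       ON s.accession = p.accession
      AND s.version = p.version
WHERE s.accession IS NULL;

-- Step 2: add the constraint without checking existing rows yet
-- (hypothetical constraint name).
ALTER TABLE sequence_entries_preprocessed_data
    ADD CONSTRAINT fk_preprocessed_data_sequence_entries
    FOREIGN KEY (accession, version)
    REFERENCES sequence_entries (accession, version)
    NOT VALID;

-- Step 3: validate existing rows as a separate step.
ALTER TABLE sequence_entries_preprocessed_data
    VALIDATE CONSTRAINT fk_preprocessed_data_sequence_entries;
```

Splitting the change into NOT VALID and VALIDATE CONSTRAINT keeps the initial ALTER TABLE short on large tables. On a small table a single ADD CONSTRAINT would probably do.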
Missing ENUM Types:
sequence_entries_preprocessed_data.processing_status could be an ENUM ('IN_PROCESSING', 'HAS_ERRORS', 'FINISHED')
seqset_records.type could be an ENUM
The status field in sequence_entries_view uses hardcoded strings that should be an ENUM ('APPROVED_FOR_RELEASE', 'AWAITING_APPROVAL', 'IN_PROCESSING', 'HAS_ERRORS', 'RECEIVED')
data_use_terms_type could be an ENUM
Data Integrity:
No CHECK constraints on timestamps to ensure logical order (e.g., finished_processing_at should be after started_processing_at)
No CHECK constraints on version being positive
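Before adding CHECK constraints like these, it is probably worth counting the existing rows that would violate them, since adding the constraint fails if any do. A minimal sketch, assuming the column names used in the list above:

```sql
-- Rows whose processing finished at or before it started.
SELECT count(*) AS bad_timestamps
FROM sequence_entries_preprocessed_data
WHERE finished_processing_at IS NOT NULL
  AND finished_processing_at <= started_processing_at;

-- Rows with a non-positive version.
SELECT count(*) AS bad_versions
FROM sequence_entries
WHERE version <= 0;
```

If both counts are zero, the constraints can be added without cleaning up data first.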
Missing Indexes:
No index on group_id in sequence_entries, even though it is a foreign key
Performance:
Large JSONB columns (processed_data, external_metadata, etc.) might benefit from specific indexes on commonly queried paths
Consider partitioning large tables like sequence_entries if the data volume is high
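To illustrate the JSONB suggestion, here are two possible index shapes. This is only a sketch: it assumes processed_data lives in sequence_entries_preprocessed_data, and the JSON path metadata.country is a hypothetical example, not a field known to exist in our data. Which index helps depends on the queries we actually run.

```sql
-- Option 1: GIN index supporting containment queries such as
--   WHERE processed_data @> '{"metadata": {"country": "Switzerland"}}'
-- (assumed table/column; hypothetical index name).
CREATE INDEX idx_preprocessed_data_processed_data_gin
    ON sequence_entries_preprocessed_data
    USING GIN (processed_data jsonb_path_ops);

-- Option 2: B-tree expression index on one frequently filtered path
-- (the path 'metadata' -> 'country' is a made-up example).
CREATE INDEX idx_preprocessed_data_country
    ON sequence_entries_preprocessed_data
    ((processed_data -> 'metadata' ->> 'country'));
```

The expression index is smaller and supports equality and range filters on that one path, but a query only uses it if it contains the same expression. The GIN index covers arbitrary containment checks at the cost of size and slower writes.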
From Claude:
Here's what I would implement first:
-- Create ENUMs
CREATE TYPE processing_status_enum AS ENUM ('IN_PROCESSING', 'HAS_ERRORS', 'FINISHED');
CREATE TYPE sequence_status_enum AS ENUM ('APPROVED_FOR_RELEASE', 'AWAITING_APPROVAL', 'IN_PROCESSING', 'HAS_ERRORS', 'RECEIVED');
-- Add basic check constraints
ALTER TABLE sequence_entries_preprocessed_data
    ADD CONSTRAINT check_processing_timestamps
    CHECK (finished_processing_at IS NULL OR finished_processing_at > started_processing_at);
ALTER TABLE sequence_entries
    ADD CONSTRAINT check_version_positive
    CHECK (version > 0);
-- Add missing indexes
CREATE INDEX idx_sequence_entries_group ON sequence_entries(group_id);
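The CREATE TYPE statements above only define the enum types and do not yet attach them to any column. A possible follow-up for processing_status is sketched below. It assumes the column is currently a text type, has no default that would block the cast, and that processing_status_enum already exists.

```sql
-- Step 1: list any values that are not part of the enum; these would make
-- the cast in step 2 fail.
SELECT DISTINCT processing_status
FROM sequence_entries_preprocessed_data
WHERE processing_status NOT IN ('IN_PROCESSING', 'HAS_ERRORS', 'FINISHED');

-- Step 2: convert the column in place
-- (assumes processing_status_enum has been created).
ALTER TABLE sequence_entries_preprocessed_data
    ALTER COLUMN processing_status TYPE processing_status_enum
    USING processing_status::processing_status_enum;
```

PostgreSQL refuses to change the type of a column that a view depends on, so if sequence_entries_view reads processing_status, the view would likely need to be dropped and recreated around this change.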