populationgenomics / metamist

Sample level metadata system
MIT License
1 stars 1 forks source link

Querying analyses with many sequencing groups and large meta causes non-responsive queries #790

Closed dancoates closed 4 months ago

dancoates commented 4 months ago

A query like:

query MyQuery {
  project(name: "project-name") {
    analyses(type: {eq: "sv"}) {
      id
      output
    }
  }
}

will produce a SQL query that takes a very long time to resolve:

SELECT a.id as id, a.type as type, a.status as status,
        a.output as output, a_sg.sequencing_group_id as sequencing_group_id,
        a.project as project, a.timestamp_completed as timestamp_completed,
        a.active as active, a.meta as meta, a.author as author
FROM analysis a
LEFT JOIN analysis_sequencing_group a_sg ON a.id = a_sg.analysis_id
WHERE a.project = 6 AND a.type = 'sv' AND a.status = 'completed'

This is because each analysis has many sequencing groups and the large meta object needs to be replicated for each row produced by the join.

We can potentially resolve this by moving the joining of the meta into the python code rather than having it in the DB.