va-big-data-genomics / trellis-mvp-data-modelling

Resources used for interacting with and updating the Trellis MVP graph database model.
MIT License

Missing (genome)-[]->(cram) connections #6

Open pbilling opened 1 year ago

pbilling commented 1 year ago

There are 692 samples in data release 2 that are missing (Genome)-[:HAS_SEQUENCING_READS]->(Cram) relationships. These need to be repaired.

MATCH (s:Study {name:"WgsDataRelease2"})-[]->(:Participant)-[]->(p:Person)
WHERE NOT (p)-[]->(:Genome)-[:HAS_SEQUENCING_READS]->(:Cram)
RETURN COUNT(p)

692

pbilling commented 1 year ago

It looks like only 386 of these CRAMs were registered in the database as Cram nodes.

MATCH (:Study {name:"WgsDataRelease2"})-[]->(:Participant)-[]->(p:Person)-[]->(s:Sample)
WHERE NOT (p)-[]->(:Genome)-[:HAS_SEQUENCING_READS]->(:Cram)
MATCH (c:Cram)
WHERE c.sample = s.sample
RETURN COUNT(c)

386
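
A possible follow-up, sketched here rather than taken from the thread, is to list the affected samples that have no matching Cram node at all, since those cannot be repaired by adding a relationship alone. The query reuses the labels and the `sample` property from the queries above; the `OPTIONAL MATCH` plus zero-count filter is an assumed approach, and the `sample` and `crams` aliases are illustrative names.

```cypher
// Same starting set as above: people in the release without a Genome -> Cram link
MATCH (:Study {name:"WgsDataRelease2"})-[]->(:Participant)-[]->(p:Person)-[]->(s:Sample)
WHERE NOT (p)-[]->(:Genome)-[:HAS_SEQUENCING_READS]->(:Cram)
// Look for a registered Cram node for each sample; rows without one are kept with c = null
OPTIONAL MATCH (c:Cram)
WHERE c.sample = s.sample
// COUNT(c) ignores nulls, so a count of 0 means no Cram node is registered
WITH s, COUNT(c) AS crams
WHERE crams = 0
RETURN s.sample AS sample
```

If each person has exactly one sample and each sample has at most one Cram node, this would return 692 - 386 = 306 rows. That one-to-one mapping is an assumption and is not confirmed anywhere in this thread.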