populationgenomics / automated-interpretation-pipeline

Rare Disease variant prioritisation MVP
MIT License

Use Seqr to provide data directly #152

Closed: MattWellie closed this 1 year ago

MattWellie commented 1 year ago

Metadata lookups? WE'LL DO IT LIVE


@illusional has provided an API view onto the metadata content, accessible through a couple of endpoints. One endpoint in particular currently exists only on seqr-reanalysis-dev, at:

https://<SEQR_INSTANCE>/api/project/<PROJECT>/samples/sa

e.g.

https://seqr-reanalysis-dev.populationgenomics.org.au/api/project/R0003_acute_care_testing/samples/sa

Directly accessing the data from this endpoint could replace the current caching of JSON files that link CPG IDs to seqr individual/family IDs. Put together a couple of usage examples to demonstrate how this could be useful.
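A minimal sketch of what a live lookup could look like in place of the cached JSON. The URL template is the one above; everything about the response body is an assumption: the code expects a JSON list of records with hypothetical keys `sample_id`, `individual_id` and `family_id`, which may not match what the endpoint really returns. Passing the token as a bearer header is likewise an assumption about how the endpoint authenticates.

```python
import json
import urllib.request
from typing import Dict, Optional

# Endpoint template from the issue; instance and project are filled in at runtime.
ENDPOINT_TEMPLATE = "https://{instance}/api/project/{project}/samples/sa"


def fetch_samples(instance: str, project: str, token: Optional[str] = None) -> list:
    """GET the endpoint and decode its JSON body (response shape is assumed)."""
    url = ENDPOINT_TEMPLATE.format(instance=instance, project=project)
    request = urllib.request.Request(url)
    if token:
        # ASSUMPTION: the endpoint accepts a bearer token.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def build_id_lookup(records: list) -> Dict[str, Dict[str, str]]:
    """Map each CPG sample ID to its seqr individual and family IDs.

    ASSUMPTION: each record is a dict with the hypothetical keys
    'sample_id', 'individual_id' and 'family_id'.
    """
    lookup = {}
    for record in records:
        lookup[record["sample_id"]] = {
            "individual_id": record["individual_id"],
            "family_id": record["family_id"],
        }
    return lookup
```

Usage would be something like `build_id_lookup(fetch_samples("seqr-reanalysis-dev.populationgenomics.org.au", "R0003_acute_care_testing", token))`. Keeping the parsing separate from the HTTP call means the same lookup builder could also be fed from the existing cached JSON.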

Issues

These endpoints are restricted to specific service accounts, so using them will require authenticating as a specific service user... Is that true? Ideally we'll be able to run this as the same user we authenticate as through analysis runner, rather than doing weird things with authentication at runtime to swap users.
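If the endpoint accepted a Google-signed identity token (an assumption, not something confirmed in this thread), a job could authenticate as whatever identity it already runs as, with no user swapping. This sketch uses google-auth's `google.oauth2.id_token.fetch_id_token`, which typically needs service-account credentials or the GCE metadata server rather than personal gcloud credentials. The helper names are hypothetical, and the correct `audience` depends on how the seqr instance is fronted.

```python
from typing import Dict


def identity_token_for(audience: str) -> str:
    """Fetch a Google-signed ID token for the ambient credentials.

    The imports are local so this module still loads without google-auth
    installed. ASSUMPTION: seqr accepts this token as a bearer token.
    """
    import google.auth.transport.requests
    import google.oauth2.id_token

    request = google.auth.transport.requests.Request()
    return google.oauth2.id_token.fetch_id_token(request, audience)


def auth_headers(token: str) -> Dict[str, str]:
    """Build the HTTP headers for an authenticated request."""
    return {"Authorization": f"Bearer {token}"}
```

Whether this works end to end hinges on seqr's access control: if only specific service accounts are allowed, the job's own service account would still need to be on that list.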

Code that uses this will rely on an invalid assumption: that every Seqr instance has these novel endpoints.
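One way to soften that assumption is to treat the live endpoint as optional: try it first, and fall back to the existing cached JSON when the instance returns 404. A sketch, where `fetch_live` is a hypothetical zero-argument callable wrapping the HTTP request, and a 404 is assumed to mean "endpoint not deployed on this instance":

```python
import json
import urllib.error
from pathlib import Path
from typing import Callable


def load_samples_with_fallback(fetch_live: Callable[[], list], cache_path: Path) -> list:
    """Prefer the live seqr endpoint; fall back to cached JSON if it is absent."""
    try:
        return fetch_live()
    except urllib.error.HTTPError as error:
        # ASSUMPTION: 404 means this seqr instance lacks the endpoint.
        # Any other HTTP error (auth, server failure) is re-raised.
        if error.code != 404:
            raise
    return json.loads(cache_path.read_text())
```

Re-raising everything except 404 keeps permission problems visible instead of silently reading stale cache data.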

MVP

For demo purposes, write a single script (which could be swapped in as an AIP component) to do the following:

  1. Start with a project ID
  2. Query Metamist for the pedigree details
  3. Query Metamist for the corresponding project(s) in seqr
  4. Query Seqr (using a second, separate GCP account for now) to obtain the seqr family ID for each sample
  5. Allow for the possibility that each sample maps to multiple sequence IDs, and that it is present in multiple seqr projects, each with a different family ID
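The five steps could be sketched as below. The four callables are hypothetical stand-ins for the Metamist pedigree query, a Metamist sample-to-sequence lookup, the Metamist seqr-project lookup, and the seqr query made with the second GCP account; none of them are real Metamist or seqr APIs. The point of the sketch is the data structure for step 5: sets of sequence IDs and of (seqr project, family ID) pairs, so a second seqr project adds an entry instead of overwriting the first.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set, Tuple


@dataclass
class SampleMapping:
    """Everything one CPG sample links to; both fields may hold several entries."""

    sequence_ids: Set[str] = field(default_factory=set)
    # (seqr project, seqr family ID) pairs
    seqr_families: Set[Tuple[str, str]] = field(default_factory=set)


def map_project(
    project: str,  # step 1: the starting project ID
    get_pedigree_samples: Callable[[str], Iterable[str]],
    get_sequence_ids: Callable[[str], Iterable[str]],
    get_seqr_projects: Callable[[str], Iterable[str]],
    get_seqr_families: Callable[[str], Dict[str, str]],
) -> Dict[str, SampleMapping]:
    """Link every pedigree sample to all of its sequence IDs and seqr families."""
    # Step 2: samples named in the Metamist pedigree, with their sequence IDs
    mappings = {
        sample: SampleMapping(set(get_sequence_ids(sample)))
        for sample in get_pedigree_samples(project)
    }
    # Step 3: every seqr project linked to this Metamist project
    for seqr_project in get_seqr_projects(project):
        # Step 4: sample -> family ID as reported by that seqr project
        for sample_id, family_id in get_seqr_families(seqr_project).items():
            if sample_id in mappings:
                # Step 5: keep every (project, family) pair rather than overwriting
                mappings[sample_id].seqr_families.add((seqr_project, family_id))
    return mappings
```

Samples that seqr reports but the pedigree does not contain are ignored here; whether they should instead be flagged is a design choice left open.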

Output:

Showerthought: can the external ID just be appended to the individual's entry in the PED file as an extra column? Downstream code could ignore it or parse it deliberately, instead of needing an extra lookup file. No external ID column, NO SALE.
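A rough sketch of the extra-column idea, assuming a tab-separated PED file with the standard six columns (family, individual, paternal, maternal, sex, phenotype) and the external ID appended as a hypothetical seventh column. A reader that only looks at the first six fields is unaffected; `read_external_ids` parses the extra column deliberately and raises if it is missing.

```python
import csv
import io
from typing import Dict, List

PED_COLUMNS = ["family_id", "individual_id", "paternal_id", "maternal_id", "sex", "phenotype"]


def write_ped_with_external_ids(rows: List[Dict[str, str]]) -> str:
    """Write tab-separated PED rows with the external ID as a 7th column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[column] for column in PED_COLUMNS] + [row["external_id"]])
    return buffer.getvalue()


def read_external_ids(ped_text: str) -> Dict[str, str]:
    """Map individual ID -> external ID; refuse rows without the extra column."""
    lookup = {}
    for line in ped_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) < 7:
            # No external ID column, NO SALE
            raise ValueError(f"missing external ID column: {line!r}")
        lookup[fields[1]] = fields[6]
    return lookup
```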

illusional commented 1 year ago

We could give the seqr service accounts access, though we'd have to do it manually and case by case for specific datasets, i.e. acute care for testing or something similar.

Technically they have access to the same level of genetic data anyway.

@lgruen, one other way we could achieve this is to proxy requests through metamist and do the access checks there.
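The proxy option might look roughly like this: metamist checks the caller's permissions itself and only then forwards the request to seqr with its own credentials, so only metamist's service account would need seqr access. `user_projects` and `forward` are hypothetical stand-ins for metamist's permission store and its authenticated seqr client; this is an illustration of the idea, not metamist code.

```python
from typing import Callable, Dict, Set


class AccessDenied(Exception):
    """Raised when the caller may not read the requested project."""


def proxied_seqr_lookup(
    user: str,
    project: str,
    user_projects: Dict[str, Set[str]],
    forward: Callable[[str], list],
) -> list:
    """Check access in metamist first, then forward the request to seqr."""
    if project not in user_projects.get(user, set()):
        raise AccessDenied(f"{user} cannot read {project}")
    # Only reached after the check; forward() would use metamist's own seqr credentials.
    return forward(project)
```

The appeal of this design is that dataset-level permissions stay in one place (metamist), rather than being granted manually to seqr service accounts for each dataset.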

illusional commented 1 year ago

Hey @MattWellie, is there any action required or pending from me? If not, do you mind if I unassign myself from this? Just cleaning up assigned cards.

MattWellie commented 1 year ago

@illusional sorry, nothing to do here. Happy for this to just sit as a prototype until the conversation with the Broad/MSFT side moves forward.