populationgenomics / metamist

Sample level metadata system
MIT License

Seqr Layer - Update saved variants API call response code discrepancy #832

Open EddieLF opened 3 weeks ago

EddieLF commented 3 weeks ago

The seqr layer is used to sync the Seqr projects with Metamist.

During the sync, we call the update_saved_variants Seqr API endpoint. It is a POST request, but the body is empty: the request simply tells Seqr to update the saved variants for the project.
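A minimal sketch of that call, using only the Python standard library rather than the aiohttp client Metamist actually uses. The path template is inferred from the URL in the traceback below and is an assumption, not taken from Seqr's API documentation; build_update_request and send are hypothetical helpers, and authentication is omitted.

```python
import urllib.request

# Assumed path template, inferred from the URL in the traceback;
# {project_guid} is a placeholder for a real Seqr project GUID.
UPDATE_PATH = "/api/project/sa/{project_guid}/saved_variant/update"


def build_update_request(base_url: str, project_guid: str) -> urllib.request.Request:
    """Build the empty-bodied POST that asks Seqr to update a project's saved variants."""
    url = base_url.rstrip("/") + UPDATE_PATH.format(project_guid=project_guid)
    # data=b"" sends an empty body (Content-Length: 0); the POST itself is the message.
    return urllib.request.Request(url, data=b"", method="POST")


def send(request: urllib.request.Request, timeout: float = 60.0) -> int:
    """Send the request and return the HTTP status.

    urlopen raises urllib.error.HTTPError for non-2xx codes such as 502.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The point of the sketch is that success is signalled purely by the status code, since there is no meaningful request or response payload to inspect.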

On success, Metamist should receive a 200 OK response from Seqr. However, Metamist frequently reports a 502 error, even when Seqr's own logs show the request succeeding:

    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 502, message='Bad Gateway', url=URL('https://seqr.x.org.au/api/project/sa/project_guid/saved_variant/update')
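For triage, it may help to note where a 502 can originate. RFC 9110 defines 502 (Bad Gateway) and 504 (Gateway Timeout) as responses from a server acting as a gateway or proxy for an upstream server. So a 502 on the Metamist side paired with a 200 in Seqr's logs suggests, as one hypothesis to check, that something between Metamist and the Seqr application answered instead of relaying Seqr's response. The sketch below only encodes that reasoning; from_intermediary and classify_mismatch are hypothetical helpers. In aiohttp, the observed code is available as the status attribute of the caught ClientResponseError.

```python
from http import HTTPStatus

# RFC 9110: both codes are sent by a server acting as a gateway or proxy,
# reporting a problem with the upstream server it forwarded the request to.
GATEWAY_STATUSES = frozenset({HTTPStatus.BAD_GATEWAY.value, HTTPStatus.GATEWAY_TIMEOUT.value})


def from_intermediary(status: int) -> bool:
    """True if the status is one a gateway/proxy reports about its upstream."""
    return status in GATEWAY_STATUSES


def classify_mismatch(client_status: int, origin_status: int) -> str:
    """Compare the status the client saw with the one the origin server logged.

    "intermediary" means the client received a gateway-generated error even
    though the origin logged its own response.
    """
    if client_status == origin_status:
        return "consistent"
    if from_intermediary(client_status):
        return "intermediary"
    return "other"
```

For the case in this issue, classify_mismatch(502, 200) returns "intermediary".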

Some examples, with links to the Seqr VM log entries that show this discrepancy:

Note that this error is transient and doesn't always happen. https://github.com/populationgenomics/metamist/pull/772 added the backoff library (a @backoff decorator) to this API call, which retries it up to 3 times until it gets a successful response. Raising this retry limit might reduce the number of errors reported, but because the wait between retries grows exponentially, the worst-case time to sync each Seqr project would also grow exponentially with the limit. It would be better to find out why Metamist gets a 502 response even though Seqr reports a 200.
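To make the cost of raising the limit concrete, here is a minimal standard-library stand-in for an exponential-backoff retry. It assumes waits of 1, 2, 4, ... seconds between attempts (the defaults of backoff.expo, ignoring the jitter that backoff adds by default); retry_with_backoff and expo_delays are hypothetical names, and sleep is injectable so the logic can be exercised without actually waiting.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def expo_delays(max_tries: int, base: float = 2.0, factor: float = 1.0) -> list[float]:
    """Waits between attempts: factor * base**n for n = 0 .. max_tries - 2 (no jitter)."""
    return [factor * base**n for n in range(max_tries - 1)]


def retry_with_backoff(
    func: Callable[[], T],
    max_tries: int = 3,
    retry_on: type[BaseException] | tuple[type[BaseException], ...] = Exception,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func up to max_tries times, waiting exponentially longer after each failure."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    delays = expo_delays(max_tries)
    for attempt in range(max_tries):
        try:
            return func()
        except retry_on:
            if attempt == max_tries - 1:
                raise  # out of tries: surface the last error (e.g. the 502)
            sleep(delays[attempt])
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Under these assumptions, max_tries=3 adds at most 1 + 2 = 3 seconds of waiting per call, while max_tries=5 adds 1 + 2 + 4 + 8 = 15, before counting the time spent on the requests themselves. With the backoff library itself, a roughly equivalent declaration would be @backoff.on_exception(backoff.expo, aiohttp.ClientResponseError, max_tries=3); the exact options used in #772 are not shown in this issue.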

nevoodoo commented 15 hours ago

Hey @EddieLF, where can I find the Seqr-side logs for this? I tried looking in Cloud Run, but no requests seem to be logged there. I'd be keen to take a closer look at the exact entries where Seqr reports a 200.

EddieLF commented 15 hours ago

Hey @nevoodoo, thanks for looking into this. The logs are on the Compute Engine instances. In the Slack posts I linked above, I've replied with links to the specific entries in the GCE logs that show the Seqr VM responding with 200.