opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

studiesAndLeadVariantsForGene endpoint returns empty response #2791

Closed DSuveges closed 1 year ago

DSuveges commented 2 years ago

Description of the bug

Our users reported on the community portal that the studiesAndLeadVariantsForGene GraphQL endpoint in the Genetics Portal returns no data. Whichever gene IDs we request in the following GraphQL request, the response is always empty.

Returned data:

{
  "data": {
    "studiesAndLeadVariantsForGene": []
  }
}

Interestingly, the same request returns the correct data in the dev instance.

Expected behaviour

The query should return something like this:

{
  "data": {
    "studiesAndLeadVariantsForGene": [
      {
        "indexVariant": {
          "id": "9_2622134_C_T"
        },
        "study": {
          "source": "FINNGEN",
          "pmid": null,
          "traitReported": "\"Type 2 diabetes, definitions combined\"",
          "pubDate": "2022-01-24",
          "pubTitle": null,
          "pubAuthor": "FINNGEN_R6",
          "hasSumstats": true,
          "numAssocLoci": 79
        }
      },
      {
        "indexVariant": {
          "id": "9_2622278_G_A"
        },
        "study": {
          "source": "GCST",
          "pmid": "PMID:27863252",
          "traitReported": "Platelet count",
          "pubDate": "2016-11-17",
          "pubTitle": "The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease.",
          "pubAuthor": "Astle WJ",
          "hasSumstats": true,
          "numAssocLoci": 325
        }
      },
DSuveges commented 2 years ago

As reported by a user, another query shows a similar behaviour:

query study {
  studiesForGene(geneId:"ENSG00000169174") {
    study {
      source
      pmid
      pubDate
      pubJournal
      pubTitle
      pubAuthor
      hasSumstats
      nInitial
      nReplication
      nCases
      traitCategory
      numAssocLoci
    }
  }
}

Most likely both problems share a common underlying cause. Once it is fixed, both endpoints should work again.
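To check both endpoints against more than one instance, the request and the emptiness test can be scripted. The following is a minimal sketch using only the Python standard library. The API URL is an assumption (it is not stated in this thread), and the helper names `build_payload`, `is_empty_result` and `fetch` are hypothetical.

```python
import json
import urllib.request

# Assumption: the public Genetics GraphQL URL; this thread does not state it.
GENETICS_API_URL = "https://api.genetics.opentargets.org/graphql"


def build_payload(gene_id: str) -> bytes:
    """Build the JSON body for a reduced version of the studiesForGene query above."""
    query = (
        "query study { studiesForGene(geneId: %s) "
        "{ study { source pmid pubDate numAssocLoci } } }" % json.dumps(gene_id)
    )
    return json.dumps({"query": query}).encode("utf-8")


def is_empty_result(response: dict, field: str) -> bool:
    """Return True when the API answered without errors but with no rows for `field`."""
    data = response.get("data") or {}
    return not response.get("errors") and not data.get(field)


def fetch(gene_id: str, url: str = GENETICS_API_URL) -> dict:
    """POST the query to `url` and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(gene_id),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.load(reply)
```

Calling `is_empty_result(fetch("ENSG00000169174"), "studiesForGene")` against the production and dev URLs in turn would show whether the two instances disagree.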

JarrodBaker commented 2 years ago

It isn't really clear what is going on. Each ClickHouse (CH) instance is given its own complete copy of the database on start-up. The instances running in the US and in our development environments return the expected results. The instance in the EU contains errors.

They are all using the same disk image ch-disk-jldgdj65-image-v2 to start the database. When I ssh into the EU node and go into the database I can see:

SELECT
    name,
    code,
    value,
    last_error_message
FROM system.errors
WHERE value > 0
ORDER BY code ASC

Query id: b15dc9a7-c090-4cff-9a49-5a6aa19c45cc

┌─name──────────────┬─code─┬─value─┬─last_error_message──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CANNOT_OPEN_FILE  │   76 │     1 │ Cannot open certificate file: /etc/clickhouse-server/server.crt.                                                                                                │
│ FILE_DOESNT_EXIST │  107 │     1 │ Cannot open file /var/lib/clickhouse/store/e9d/e9d00a82-974a-4cbb-b378-ba608f972522/all_4951_5496_4/lead_pos.bin, errno: 2, strerror: No such file or directory │
└───────────────────┴──────┴───────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The FILE_DOESNT_EXIST error is what is causing the problem in the API.
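To see which column file is missing and where it should sit on disk, the error message can be split into its path components. This is a minimal sketch; the helper name `missing_file` is hypothetical, and reading the directories as table directory, data part and column file is an assumption based on the usual ClickHouse `store/` layout.

```python
import re
from pathlib import PurePosixPath

# The last_error_message reported by the EU node above.
MESSAGE = (
    "Cannot open file /var/lib/clickhouse/store/e9d/"
    "e9d00a82-974a-4cbb-b378-ba608f972522/all_4951_5496_4/lead_pos.bin, "
    "errno: 2, strerror: No such file or directory"
)


def missing_file(message: str):
    """Extract the missing path from a FILE_DOESNT_EXIST message, or return None."""
    match = re.search(r"Cannot open file (\S+?), errno", message)
    if match is None:
        return None
    path = PurePosixPath(match.group(1))
    return {
        "path": str(path),
        "column_file": path.name,             # e.g. the .bin file of one column
        "part_dir": path.parent.name,         # assumed: the data part directory
        "table_dir": path.parent.parent.name  # assumed: the table's UUID directory
    }


print(missing_file(MESSAGE))
```

With these names in hand, the same path can be checked on the US and dev nodes to confirm the file exists there.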

Compared with the US:

SELECT
    name,
    code,
    value,
    last_error_message
FROM system.errors
WHERE value > 0
ORDER BY code ASC

Query id: ec88c3c2-f56b-426b-a389-70868ac584b4

┌─name─────────────┬─code─┬─value─┬─last_error_message───────────────────────────────────────────────┐
│ CANNOT_OPEN_FILE │   76 │     1 │ Cannot open certificate file: /etc/clickhouse-server/server.crt. │
└──────────────────┴──────┴───────┴──────────────────────────────────────────────────────────────────┘
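The comparison between the two nodes amounts to a set difference over the rows of `system.errors`. A minimal sketch, with the rows copied from the two outputs above (messages shortened where marked) and a hypothetical helper name `errors_only_in`:

```python
def errors_only_in(suspect, reference):
    """Return rows from `suspect` whose error name does not appear in `reference`."""
    reference_names = {row["name"] for row in reference}
    return [row for row in suspect if row["name"] not in reference_names]


# Rows from the EU node; the second message is abbreviated here.
EU_ERRORS = [
    {"name": "CANNOT_OPEN_FILE", "code": 76, "value": 1,
     "last_error_message": "Cannot open certificate file: /etc/clickhouse-server/server.crt."},
    {"name": "FILE_DOESNT_EXIST", "code": 107, "value": 1,
     "last_error_message": "Cannot open file (path abbreviated), errno: 2"},
]

# Rows from the US node.
US_ERRORS = [
    {"name": "CANNOT_OPEN_FILE", "code": 76, "value": 1,
     "last_error_message": "Cannot open certificate file: /etc/clickhouse-server/server.crt."},
]

for row in errors_only_in(EU_ERRORS, US_ERRORS):
    print(row["code"], row["name"])
```

The certificate error is common to both nodes, so only the missing-file error remains as specific to the EU instance.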

When the instance starts up, it mounts the disk and then starts ClickHouse in a Docker container. My best guess is that something goes wrong while it is mounting the data. I just can't see why it would affect some instances and not others.

DSuveges commented 2 years ago

If the data somehow got corrupted, wouldn't we expect missing data/tables in the UI? However, https://genetics.opentargets.org/gene/ENSG00000169174 and https://genetics.dev.opentargets.xyz/gene/ENSG00000169174 show the same table with the same values.

(assuming studiesForGene and studiesAndLeadVariantsForGene fill the tables on the gene page)

JarrodBaker commented 2 years ago

The assumption is wrong (as far as I can tell) :smile: .

That page appears to be using the geneInfo and colocalisationsForGene endpoints. I'm not actually sure where the endpoint is used on the FE (if at all). There are enough 'artifacts' in the genetics portal that I'm never surprised to find dead-ends.

DSuveges commented 2 years ago

As @d0choa pointed out, these endpoints are not used anywhere in the UI; they are only listed in the API documentation as examples.

JarrodBaker commented 2 years ago

I'm glad we spent the day fixing that then...somewhat deflating.

buniello commented 1 year ago

Yesterday, another user reported on the Community a problem with the studiesAndLeadVariantsForGene API example query. As mentioned in one of the comments above, this example query works on the dev instance but not in production.

Discussed with Daniel S. As an easy fix, we suggest removing this example query from the Genetics API doc page. The other example queries work as expected. @chinmehta, can you help remove the example query, as you originally implemented the page? Please do let me know if you have any questions on this.

d0choa commented 1 year ago

We still have an issue in the API, but we will ignore it for now, as it's not used in production.