thewca / wca-live

Platform for running WCA competitions and sharing live results with the world
https://live.worldcubeassociation.org

Error 502 on API Query #143

Closed ClementValot closed 1 year ago

ClementValot commented 1 year ago

I have a piece of software that uses got to query the GraphQL API. It worked until recently, when it started receiving 502 error codes.

Here are the logged options of the request:

{
  "agent": {},
  "decompress": true,
  "timeout": {},
  "prefixUrl": "",
  "body": "blah-blah long body because it's graphql",
  "ignoreInvalidCookies": false,
  "context": {},
  "followRedirect": true,
  "maxRedirects": 10,
  "throwHttpErrors": true,
  "username": "",
  "password": "",
  "http2": false,
  "allowGetBody": false,
  "headers": {
    "user-agent": "got (https://github.com/sindresorhus/got)",
    "content-type": "application/json",
    "content-length": "1655",
    "accept": "application/json",
    "accept-encoding": "gzip, deflate, br"
  },
  "methodRewriting": false,
  "method": "POST",
  "cacheOptions": {},
  "https": {},
  "resolveBodyOnly": false,
  "isStream": false,
  "responseType": "json",
  "url": "https://live.worldcubeassociation.org/api",
  "pagination": {
    "countLimit": null,
    "backoff": 0,
    "requestLimit": 10000,
    "stackAllItems": false
  },
  "setHost": true,
  "enableUnixSockets": true
}

Thank you for your help

jonatanklosko commented 1 year ago

@ClementValot can you provide a script/snippet that reproduces it?

It works via cURL:

curl -X POST -H "Content-Type: application/json" -d '{ "query": "query { competitions { id } } "}' https://live.worldcubeassociation.org/api

and also with fetch:

fetch("https://live.worldcubeassociation.org/api", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: "query { competitions { id } }" })
})
  .then((response) => response.json())
  .then((data) => console.log(data));

ClementValot commented 1 year ago

Sure, here's a js snippet with a query that consistently 502's:

import got from "got";

const competitionQuery = `
    query Competition($id:ID!) {
        competition(id:$id) {
            name
            competitionEvents {
                id
                event {
                    id
                    name
                }
                rounds {
                    active
                    id
                    name
                    number
                    open
                    results {
                        ranking
                        person {
                            id
                            name
                            wcaId
                            results {
                                advancing
                                attempts {
                                    result
                                }
                                average
                                averageRecordTag
                                best
                                ranking
                                round {
                                    name
                                    number
                                    competitionEvent {
                                        event {
                                            id
                                            name
                                        }
                                    }
                                }
                                singleRecordTag
                            }
                        }
                    }
                }
            }
        } 
    }
`;

const wcaLive = "https://live.worldcubeassociation.org";

const client = got.extend({
  prefixUrl: wcaLive,
  responseType: "json",
});

client
  .post("api", {
    json: { operationName: "Competition", query: competitionQuery, variables: { id: "2285" } },
  })
  .then((result) => console.log(result.body));

Funny thing is, if I change operationName to something that isn't "Competition", the query goes through with a 200 and correctly responds with an error, so it isn't related to headers, as I initially thought.

jonatanklosko commented 1 year ago

@ClementValot I see, I think the issue is that the query is too complex and makes the server run out of memory. If you remove the nested results part it seems to work fine:

const competitionQuery = `
    query Competition($id:ID!) {
        competition(id:$id) {
            name
            competitionEvents {
                id
                event {
                    id
                    name
                }
                rounds {
                    active
                    id
                    name
                    number
                    open
                    results {
                        ranking
                        person {
                            id
                            name
                            wcaId
                        }
                    }
                }
            }
        }
    }
`;

That's an issue with GraphQL APIs. Ideally we should add complexity analysis to prevent such queries from going through. For now, please avoid so many levels of nested relations.
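
Until such a check exists on the server, a client can get a rough idea of how deeply a query nests before sending it. The following is only an illustrative sketch: it counts braces in the query text, the function name maxNestingDepth is hypothetical, and it is not the rule the server applies. It assumes the query contains no braces inside strings or comments.

// Rough estimate of selection nesting depth by counting braces.
// Assumption: no "{" or "}" appear inside string literals or comments.
function maxNestingDepth(query) {
  let depth = 0;
  let max = 0;
  for (const ch of query) {
    if (ch === "{") {
      depth += 1;
      max = Math.max(max, depth);
    } else if (ch === "}") {
      depth -= 1;
    }
  }
  return max;
}

const shallowQuery = "query { competitions { id } }";
console.log(maxNestingDepth(shallowQuery)); // 2

Depth alone does not capture cost (a wide list at a shallow level can still be expensive), so treat the number as a hint rather than a guarantee that a query will be accepted.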

jonatanklosko commented 1 year ago

I added some basic rules in 535191fd2e8919629be3c3ac7f76520d8031a7c5 to reject overly complex queries, and now the query you posted returns a proper error.

ClementValot commented 1 year ago

It used to work properly before the infra changes :'(

Having to make several requests kinda defeats the purpose of GraphQL, doesn't it? 🤔

I'll rewrite, thanks for the help!

jonatanklosko commented 1 year ago

> It used to work properly before the infra changes :'(

Interesting. The base instance has less memory now, and the app scales horizontally. Looking at the logs, though, the OOM seems significant, so it may be related to the runtime change (both the OS and the language runtime), which would be surprising too. Either way, we should have been doing complexity analysis in the first place, so thanks for opening the issue.

> Having to make several requests kinda defeats the purpose of GraphQL, doesn't it? 🤔

Not necessarily. Handling a request is one thing, but underneath it requires several database queries and memory allocation, so that needs to be capped too. Imagine a typical GraphQL API that returns a long list of entries: it would usually be paginated, so to get many entries you need to query multiple pages one by one. Similarly, allowing arbitrarily nested GraphQL queries may just be too resource-heavy, and it also lets a bad actor easily crash the app.

Also note that, technically speaking, the main purpose of the API is to serve the WCA Live client itself, so as long as that client can successfully make its queries, the API works as expected. While other clients are allowed to call the API, it's not optimised for that purpose and isn't expected to be relied on heavily. In most cases people should use WCIF instead, and query the WCA Live API only if they actually need access to results as soon as they are entered (rather than synchronized).

ClementValot commented 1 year ago

It is for live commentary purposes: I print cheat sheets about every finalist once per competition, as soon as the finals round is open. I tried to optimize by minimizing the number of queries, but I'll clean that up.

Thank you!

jonatanklosko commented 1 year ago

@ClementValot actually I think you can still do it with a single query. Instead of querying for competitor data on every result (which leads to a bunch of duplicate objects), you could query for just competitor ids and extend the query with competitor data:

query Competition($id:ID!) {
  competition(id:$id) {
    name
    competitionEvents {
      id
      event {
        id
        name
      }
      rounds {
        active
        id
        name
        number
        open
        results {
          ranking
          person {
            id
          }
        }
      }
    }
    competitors {
      id
      name
      wcaId
      results {
        advancing
        attempts {
          result
        }
        average
        averageRecordTag
        best
        ranking
        round {
          name
          number
          competitionEvent {
            event {
              id
              name
            }
          }
        }
        singleRecordTag
      }
    }
  }
}

Then you look up competitor info in that list :)
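
A minimal sketch of that lookup, assuming the response has the shape of the query above. The helper names (indexCompetitorsById, finalistsForRound) and the sample data are made up for illustration and are not part of the WCA Live API.

// Build an id -> competitor map once, so each result lookup is O(1).
function indexCompetitorsById(competition) {
  return new Map(competition.competitors.map((c) => [c.id, c]));
}

// Attach full competitor details to every result of a round.
function finalistsForRound(competition, round) {
  const byId = indexCompetitorsById(competition);
  return round.results.map((result) => ({
    ranking: result.ranking,
    competitor: byId.get(result.person.id),
  }));
}

// Made-up sample in the shape of `data.competition` (illustration only).
const sampleCompetition = {
  competitors: [
    { id: "1", name: "Alice", wcaId: "2010ALIC01", results: [] },
    { id: "2", name: "Bob", wcaId: "2012BOBB01", results: [] },
  ],
  competitionEvents: [
    {
      id: "10",
      rounds: [
        {
          id: "100",
          results: [
            { ranking: 1, person: { id: "2" } },
            { ranking: 2, person: { id: "1" } },
          ],
        },
      ],
    },
  ],
};

const sampleRound = sampleCompetition.competitionEvents[0].rounds[0];
const finalists = finalistsForRound(sampleCompetition, sampleRound);
console.log(finalists.map((f) => `${f.ranking}. ${f.competitor.name}`));

With got configured as in the earlier snippet, the real object would be read from result.body.data.competition, following the standard GraphQL response envelope.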