responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

GraphQL Endpoint: Pagination Disabled #2474

Open billsponsor opened 9 months ago

billsponsor commented 9 months ago

Hello - my name is Christian and I'm on the data team at CSET attempting to use the GraphQL API endpoint for AIID analysis.

Issue Description: When I try to gather bulk information for incidents (namely the ids and the plain_text of the associated news reports), I hit the resource limit at ~90 incidents. Using the prescribed pagination techniques to get all incidents results in an error.

Documentation: HasuraCloud GraphQL Docs - how HasuraCloud suggests paginating.

The offset argument is not working:

[Screenshot 2023-12-06 at 3 10 44 PM]

Suggested Fix: It seems these pagination arguments first have to be defined in the database's schema before they can be used in queries. See the StackOverflow answer highlighting a similar issue.

billsponsor commented 9 months ago

Reading through some other issues, it seems MongoDB (if that's the database implementation) might have its own documentation for paginating. In that case, I would change this issue into a request for suggestions on how to paginate with the current API implementation. Thanks!

kepae commented 9 months ago

It looks like the offset operator isn't defined in our schema because we used MongoDB Atlas, and MongoDB uses a limit-and-skip paradigm. So we are left with only limit on our API preview.

That said, I don't think Atlas is hitting a resource limit. Our code often sets the limit in queries to an arbitrarily high number, e.g. 9999, which might be the frustrating "trick" here.

This request works in the Hasura preview of our endpoint: link

query MyQuery {
  incidents(limit: 9999) {
    _id
    incident_id
    title
    date
    description
  }
}

Does that work for your client? Are you using Hasura to execute all of your queries, or something else?

billsponsor commented 9 months ago

The inclusion of news reports' plain text is what hits the resource limit (given the size and scale of coverage). So an arbitrarily high cap does not fix the issue!
As for my environment, I test in Hasura but run the actual queries from a Python script using a GraphQL Python library.
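
Since offset is unavailable and a single large limit hits the resource limit once plain_text is included, one possible workaround is to page by incident_id in small batches. The sketch below is only an illustration under assumptions: it assumes the endpoint accepts Atlas-style `query`, `sortBy`, and `limit` arguments, that an `incident_id_gt` filter and an `INCIDENT_ID_ASC` sort value exist, that incident ids are positive integers, and that reports are reachable as `reports { plain_text }`. None of that is confirmed in this thread, so check the live schema first. `iter_incidents` and `execute` are hypothetical names; `execute` stands for a thin wrapper around whatever GraphQL client the script already uses.

```python
from typing import Any, Callable, Dict, Iterator

# ASSUMPTION: argument and field names below follow the Atlas-style
# generated schema (query / sortBy / limit, <field>_gt filters).
# They are not confirmed in this thread; verify against the live schema.
PAGE_QUERY = """
query IncidentPage($after: Int, $limit: Int) {
  incidents(query: {incident_id_gt: $after}, sortBy: INCIDENT_ID_ASC, limit: $limit) {
    incident_id
    reports {
      plain_text
    }
  }
}
"""

# Hypothetical executor type: takes (query_string, variables) and returns
# the "data" part of the GraphQL response as a dict.
Executor = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_incidents(execute: Executor, page_size: int = 25) -> Iterator[Dict[str, Any]]:
    """Yield incidents in ascending incident_id order, one small page at a time."""
    after = 0  # assumes incident ids are positive integers
    while True:
        data = execute(PAGE_QUERY, {"after": after, "limit": page_size})
        page = data.get("incidents") or []
        if not page:
            return
        yield from page
        # Resume after the largest id seen so far (keyset-style paging).
        after = max(item["incident_id"] for item in page)
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
```

Keeping page_size small bounds how much plain_text each request returns, so it can be tuned down until a single page stays under the resource limit.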