ncbo / ncbo_cron

Jobs that run on a regular basis in the NCBO infrastructure

OBO Foundry synchronization script failing with JSON::ParserError #78

Closed: jvendetti closed this issue 4 months ago

jvendetti commented 4 months ago

The obofoundry_sync script errors out before running to completion with a JSON::ParserError (unexpected token).

The script makes a call to GitHub's GraphQL API to get the contents of the OBO Foundry's ontology registry, located here:

https://github.com/OBOFoundry/OBOFoundry.github.io/blob/master/registry/ontologies.jsonld

This is the GraphQL query:

query {
  repository(name: "OBOFoundry.github.io", owner: "OBOFoundry") {
    object(expression: "master:registry/ontologies.jsonld") {
      ... on Blob {
        text
      }
    }
  }
}

When this script was originally written, the above query returned the entire contents of the ontology registry file. It no longer does, due to limits on response size. As the documentation for Blob objects shows, there's an isTruncated property that indicates whether the contents are truncated. If you issue the above query and also request this property, its value is true. This is the underlying cause of the JSON parsing error: despite a 200 OK status code from the API, we don't get the full contents of the file back, so the text handed to the JSON parser is incomplete.
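One way to make this failure explicit, rather than letting it surface as a JSON::ParserError, is to request isTruncated alongside text and check it before parsing. Below is a minimal Ruby sketch, assuming the GraphQL response has already been fetched and decoded into a Hash (the HTTP call and authentication are omitted). TruncatedBlobError and registry_from_graphql are hypothetical names, not part of ncbo_cron; byteSize is a documented Blob field used here only to make the error message more informative.

```ruby
require 'json'

# The query from this issue, extended to also request Blob.isTruncated and
# Blob.byteSize so the caller can tell whether `text` is complete.
REGISTRY_QUERY = <<~GRAPHQL
  query {
    repository(name: "OBOFoundry.github.io", owner: "OBOFoundry") {
      object(expression: "master:registry/ontologies.jsonld") {
        ... on Blob {
          text
          isTruncated
          byteSize
        }
      }
    }
  }
GRAPHQL

# Hypothetical error class, raised instead of letting JSON.parse choke on partial text.
class TruncatedBlobError < StandardError; end

# Takes an already-decoded GraphQL response (a Hash) and returns the parsed
# registry, refusing to parse text that GitHub reports as truncated.
def registry_from_graphql(response)
  blob = response.dig('data', 'repository', 'object')
  raise ArgumentError, 'no Blob in GraphQL response' if blob.nil?

  if blob['isTruncated']
    raise TruncatedBlobError,
          "Blob text truncated (#{blob['text'].to_s.bytesize} of #{blob['byteSize']} bytes returned)"
  end

  JSON.parse(blob['text'])
end
```

This only replaces a confusing parse error with a clear one; it does not retrieve the full file, so the registry still has to come from somewhere other than Blob.text.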

Here's a screenshot of the query executed in GitHub's GraphQL Explorer, showing the truncated data and the isTruncated property:

[Screenshot, 2024-04-16 12:03:56 PM: the query in GitHub's GraphQL Explorer]

The script will need to be modified in some way to fetch the complete contents of the registry file.

cmungall commented 4 months ago

You can also just use PURLs like http://purl.obolibrary.org/meta/ontologies.jsonld; no need for the GitHub API!
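A minimal Ruby sketch of fetching the registry through the PURL with only the standard library. Net::HTTP.get_response does not follow redirects on its own, and a PURL normally answers with a redirect, so the sketch follows a bounded number of them by hand. The method names and the redirect limit are hypothetical, and the assumption that the entries sit under a top-level "ontologies" key should be checked against the actual file.

```ruby
require 'net/http'
require 'uri'
require 'json'

# PURL suggested in this thread.
REGISTRY_PURL = 'http://purl.obolibrary.org/meta/ontologies.jsonld'

# Hypothetical helper: GET `url`, following at most `limit` redirects,
# and return the body of the first successful response.
def fetch_following_redirects(url, limit = 5)
  raise ArgumentError, "too many redirects fetching #{url}" if limit.zero?

  uri = URI(url)
  response = Net::HTTP.get_response(uri) # enables TLS automatically for https URIs
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URI.
    fetch_following_redirects(uri.merge(response['location']).to_s, limit - 1)
  else
    raise "GET #{url} failed: #{response.code} #{response.message}"
  end
end

# Hypothetical entry point: returns the registry's ontology entries,
# assuming they live under a top-level "ontologies" array.
def fetch_obo_registry(url = REGISTRY_PURL)
  JSON.parse(fetch_following_redirects(url)).fetch('ontologies')
end
```

For example, fetch_obo_registry.map { |o| o['id'] } would list the ontology IDs. Because the whole file comes back as an ordinary HTTP body, there is no truncation flag to check; a non-2xx status or malformed JSON still raises.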