nipreps / mriqcwebapi

Apache License 2.0
7 stars 13 forks source link

Feature request for complex query support #48

Open dkp opened 3 years ago

dkp commented 3 years ago

I was hoping to pull data from the database with specific qualities (TR, MB, manufacturer etc.). I tried the following URL: https://mriqc.nimh.nih.gov/api/v1/bold?max_results=10&where=bids_meta.MultibandAccelerationFactor%3C4&RepetitionTime=2.0&Manufacturer=Siemens&MagneticFieldStrength=3

Although this runs, the results are not filtered by the BIDS meta data. Am I missing something about how to do this? Is this a planned feature? It would be very useful.

Thank you.

oesteban commented 3 years ago

pinging @rwblair - should this work already? I'm guessing so right?

rwblair commented 3 years ago

@dkp is there an object id that multiband acceleration factor is 4 or greater that is returned by the query you posted? Playing with just that part of the query and switching it from a less than to a greater than seems to return appropriate results:

curl 'https://mriqc.nimh.nih.gov/api/v1/bold?max_results=10&where=bids_meta.MultibandAccelerationFactor%3E4'

Please let me know if I'm misunderstanding the issue.

dkp commented 3 years ago

@rwblair Thanks so much for the prompt reply!

Is the URL syntax documented anywhere?

I'm afraid I am new to navigating your undocumented web API and I'm finding it very confusing. For example, your query string does work for me but uses where=, which I have not seen in any other of the few example URLs that I have been able to find. I have now found several other query strings which work without where=. For example:

curl -s 'https://mriqc.nimh.nih.gov/api/v1/bold?max_results=59&RepetitionTime=2.5' -H "accept: application/json"

Also note that this query does not require a bids_meta. prefix on RepetitionTime, whereas your query seems to require it on MultibandAccelerationFactor or it fails. So when and where do I need to use bids_meta.?

When and why do I need to use where=? AND, if I use the where=, then how can I ask for a MultibandAccelerationFactor of exactly N (e.g. 3, or 4, or 8)? If I use another equal sign, this causes a server 400 error (which makes sense since it violates REST convention by saying where=key=value). The following does not work, for example:

curl -s 'https://mriqc.nimh.nih.gov/api/v1/bold?max_results=10&where=bids_meta.MultibandAccelerationFactor%3D4' -H "Accept: application/json"

I also note that there is apparently an undocumented limit of 50 records we can retrieve ; (

 curl -s 'https://mriqc.nimh.nih.gov/api/v1/bold?max_results=33' -H "accept: application/json" | json_pp | grep '"_id" :' | wc -l
      33

curl -s 'https://mriqc.nimh.nih.gov/api/v1/bold?max_results=51' -H "accept: application/json" | json_pp | grep '"_id" :' | wc -l
      50
rwblair commented 3 years ago

So we use software called Eve to automatically generate the API from a schema. More information on how it handles these query parameters can be found under the filtering section on page 16 of this PDF: https://readthedocs.org/projects/eve/downloads/pdf/latest/

The 'where' comes from mongodb style queries. I can't find good information on how exactly parameters outside of the where clause get converted into queries against the database, but I was not able to get querying on a nested field to work. With regards to "where=key=value" there are two ways that a query can assert equality, one is the mongodb style which is verbose and I'll omit for not, and the other is pythonic in nature where it expects a double equals sign to assert equality like ?where=key==value

Here is where the schema we use is defined if you want to get an idea of its structure: https://github.com/poldracklab/mriqcwebapi/blob/master/dockereve-master/eve-app/settings.py

This shows which entries can be inside the nested bids_meta structure (but none are required): https://github.com/poldracklab/mriqcwebapi/blob/8d4660591ec7e35fcb3edf417a8027f0d8b3ca8a/dockereve-master/eve-app/settings.py#L4

This is schema is the root object for the bold endpoints: https://github.com/poldracklab/mriqcwebapi/blob/8d4660591ec7e35fcb3edf417a8027f0d8b3ca8a/dockereve-master/eve-app/settings.py#L107

My suspicion is that the RepetitionTime part of the queries is being ignored without the bids_meta prefix in a where clause.

Returning to your original query if we do it in their "pythonic" style we get this where clause before url encoding:

?where=bids_meta.MultibandAccelerationFactor<4 and bids_meta.RepetitionTime==2.0 and bids_meta.Manufacturer=="Siemens" and bids_meta.MagneticFieldStrength

with encoding:

curl -g https://mriqc.nimh.nih.gov/api/v1/bold?where=bids_meta.MultibandAccelerationFactor%3C4%20and%20bids_meta.RepetitionTime%3D%3D2.0%20and%20bids_meta.Manufacturer%3D%3D%22Siemens%22%20and%20bids_meta.MagneticFieldStrength%3D%3D3

Which seems to work coming back with 1024 results, much fewer than querying on any one of the terms.

Finally, apologies on the 50 limit not being documented, not sure why we set it that way or if it was a default. Are you able to paginate over the results ok?