Use searchengine for slow queries

will-moore commented 2 months ago

There are a handful of queries that attempt to load a large amount of data on a large server such as IDR.

These can be slow and need to be cached.

One option is to use the https://github.com/ome/omero_search_engine to perform these queries.

Currently, the searchengine is accessed only via its web API (it's not installed in the omero-web Python environment). We don't want to make omero-mapr depend on the searchengine, since this would be a breaking change for all mapr users who don't have the searchengine installed. So we need to test for the availability of the searchengine.
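One way to test for availability without importing anything from the searchengine is to probe its web API over HTTP and fall back to the existing mapr queries if that fails. A minimal sketch, assuming a hypothetical omero-web setting that holds the searchengine base URL (None or empty when it isn't configured); `searchengine_available` is an illustrative name, not part of omero-mapr:

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def searchengine_available(base_url: Optional[str], timeout: float = 2.0,
                           opener: Callable = urllib.request.urlopen) -> bool:
    """Return True if the searchengine web API at base_url answers with HTTP 200.

    base_url comes from a hypothetical config setting; None/"" means
    "not configured", so mapr should use its existing (slow) queries.
    `opener` is injectable so the check can be tested without a network.
    """
    if not base_url:
        return False
    try:
        with opener(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # URLError/HTTPError: unreachable or non-2xx; ValueError: malformed URL.
        return False
```

In practice the result would probably be cached (e.g. checked once, or with a short expiry) rather than probed on every request.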

The first step is to list the most problematic API calls:

- Counting values for a key: `/mapr/api/organism/count/?page=1&group=3` (e.g. used for the root tree node at https://idr.openmicroscopy.org/mapr/organism/), possibly filtered by value: `/mapr/api/organism/count/?value=Homo%20sapiens&page=1&group=3` (used for the root tree node at https://idr.openmicroscopy.org/mapr/organism/?value=Homo%20sapiens). Both return a similar response:
```
{
  "experimenter": {
    "id": -1,
    "omeName": "Organism",
    "firstName": "Organism",
    "lastName": "",
    "extra": {
      "case_sensitive": false
    },
    "childCount": 72
  }
}
```

- Listing values for a key: `/mapr/api/organism/?case_sensitive=false&orphaned=true&experimenter_id=-1&page=1&group=3`, possibly filtered by value: `/mapr/api/organism/count/?value=Homo%20sapiens&page=1&group=3&_=1715773803746`. Used for the top-level nodes (children of the root) at the URLs above. Returns e.g.:

```
{
  "maps": [
    {
      "id": "Homo sapiens",
      "name": "Homo sapiens (13320907)",
      "ownerId": -1,
      "permsCss": "",
      "childCount": 120,
      "extra": {
        "counter": 13320907
      }
    },
    {
      "id": "Saccharomyces cerevisiae",
      "name": "Saccharomyces cerevisiae (302935)",
      "ownerId": -1,
      "permsCss": "",
      "childCount": 19,
      "extra": {
        "counter": 302935
      }
    },
    ...
  ],
  "screens": [],
  "projects": []
}
```
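To make clear what data a searchengine-backed implementation would have to supply, here is a minimal sketch that rebuilds both response shapes above from plain Python data. The helper names and input formats (a list of values; a dict of value -> (number of containers, number of images)) are hypothetical, and the value filter is assumed to be a case-insensitive exact match, in line with `"case_sensitive": false`:

```python
from typing import Dict, Iterable, Optional, Tuple


def build_count_response(key_label: str, values: Iterable[str],
                         value_filter: Optional[str] = None) -> dict:
    """Mimic /mapr/api/<key>/count/: how many distinct values the key has."""
    values = list(values)
    if value_filter is not None:
        # Assumption: case-insensitive exact match on the value.
        wanted = value_filter.casefold()
        values = [v for v in values if v.casefold() == wanted]
    return {
        "experimenter": {
            "id": -1,
            "omeName": key_label,
            "firstName": key_label,
            "lastName": "",
            "extra": {"case_sensitive": False},
            "childCount": len(set(values)),
        }
    }


def build_maps(counts: Dict[str, Tuple[int, int]]) -> dict:
    """Mimic /mapr/api/<key>/: one node per value.

    counts maps value -> (number_of_containers, number_of_images).
    Sorting by value name is an arbitrary choice for this sketch.
    """
    maps = []
    for value in sorted(counts):
        containers, images = counts[value]
        maps.append({
            "id": value,
            "name": f"{value} ({images})",
            "ownerId": -1,
            "permsCss": "",
            "childCount": containers,   # number of containers for this value
            "extra": {"counter": images},  # total images in those containers
        })
    return {"maps": maps, "screens": [], "projects": []}
```

The counting response only needs the list of distinct values, while the listing needs two aggregates per value, which is why the two calls put different demands on the searchengine.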

jburel commented 2 months ago

As indicated previously, the searchengine will not work for the generic mapr.

will-moore commented 2 months ago

For the first bullet point above, we need to know "How many values are there for key: Organism?" ("childCount": 72). We may also want to filter by value, e.g. value=Homo%20sapiens, which will likely return "childCount": 1.

For the second bullet point, we need: "Give me all the values for key: Organism, and for each value, the number of containers and the total number of images in those containers."

I'm not sure whether the current searchengine API has endpoints/queries that can supply these data. cc @khaledk2

khaledk2 commented 2 months ago

The following URL returns JSON containing all the values, plus the number of images in each bucket (value): https://idr.openmicroscopy.org/searchengine//api/v1/resources/image/searchvaluesusingkey/?key=Organism
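A minimal sketch of calling that endpoint from Python with only the standard library. `base_url` is everything in the URL above before `/image/`; the helper names are hypothetical, and the sketch deliberately does not assume anything about the structure of the returned JSON:

```python
import json
import urllib.parse
import urllib.request


def searchvaluesusingkey_url(base_url: str, key: str, resource: str = "image") -> str:
    """Build the searchvaluesusingkey URL for a key, e.g. key="Organism"."""
    query = urllib.parse.urlencode({"key": key})  # spaces in keys become "+"
    return f"{base_url.rstrip('/')}/{resource}/searchvaluesusingkey/?{query}"


def fetch_values_for_key(base_url: str, key: str, timeout: float = 30.0):
    """GET the URL and return the decoded JSON as-is."""
    url = searchvaluesusingkey_url(base_url, key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_values_for_key("https://idr.openmicroscopy.org/searchengine//api/v1/resources", "Organism")` requests exactly the URL quoted above.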

This PR https://github.com/ome/omero_search_engine/pull/63 adds an "in" operator. Using the following query:

```
{
  "resource": "image",
  "query_details": {
    "and_filters": [
      {
        "name": "Organism",
        "operator": "in",
        "query_type": "keyvalue",
        "resource": "image",
        "value": "homo sapiens, scapania spitsbergensis, scapania compact, ....."
      }
    ],
    "case_sensitive": false,
    "or_filters": []
  }
}
```

with this URL: https://idr.openmicroscopy.org/searchengine//api/v1/resources/submitquery/containers/

will return the required data for the second point.
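A minimal sketch of building that query and sending it. It assumes the containers endpoint accepts the query as a JSON body in a POST request, which the comment above does not state; the helper names are hypothetical. Following the example, the values are joined into one comma-separated string, so a value that itself contains a comma would need different handling:

```python
import json
import urllib.request
from typing import Iterable


def build_in_query(key: str, values: Iterable[str], resource: str = "image",
                   case_sensitive: bool = False) -> dict:
    """Build a query matching images whose `key` has any of `values`."""
    return {
        "resource": resource,
        "query_details": {
            "and_filters": [
                {
                    "name": key,
                    "operator": "in",
                    "query_type": "keyvalue",
                    "resource": resource,
                    # The example above passes the values as one comma-separated string.
                    "value": ", ".join(values),
                }
            ],
            "case_sensitive": case_sensitive,
            "or_filters": [],
        },
    }


def submit_containers_query(url: str, payload: dict, timeout: float = 60.0):
    """Assumption: POST the payload as JSON to .../submitquery/containers/."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```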

will-moore commented 1 month ago

@khaledk2 Thanks. Is https://github.com/ome/omero_search_engine/pull/63 deployed somewhere? I can see the apidocs at https://idr.openmicroscopy.org/searchengine/searchengine/apidocs/ but it's not on idr-testing or idr-next?

One issue I noticed above is that everything in the searchengine is lowercase, so there's no "Homo sapiens" etc. I'm not sure how to work around that.

khaledk2 commented 1 month ago

@will-moore I have deployed https://github.com/ome/omero_search_engine/pull/63 on idr-testing.

The key/value pairs are case-sensitive: they are saved in the Elasticsearch indices exactly as they are in the IDR database. The user can choose whether the query is case-sensitive or not.

For example, the following query will not return any results, because the case-sensitive flag is set to true and the value is given in lowercase ("homo sapiens"). It will return results if the flag is set to false, or if the value is changed to "Homo sapiens".

```
{
  "resource": "image",
  "query_details": {
    "and_filters": [
      {
        "name": "Organism",
        "operator": "in",
        "query_type": "keyvalue",
        "resource": "image",
        "value": "homo sapiens, scapania"
      }
    ],
    "case_sensitive": true,
    "or_filters": []
  }
}
```
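To illustrate why the case-sensitive query above finds nothing, here is a small local model of the "in" comparison. It is not the searchengine's implementation (the real matching happens in Elasticsearch); it only assumes, as described above, that values are stored with their original case, and additionally assumes the comma-separated list is split on commas and trimmed:

```python
def in_matches(stored_value: str, query_value: str, case_sensitive: bool) -> bool:
    """Illustrative model: does stored_value equal any value in the comma-separated list?"""
    # Assumption: the "in" list is split on commas and surrounding spaces are ignored.
    candidates = [v.strip() for v in query_value.split(",") if v.strip()]
    if case_sensitive:
        return stored_value in candidates
    folded = stored_value.casefold()
    return any(folded == c.casefold() for c in candidates)
```

With the stored value "Homo sapiens", the query value "homo sapiens, scapania" matches only when `case_sensitive` is false, while "Homo sapiens, scapania" matches either way.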

The /searchvaluesusingkey endpoint returns the normalized values; we can modify it to return the actual values.

ome / omero-mapr

Use searchengine for slow queries #83