ome / omero-mapr

An OMERO.web app for browsing data through attributes linked to images
https://pypi.org/project/omero-mapr/
GNU Affero General Public License v3.0

Use searchengine for slow queries #83

Open will-moore opened 2 months ago

will-moore commented 2 months ago

There are a handful of queries that attempt to load a large amount of data on a large server such as IDR.

These can be slow and need to be cached.

One option is to use https://github.com/ome/omero_search_engine to perform these queries.

Currently, the searchengine is accessed via the web API only (it's not installed in the omero-web Python environment). We don't want to make omero-mapr depend on the searchengine, since that would be a breaking change for all the mapr users who don't have searchengine installed. So we need to test for the availability of searchengine.
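A minimal sketch of such an availability test, assuming the searchengine is reachable over HTTP at a configurable base URL. The function name `searchengine_available`, the `opener` parameter and the 2-second timeout are illustrative choices, not part of omero-mapr or the searchengine:

```python
import urllib.error
import urllib.request


def searchengine_available(base_url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if an HTTP GET to base_url answers with a 2xx status.

    `opener` is injectable (hypothetical design choice) so the check can be
    exercised without a network connection.
    """
    try:
        with opener(base_url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, HTTP error status, timeout or malformed URL:
        # treat all of these as "searchengine not available".
        return False
```

With a check like this, mapr could fall back to its existing queries whenever the function returns False, so installations without searchengine keep working unchanged.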

First step is to list the API calls that are most problematic:

jburel commented 2 months ago

As indicated previously, the searchengine will not work for the generic mapr.

will-moore commented 2 months ago

For the first bullet point above, we need to know "How many values are there for key: Organism?" ("childCount": 72). We may also want to filter by e.g. value=Homo%20sapiens, which will likely return "childCount": 1.

For the second bullet point, we need to know "Give me all the values for key: Organism, and for each value we also want the number of containers and the total number of images in those containers."

I'm not sure whether the current searchengine API has endpoints/queries that can supply these data. cc @khaledk2

khaledk2 commented 2 months ago

The following URL returns JSON containing all the values for the key, together with the number of images in each bucket (value): https://idr.openmicroscopy.org/searchengine//api/v1/resources/image/searchvaluesusingkey/?key=Organism
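As a sketch, the URL above could be built for any key like this. The helper name `values_for_key_url` is hypothetical, and the code assumes the server treats a single slash after `searchengine` the same as the double slash in the URL above:

```python
from urllib.parse import quote, urlencode


def values_for_key_url(base_url, key, resource="image"):
    """Build the searchvaluesusingkey URL for a given key (illustrative helper)."""
    path = f"/api/v1/resources/{quote(resource)}/searchvaluesusingkey/"
    # urlencode escapes the key, e.g. spaces become '+'.
    return f"{base_url.rstrip('/')}{path}?{urlencode({'key': key})}"


url = values_for_key_url("https://idr.openmicroscopy.org/searchengine/", "Organism")
```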

This PR https://github.com/ome/omero_search_engine/pull/63 adds an "in" operator. So the following query:

{
   "resource":"image",
   "query_details":{
      "and_filters":[
         {
            "name":"Organism",
            "operator":"in",
            "query_type":"keyvalue",
            "resource":"image",
            "value":"homo sapiens, scapania spitsbergensis, scapania compact, ....."
         }
      ],
      "case_sensitive":false,
      "or_filters":[

      ]
   }
}

with this URL: https://idr.openmicroscopy.org/searchengine//api/v1/resources/submitquery/containers/

will return the required data for the second point.
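A sketch of how mapr might assemble this query body in Python, mirroring the JSON above. The helper name `build_in_query` is hypothetical, joining the values with ", " simply follows the example, and sending the body as an HTTP POST is an assumption:

```python
import json


def build_in_query(key, values, resource="image", case_sensitive=False):
    """Return a query dict shaped like the example above (illustrative helper)."""
    return {
        "resource": resource,
        "query_details": {
            "and_filters": [
                {
                    "name": key,
                    "operator": "in",
                    "query_type": "keyvalue",
                    "resource": resource,
                    # The example passes the values as one comma-separated string.
                    "value": ", ".join(values),
                }
            ],
            "case_sensitive": case_sensitive,
            "or_filters": [],
        },
    }


payload = build_in_query("Organism", ["homo sapiens", "scapania spitsbergensis"])
# JSON-encoded bytes, e.g. for the body of an HTTP request (assumed to be a POST).
body = json.dumps(payload).encode("utf-8")
```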

will-moore commented 1 month ago

@khaledk2 Thanks. Is https://github.com/ome/omero_search_engine/pull/63 deployed somewhere? I can see apidocs at https://idr.openmicroscopy.org/searchengine/searchengine/apidocs/ but it's not there at idr-testing or idr-next?

One issue I noticed above is that everything in searchengine is lowercase, so there's no Homo sapiens etc. Not sure how to work around that?

khaledk2 commented 1 month ago

@will-moore I have deployed https://github.com/ome/omero_search_engine/pull/63 on idr-testing.

The key/value pairs are case-sensitive; they are stored in the Elasticsearch indices exactly as they are in the IDR database. The user has the option to make the query case-sensitive or not.

For example, the following query will not return any results, because the case-sensitivity flag is set to true while the values are given in lowercase. It will return results if the flag is set to false, or if the value is changed to Homo sapiens.

{
   "resource":"image",
   "query_details":{
      "and_filters":[
         {
            "name":"Organism",
            "operator":"in",
            "query_type":"keyvalue",
            "resource":"image",
            "value":"homo sapiens, scapania"
         }
      ],
      "case_sensitive": true,
      "or_filters":[

      ]
   }
}

The /searchvaluesusingkey endpoint returns the normalized values; we can modify it to return the actual values.
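The case-sensitivity behaviour described in this comment can be illustrated with a small stand-alone sketch. This is not the searchengine's implementation; `in_filter_matches` is a hypothetical function that only mimics the described semantics of the "in" operator on one stored value:

```python
def in_filter_matches(stored_value, query_value, case_sensitive):
    """Check a stored value against a comma-separated "in" value string."""
    candidates = [v.strip() for v in query_value.split(",") if v.strip()]
    if case_sensitive:
        # Exact comparison: "Homo sapiens" does not equal "homo sapiens".
        return stored_value in candidates
    # Case-insensitive comparison: both sides are lowercased first.
    return stored_value.lower() in [v.lower() for v in candidates]


strict = in_filter_matches("Homo sapiens", "homo sapiens, scapania", True)
relaxed = in_filter_matches("Homo sapiens", "homo sapiens, scapania", False)
```

Here `strict` is False and `relaxed` is True, matching the two outcomes described for the query above.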