project-lux / lux-marklogic

Code, issues, and resources related to LUX MarkLogic

Research: Advanced Search failing to generate expected search results #131

Open roamye opened 1 month ago

roamye commented 1 month ago

Problem Description: An advanced search query that should have returned "Trumbull & Wiley" among its results instead returned the message "Your search yielded no results. Please try another search." If we change the query to go down to the departmental level, the search returns results. However, the search should also return results at the non-departmental level.

This ticket serves as a research ticket to understand why this is happening and how we can fix it.

Expected Behavior/Solution: TBD - research

Requirements: TBD - research

Needed for promotion: If an item on the list is not needed, it should be crossed off but not removed.

UAT/LUX Examples:

Dependencies/Blocks:

Related Github Issues:

Related links:

Wireframe/Mockup: Place wireframe/mockup for the proposed solution at end of ticket.

brent-hartwig commented 1 month ago

@roamye and @clarkepeterf,

Two responses:

  1. Was this criteria created using the advanced search form? As shown below, the IDs are missing the leading portion: https://lux.collections.yale.edu/data/.
{
  "AND":[
    {
      "produced":{
        "memberOf":{
          "curatedBy":{
            "memberOf":{
              "id":"group/0e8bc04a-6538-4792-b82a-9e0751857e7d"
            }
          }
        }
      }
    },
    {
      "produced":{
        "memberOf":{
          "curatedBy":{
            "memberOf":{
              "id":"group/6086b58d-941d-41e3-87c1-e00e96952ffb"
            }
          }
        }
      }
    }
  ]
}
  2. Once corrected, the search and facet requests time out after 20 seconds. It looks like a pretty broad (and odd?) search: agents that produced objects that are part of a set, where the set is curated by a group that is a member of YCBA and YUAG. Are any sets co-curated? Or perhaps I'm misinterpreting the query.
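On the first point, expanding shortened IDs before the search runs could look roughly like the sketch below. It is only an illustration: the function name `expand_ids` is hypothetical, and it assumes the criteria arrive as parsed JSON and that any `id` value without an `http://` or `https://` scheme is relative to `https://lux.collections.yale.edu/data/`.

```python
BASE_URI = "https://lux.collections.yale.edu/data/"


def expand_ids(criteria):
    """Return a copy of the criteria with BASE_URI prepended to relative "id" values.

    Hypothetical helper: assumes IDs without a scheme are relative to BASE_URI.
    """
    if isinstance(criteria, list):
        return [expand_ids(item) for item in criteria]
    if isinstance(criteria, dict):
        expanded = {}
        for key, value in criteria.items():
            is_relative_id = (
                key == "id"
                and isinstance(value, str)
                and not value.startswith(("http://", "https://"))
            )
            # Only relative IDs are rewritten; everything else is walked recursively.
            expanded[key] = BASE_URI + value if is_relative_id else expand_ids(value)
        return expanded
    return criteria


query = {"memberOf": {"id": "group/0e8bc04a-6538-4792-b82a-9e0751857e7d"}}
print(expand_ids(query))
```

IDs that already carry the full URI pass through unchanged, so the function is safe to apply more than once.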

Here's my first rewrite of the query. It returns a near-instant response of zero results:

{
  "produced":{
    "memberOf":{
      "curatedBy":{
        "memberOf":{
          "AND":[
            {
              "id":"https://lux.collections.yale.edu/data/group/0e8bc04a-6538-4792-b82a-9e0751857e7d"
            },
            {
              "id":"https://lux.collections.yale.edu/data/group/6086b58d-941d-41e3-87c1-e00e96952ffb"
            }
          ]
        }
      }
    }
  }
}

However, when I take the AND up a level, I quickly get the first 10 of an estimated 5,217 results:

{
  "produced":{
    "memberOf":{
      "curatedBy":{
        "AND":[
          {
            "memberOf":{
              "id":"https://lux.collections.yale.edu/data/group/0e8bc04a-6538-4792-b82a-9e0751857e7d"
            }
          },
          {
            "memberOf":{
              "id":"https://lux.collections.yale.edu/data/group/0e8bc04a-6538-4792-b82a-9e0751857e7d"
            }
          }
        ]
      }
    }
  }
}

So maybe the search makes sense after all and we identified another optimization: push AND and OR down to the extent the triple paths / search terms match.
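A rough sketch of that push-down idea, written against the parsed JSON criteria rather than the real backend: the function name `push_down` and the `max_levels` parameter are hypothetical, and the sketch only handles operands that each consist of a single search term. It does not address whether the rewritten criteria are equivalent to the original; the two placements of AND above returned different result counts, so `max_levels` is there to control how far the operator moves.

```python
from typing import Any, Optional

GROUP_OPS = ("AND", "OR")


def push_down(criteria: Any, max_levels: Optional[int] = None) -> Any:
    """Move a top-level AND/OR below the run of search terms shared by all operands.

    Hypothetical helper: max_levels caps how many levels the operator is pushed;
    None means push as far as the terms match.
    """
    if max_levels is not None and max_levels <= 0:
        return criteria
    if not isinstance(criteria, dict) or len(criteria) != 1:
        return criteria
    op, operands = next(iter(criteria.items()))
    if op not in GROUP_OPS or not isinstance(operands, list) or not operands:
        return criteria
    # Every operand must be a single-term object for the paths to be comparable.
    if not all(isinstance(o, dict) and len(o) == 1 for o in operands):
        return criteria
    terms = {next(iter(o)) for o in operands}
    if len(terms) != 1:
        return criteria  # paths diverge here, so the operator stays at this level
    shared = terms.pop()
    children = [o[shared] for o in operands]
    # Stop at nested operators and at leaves such as {"id": "..."}.
    if shared in GROUP_OPS or not all(isinstance(c, dict) for c in children):
        return criteria
    remaining = None if max_levels is None else max_levels - 1
    return {shared: push_down({op: children}, remaining)}


original = {
    "AND": [
        {"produced": {"memberOf": {"curatedBy": {"memberOf": {"id": "A"}}}}},
        {"produced": {"memberOf": {"curatedBy": {"memberOf": {"id": "B"}}}}},
    ]
}
print(push_down(original))                # AND ends up directly above the two ids
print(push_down(original, max_levels=3))  # AND ends up under curatedBy
```

With no limit, the output has the shape of the first rewrite above; with `max_levels=3`, it has the shape of the second.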

roamye commented 1 month ago

@brent-hartwig

  1. Was this criteria created using the advanced search form? A: I believe so. Sarah, was this created in the AS form, or did it come from a link in LUX?

cc: @prowns

brent-hartwig commented 1 month ago

It probably was, and the user just didn't include the full URI/IRI. It was probably one of us! If this were commonplace, the UI could provide a visual cue when the provided ID doesn't resolve, and/or the backend could replace the associated CTS query containing an invalid ID with cts.falseQuery, thereby avoiding a search of the triple store for something that will never exist.
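As a rough illustration of the check either side would need, the sketch below collects the IDs in the criteria that fail a lookup. The names `iter_ids`, `unresolved_ids`, and `resolves` are hypothetical; `resolves` stands in for whatever lookup the UI or backend would actually perform, and here it only tests for the expected URI prefix.

```python
def iter_ids(criteria):
    """Yield every string "id" value found anywhere in the parsed criteria."""
    if isinstance(criteria, list):
        for item in criteria:
            yield from iter_ids(item)
    elif isinstance(criteria, dict):
        for key, value in criteria.items():
            if key == "id" and isinstance(value, str):
                yield value
            else:
                yield from iter_ids(value)


def unresolved_ids(criteria, resolves):
    """Return the IDs for which the caller-supplied resolves(id) check is false."""
    return [identifier for identifier in iter_ids(criteria) if not resolves(identifier)]


def resolves(identifier):
    # Stand-in check (an assumption): a real one would look the document up.
    return identifier.startswith("https://lux.collections.yale.edu/data/")


query = {"AND": [{"memberOf": {"id": "group/6086b58d-941d-41e3-87c1-e00e96952ffb"}}]}
print(unresolved_ids(query, resolves))
```

A non-empty result is the signal the UI could use for its visual cue, or the backend for substituting a query that matches nothing.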