project-lux / lux-marklogic

Code, issues, and resources related to LUX MarkLogic

Create An Option For a LUX API Consumer to Enable Search Filtering Inside cts.values, cts.triples, etc. (from 1094) #41

Open gigamorph opened 7 months ago

gigamorph commented 7 months ago

Problem Description

Search options (punctuation-sensitive, etc.) are not honored when using some fields. This is because ML performs "unfiltered" searches for the "query" parameter in many of the search functions we use in our ML search code (cts.values, cts.triples, etc.)

Expected Behavior

Allow users of the LUX API to pass in an option to perform filtered searches.

Prerequisite For

https://github.com/project-lux/lux-frontend/issues/48

Technical Info

See https://git.yale.edu/lux-its/marklogic/issues/916#issuecomment-21081 for info on how to perform filtered search where the default is to be unfiltered


This is a follow up to https://git.yale.edu/lux-its/marklogic/issues/916 Old tix: https://git.yale.edu/lux-its/marklogic/issues/1094

NOTE: pending on OPTIC decision (2/2024)

brent-hartwig commented 6 months ago

@jffcamp and @prowns, the implementation will impose a performance penalty. The penalty will vary by search. For some, this may lead to:

I recommend we manually test performance and, if we decide to proceed, work some of these searches into the scripted performance test.

jffcamp commented 6 months ago

Thanks for the heads up, Brent. I agree with manually testing performance prior to doing the work. Hopefully that will confirm whether the penalty is widespread or limited to edge cases. Based on that, we may want to continue engagement with engineering. It would also be good to work this into the performance test.

brent-hartwig commented 6 months ago

@jffcamp and @prowns,

@clarkepeterf or I could work on this ticket. It is marked as critical. After picking one of us, please prioritize relative to our other assignments.

I believe we will be able to identify two types of searches and then assess the performance penalty via QC:

brent-hartwig commented 4 months ago

I had the opportunity to run an approach by ML engineers today: for any CTS query parameter to a values function, a) perform a filtered search, b) construct a cts.documentQuery out of the results, and c) provide the cts.documentQuery as the CTS query parameter value. Consensus was there was probably a better way, and that it would involve cts.contains.
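For reference, steps a) through c) could be sketched roughly as follows in server-side JavaScript. This is only an illustration of the approach as described, not LUX code: `filteredValues` and its parameter names are hypothetical, and `cts` and `fn` (MarkLogic globals) are passed in as arguments purely so the function can also be exercised against stubs outside MarkLogic. It does not cover the cts.contains alternative the engineers alluded to.

```javascript
// Sketch only. `filteredValues` is a hypothetical helper; `cts` and `fn` are
// MarkLogic's server-side JavaScript globals, injected here for testability.
function filteredValues(cts, fn, references, query) {
  // a) Run a filtered search so criteria the indexes cannot resolve
  //    (e.g. punctuation-sensitive) are checked against each candidate document.
  const uris = [];
  for (const doc of cts.search(query, ["filtered"])) {
    uris.push(String(fn.baseUri(doc)));
  }

  // b) Build a cts.documentQuery out of the documents that survived filtering.
  const constrainingQuery = cts.documentQuery(uris);

  // c) Provide that query as the CTS query parameter of the values function.
  return cts.values(references, null, [], constrainingQuery);
}
```

Note that step a) visits every matching document before the values function runs, which is presumably where the performance penalty mentioned earlier in this thread would come from.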

roamye commented 3 months ago

@brent-hartwig - want to confirm this is no longer blocked by the OPTIC decision. Is that correct?

Should this be brought up in the next team meeting to confirm we want to move forward with using cts.contains? Or is this something you, @clarkepeterf, @jffcamp, and @prowns can discuss?

brent-hartwig commented 3 months ago

@roamye, it has never been outright blocked by the Optic/CTS decision. We just didn't want to implement it in CTS if we were going to imminently switch to Optic. The decision to make the switch has neither been made nor is imminent. Thus, if this feature is sufficiently desired, it may be pursued now.

@jffcamp, @prowns, and @clarkepeterf: I'd like to reiterate the recommendation to compare the performance between the current filtered search and the additional "intra-filtering", potentially just from Query Console (depending on the level of effort to implement).

clarkepeterf commented 3 months ago

@brent-hartwig same question as the other ticket - do you have any updates on the status of the CTS/Optic Comparison or ML 11.3? Also - per your previous comment - I'm not sure how we would use cts.contains here - I'm curious to hear more about using that vs. the current approach

@jffcamp similar question here as https://github.com/project-lux/lux-marklogic/issues/10 - do we want to wait for the CTS/Optic comparison or ML 11.3 (Optic optimization)?

@prowns @roamye

brent-hartwig commented 3 months ago

[!IMPORTANT] We may not want to pursue this ticket.

@clarkepeterf, @jffcamp, @prowns, and @roamye,

I found an example compelling this ticket but it has to do with punctuation:

{
  "_scope":"item",
  "producedBy":{
    "name":"O?keeffe",
    "_options":[
      "punctuation-sensitive",
      "unwildcarded"
    ]
  }
}

The search returns the same number of results regardless of whether ? is treated as a wildcard. So yes, the approach this ticket is after would make that distinction, but I'd like us to first question whether that distinction is important enough to LUX users. If so, does that rule Optic out? Optic only returns unfiltered results. The filtering process targets criteria that cannot be resolved via indexes, which includes punctuation. This directly overlaps with one of the CTS/Optic batch no. 3 focal points, whereby we intend to compile a complete list of characteristics not indexed by MarkLogic and thus only resolved via filtering. If any of those are LUX requirements, then we may want to pursue this ticket and ~abandon our Optic ambitions~ consider filtering Optic results. If those are not LUX requirements, then we should really look for another example before implementing this ticket. And should we be able to come up with such an example, we should see if an indexing or code change can correct the unfiltered results.
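To make the filtered/unfiltered distinction concrete, here is a toy model in plain JavaScript. It is not how MarkLogic's indexes actually work: it simply assumes an "index" that only knows words with punctuation and case stripped, so an index-only (unfiltered) lookup cannot tell the two spellings apart, while a filtering pass that re-reads each candidate can. The sample documents and function names are made up.

```javascript
// Toy model, not MarkLogic internals: the "index" key drops punctuation and case.
const normalize = (s) => s.toLowerCase().replace(/[^a-z0-9]/g, "");

// Index-only lookup: matches any document containing a word with the same key.
function unfilteredSearch(docs, term) {
  const key = normalize(term);
  return docs.filter((d) => d.name.split(/\s+/).some((w) => normalize(w) === key));
}

// Filtering pass: re-check each candidate against the real text, punctuation included.
function filteredSearch(docs, term) {
  return unfilteredSearch(docs, term).filter((d) => d.name.includes(term));
}

// Made-up sample documents.
const docs = [
  { uri: "/1", name: "Georgia O'Keeffe" },
  { uri: "/2", name: "Georgia O?keeffe" },
];

console.log(unfilteredSearch(docs, "O?keeffe").length); // 2
console.log(filteredSearch(docs, "O?keeffe").length);   // 1
```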

roamye commented 3 months ago

From the UAT meeting on 6/26: it was agreed not to pursue this ticket until we know more about Optic.

brent-hartwig commented 1 month ago

#223 is related in the sense that this ticket's implementation (if pursued) should honor the resolved value of the filterResults parameter of the associated endpoint. Three endpoints implement it. A default is specified by one of three properties. The endpoint consumer can override the default.

roamye commented 1 month ago

@brent-hartwig - what does it mean for this issue (as it is related to #233) that #233 was closed and is no longer being pursued? Or, does it mean #233 SHOULD continue to be pursued as it relates in this way to this issue?

brent-hartwig commented 1 month ago

@roamye, I believe you intended to reference #223 (unfiltered search results). Presuming as much, the intent of my comment was to convey that if #223's implementation remains in place then this ticket's implementation should honor the active request's filtering behavior. For example, if the endpoint consumer specifies not to filter the search results, this ticket's implementation should not force filtering within any values functions (e.g., cts.triples) that are part of the search.

Top-level filtering is controlled by #223. Lower-level filtering would be controlled by this ticket's implementation. The lower-level implementation needs to honor the top-level's filtering option.
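As a minimal sketch of that rule, assuming a boolean filterResults value from the request and a boolean default from configuration: the helper name, the `defaultFilterResults` and `intraFilterRequested` names, and the object shape below are all hypothetical and only illustrate the precedence described above.

```javascript
// Hypothetical helper; names and shapes are illustrative, not LUX code.
function resolveIntraFiltering({ filterResults, defaultFilterResults, intraFilterRequested }) {
  // Top level (#223): the endpoint consumer's value overrides the configured default.
  const topLevel = filterResults !== undefined ? filterResults : defaultFilterResults;

  // Lower level (this ticket): never force filtering inside values functions
  // (e.g., cts.triples) when top-level filtering is off.
  return Boolean(topLevel) && Boolean(intraFilterRequested);
}

// Consumer turned filtering off, so lower-level filtering stays off too.
console.log(resolveIntraFiltering({
  filterResults: false,
  defaultFilterResults: true,
  intraFilterRequested: true,
})); // false
```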