project-lux / lux-marklogic

Code, issues, and resources related to LUX MarkLogic

#283 - Exclude Collection Sets from the stats for Works #360

Closed clarkepeterf closed 1 month ago

brent-hartwig commented 1 month ago

@clarkepeterf, I think you're getting some false positives because the element-word-positions indexing option is not enabled on the lux-content database.

cts.jsonPropertyScopeQuery documentation excerpt: "Enabling both the word position and element word position indexes will speed up query performance for many queries that use cts:json-property-scope-query. The position indexes enable MarkLogic Server to eliminate many false-positive results, which can reduce disk I/O and processing, thereby speeding the performance of many queries. The amount of benefit will vary depending on your data."

Query:

'use strict';
import { IDENTIFIERS } from '/lib/identifierConstants.mjs';

// Matches documents whose dataType property is exactly 'Set'.
const dataTypeQuery = cts.jsonPropertyValueQuery('dataType', 'Set', ['exact']);
// Matches documents with any id property equal to the collection identifier.
const idOnlyQuery = cts.jsonPropertyValueQuery('id', IDENTIFIERS.collection);
const idAndDataTypeQuery = cts.andQuery([
  idOnlyQuery,
  dataTypeQuery
]);
// Current approach: Sets that do NOT have the collection identifier
// under json > classified_as > equivalent.
const currentQuery = cts.andQuery([
  dataTypeQuery,
  cts.notQuery(
    cts.jsonPropertyScopeQuery(
      'json',
      cts.jsonPropertyScopeQuery(
        'classified_as',
        cts.jsonPropertyScopeQuery(
          'equivalent',
          idOnlyQuery
        )
      )
    )
  ),
]);

// cts.estimate is resolved from the indexes alone (unfiltered).
const estimates = {
  current: cts.estimate(currentQuery),
  idOnly: cts.estimate(idOnlyQuery),
  idAndDataType: cts.estimate(idAndDataTypeQuery)
};
estimates;

Results from Blue:

{
  "current":305289,
  "idOnly":57144,
  "idAndDataType":64
}

I'm going to check in some changes that include the above. We can back out if I've strayed.
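
One way to test the false-positive theory is to compare the index-only estimate with a filtered count for the same query. The sketch below is an assumption about how such a check could look, not code from this change: compareEstimateToFilteredCount is a hypothetical helper, and cts and fn are passed in as arguments so the function can also be exercised outside MarkLogic. It also assumes one fragment per document, so the difference between the two numbers can be read as false positives.

```javascript
// Hypothetical diagnostic helper; not part of lux-marklogic.
// In MarkLogic, pass the built-in `cts` and `fn` globals.
function compareEstimateToFilteredCount(cts, fn, query) {
  // cts.estimate resolves the query from the indexes only, so it may over-count.
  const estimate = Number(cts.estimate(query));
  // A filtered search opens each candidate document and drops false positives.
  const filtered = Number(fn.count(cts.search(query, ['filtered'])));
  return { estimate, filtered, falsePositives: estimate - filtered };
}
```

In Query Console this could be called as compareEstimateToFilteredCount(cts, fn, idAndDataTypeQuery). A filtered count reads every candidate document, so it is likely only practical for small result sets such as idAndDataType, not for the 305289 candidates behind current.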

brent-hartwig commented 1 month ago

@clarkepeterf, changes are available for your review and testing.

clarkepeterf commented 1 month ago

@brent-hartwig I like the use of removeItemByValueFromArray and using IDENTIFIERS.collection

However, without the jsonPropertyScopeQuery I'm not getting the expected results.

Here are the expected results for lux-dev-data-ypm-endpoint-consumer (using jsonPropertyScopeQuery):

{
    "estimates": {
        "searchScopes": {
            "agent": 22809,
            "concept": 200004,
            "event": 0,
            "item": 1486337,
            "place": 140425,
            "reference": 363184,
            "set": 15,
            "work": 0
        }
    },
    "metadata": {
        "timestamp": "2024-10-24T17:44:04.796",
        "milliseconds": 27
    }
}

Here are the results for lux-dev-data-ypm-endpoint-consumer without jsonPropertyScopeQuery:

{
    "estimates": {
        "searchScopes": {
            "agent": 22809,
            "concept": 200004,
            "event": 0,
            "item": 1486337,
            "place": 140425,
            "reference": 363184,
            "set": 15,
            "work": 15
        }
    },
    "metadata": {
        "timestamp": "2024-10-24T17:48:09.893",
        "milliseconds": 15
    }
}

I think I will start using removeItemByValueFromArray and IDENTIFIERS.collection, but continue to use jsonPropertyScopeQuery.

clarkepeterf commented 1 month ago

@brent-hartwig I'm realizing your update needs a not query - let me try updating that

brent-hartwig commented 1 month ago

@clarkepeterf, that may be because I didn't put a cts.notQuery around cts.jsonPropertyValueQuery('id', IDENTIFIERS.collection) 😱

brent-hartwig commented 1 month ago

Calling jinx despite the two-minute diff!

clarkepeterf commented 1 month ago

Yep, it just needed a cts.notQuery - that gets it right!

I'm going to merge this.
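
For readers following the fix, the shape the thread converges on can be sketched as below. This is a reconstruction from the comments above, not the merged code: buildSetQueryExcludingId is a hypothetical name, the dataType clause is borrowed from the earlier test query, and cts is passed in as an argument so the sketch can run outside MarkLogic. In the project, excludedId would presumably be IDENTIFIERS.collection.

```javascript
// Hypothetical sketch reconstructed from the discussion; the merged code may differ.
// In MarkLogic, pass the built-in `cts` global as the first argument.
function buildSetQueryExcludingId(cts, excludedId) {
  return cts.andQuery([
    // Keep only documents whose dataType is exactly 'Set'.
    cts.jsonPropertyValueQuery('dataType', 'Set', ['exact']),
    // Exclude documents that carry the given identifier in any id property.
    // Without this notQuery wrapper the clause selects those documents instead.
    cts.notQuery(cts.jsonPropertyValueQuery('id', excludedId)),
  ]);
}
```

Dropping the cts.notQuery wrapper turns the exclusion into a selection, which is consistent with the work estimate of 15 matching the set estimate in the results above.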