Open brent-hartwig opened 1 month ago
@brent-hartwig I was just testing the middle tier for this release and realized I need to re-enable a triple: setClassifiedAs will be used again. It was temporarily taken out of the codebase due to #337.
It will be reverted in this PR, which I'm planning to merge shortly: https://github.com/project-lux/lux-marklogic/pull/358
The other triples are worth looking into. I think @azaroth42 and @kkdavis14 probably know whether we should be using any of these unused triples.
The following are also used in the full text search pattern:
Good point. The check predicates script is based on search term configuration. It could be extended to include the predicates associated with each search scope, for keyword search.
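A minimal sketch of that extension, assuming the configuration can be read as plain objects. The shapes of searchTerms and searchScopes below, and the collectConfiguredPredicates helper, are hypothetical; the real LUX configuration and the check predicates script may be structured differently.

```javascript
// Hypothetical shape: scope name -> term name -> { predicates: [...] }.
const searchTerms = {
  work: {
    language: { predicates: ["crm:P72_has_language"] },
  },
  set: {
    classification: { predicates: ["lux:setClassifiedAs"] },
  },
};

// Hypothetical shape: scope name -> predicates used by keyword (full text) search.
const searchScopes = {
  work: { predicates: ["lux:workAny"] },
  set: { predicates: ["lux:setAny"] },
};

// Union of the predicates referenced by search terms and by search scopes.
function collectConfiguredPredicates(terms, scopes) {
  const configured = new Set();
  for (const scopeTerms of Object.values(terms)) {
    for (const term of Object.values(scopeTerms)) {
      for (const predicate of term.predicates || []) {
        configured.add(predicate);
      }
    }
  }
  for (const scope of Object.values(scopes)) {
    for (const predicate of scope.predicates || []) {
      configured.add(predicate);
    }
  }
  return configured;
}

const configuredPredicates = collectConfiguredPredicates(searchTerms, searchScopes);
console.log([...configuredPredicates].sort());
```

With the scope predicates included, triples such as workAny and setAny would no longer be reported as unused.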
There are a few places in the code that builds the triples that repeat a pattern with 1. the LUX predicate and 2. the CIDOC predicate (lux:about_or_depicts and crm:p129_is_about is one example). I am not sure why this is done. I am also unsure of the use of about/depicts when it doesn't have a Class-specific type along with it.
https://lux.collections.yale.edu/ns/about_or_depicts_activity, https://lux.collections.yale.edu/ns/about_or_depicts_period, https://lux.collections.yale.edu/ns/about_or_depicts_set
These would be interesting to add as search terms.
http://www.cidoc-crm.org/cidoc-crm/P106i_forms_part_of
This is newly added and, I think, should get a search term. It would only cover Works as part of other Works (while HMOs as part of other HMOs and VisualItems as part of other Works are valid modeling, I don't think anyone is using them this way in LUX).
linguisticobjectInfluencedCreation is a bug on our end; it should be Work. (PR: https://github.com/project-lux/data-pipeline/pull/152)
So, I've broken down the triples into categories here:
As discussed above, the following are used in full text search:
"https://lux.collections.yale.edu/ns/agentAny",
"https://lux.collections.yale.edu/ns/conceptAny",
"https://lux.collections.yale.edu/ns/eventAny",
"https://lux.collections.yale.edu/ns/itemAny",
"https://lux.collections.yale.edu/ns/placeAny",
"https://lux.collections.yale.edu/ns/referenceAny",
"https://lux.collections.yale.edu/ns/setAny",
"https://lux.collections.yale.edu/ns/workAny",
The following about_* or depicts_* triples would be superseded by the about_or_depicts_* triples:
"https://lux.collections.yale.edu/ns/about_activity",
"https://lux.collections.yale.edu/ns/about_agent",
"https://lux.collections.yale.edu/ns/about_concept",
"https://lux.collections.yale.edu/ns/about_object",
"https://lux.collections.yale.edu/ns/about_period",
"https://lux.collections.yale.edu/ns/about_place",
"https://lux.collections.yale.edu/ns/about_set",
"https://lux.collections.yale.edu/ns/about_work",
"https://lux.collections.yale.edu/ns/depicts_agent",
"https://lux.collections.yale.edu/ns/depicts_concept",
"https://lux.collections.yale.edu/ns/depicts_place",
"https://lux.collections.yale.edu/ns/depicts_work",
The following are less likely to be useful for search terms because they lack either a start or end scope, or both:
"https://lux.collections.yale.edu/ns/about_or_depicts",
"https://lux.collections.yale.edu/ns/any",
The following are external and, per @kkdavis14's comment above, are often duplicated by LUX triples:
"http://www.cidoc-crm.org/cidoc-crm/P106i_forms_part_of",
"http://www.cidoc-crm.org/cidoc-crm/P128_carries",
"http://www.cidoc-crm.org/cidoc-crm/P129_is_about",
"http://www.cidoc-crm.org/cidoc-crm/P138_represents",
"http://www.cidoc-crm.org/cidoc-crm/P2_has_type",
"http://www.cidoc-crm.org/cidoc-crm/P65_shows_visual_item",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
"https://linked.art/ns/terms/digitally_carries",
"https://linked.art/ns/terms/digitally_shows",
"https://linked.art/ns/terms/equivalent",
These are the remaining LUX triples:
"https://lux.collections.yale.edu/ns/about_or_depicts_activity",
"https://lux.collections.yale.edu/ns/about_or_depicts_period",
"https://lux.collections.yale.edu/ns/about_or_depicts_set",
"https://lux.collections.yale.edu/ns/agentInfluencedBeginning",
"https://lux.collections.yale.edu/ns/linguisticobjectInfluencedCreation",
"https://lux.collections.yale.edu/ns/setClassifiedAs",
"https://lux.collections.yale.edu/ns/workLanguage"
Should any of the remaining LUX triples be used? Per @kkdavis14, https://lux.collections.yale.edu/ns/linguisticobjectInfluencedCreation should be https://lux.collections.yale.edu/ns/workInfluencedCreation. We don't have a search term for this either. Should there be one?
Also per @kkdavis14, http://www.cidoc-crm.org/cidoc-crm/P106i_forms_part_of could be a useful triple. Should we follow the pattern of what we've done elsewhere and make a LUX triple equivalent of this? Or is the external triple what we should use? There is also precedent for using external triples; for example, we use crm("P72_has_language") to search for a Work's language.
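The categories above can be roughly reproduced from the predicate IRIs alone. This is only a sketch: the categorizePredicate helper and its category labels are made up for illustration, and the rules are inferred from the lists in this comment rather than taken from the backend.

```javascript
const LUX_NS = "https://lux.collections.yale.edu/ns/";

// Assign a predicate IRI to one of the categories used in this comment.
function categorizePredicate(iri) {
  if (!iri.startsWith(LUX_NS)) {
    return "external";
  }
  const localName = iri.slice(LUX_NS.length);
  // Predicates without a start or end scope.
  if (localName === "any" || localName === "about_or_depicts") {
    return "unscoped";
  }
  // agentAny, workAny, ...: used by full text search.
  if (/^[a-z]+Any$/.test(localName)) {
    return "fullText";
  }
  // about_agent, depicts_place, ...: superseded by about_or_depicts_*.
  if (/^(about|depicts)_[a-z]+$/.test(localName)) {
    return "superseded";
  }
  return "remaining";
}

// Group a list of predicate IRIs by category.
function groupByCategory(iris) {
  const groups = {};
  for (const iri of iris) {
    const category = categorizePredicate(iri);
    (groups[category] = groups[category] || []).push(iri);
  }
  return groups;
}

console.log(
  groupByCategory([
    LUX_NS + "agentAny",
    LUX_NS + "about_agent",
    LUX_NS + "about_or_depicts_set",
    LUX_NS + "any",
    "http://www.cidoc-crm.org/cidoc-crm/P106i_forms_part_of",
  ])
);
```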
Thanks Peter, that's helpful.
I don't understand how workLanguage isn't used as a search term, when it's available in advanced search. I guess you are using, as you say, the crm predicate.
Re: Influenced triples: X Influenced Y could have any of the following X, Y values:
a. X: concept, agent, activity, work, object
b. Y: Production, Creation, Beginning, Ending, Publication, Encounter, Activity
c. agentInfluencedProduction no doubt exists and has a search term, but of the others, perhaps only agentInfluencedBeginning and workInfluencedCreation* exist in the data at the moment to have made it onto this list. The first means the formation of some Group was influenced by a Person, and the second means an LO influenced the creation of some other LO or VI (probably LO). I think if we have a search term for agentInfluencedProduction, it's useful to have one for these others as well.
I do not know why there are duplicate LUX/CRM triples. I asked here: https://github.com/project-lux/data-pipeline/issues/151.
For forms_part_of specifically, we only create the CRM triple (we aren't currently creating a LUX equivalent).
*A bug meant these weren't being created properly; see PR.
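To make the X/Y combinations concrete, here is a small sketch that enumerates the candidate names. The naming pattern (lowercase X, then "Influenced", then Y) is inferred from agentInfluencedProduction and workInfluencedCreation; the scope names LUX actually uses for some X values may differ, and the influencedPredicateNames helper is hypothetical.

```javascript
// X values: what did the influencing (names as listed in the comment above).
const influencers = ["concept", "agent", "activity", "work", "object"];

// Y values: the kind of activity that was influenced.
const influencedActivities = [
  "Production",
  "Creation",
  "Beginning",
  "Ending",
  "Publication",
  "Encounter",
  "Activity",
];

// Build every "<x>Influenced<Y>" combination.
function influencedPredicateNames(xs, ys) {
  return xs.flatMap((x) => ys.map((y) => `${x}Influenced${y}`));
}

const candidates = influencedPredicateNames(influencers, influencedActivities);

// Names mentioned in the comment above as existing (or likely existing) in the data.
const mentionedAsExisting = [
  "agentInfluencedProduction",
  "agentInfluencedBeginning",
  "workInfluencedCreation",
];

// Combinations not mentioned as existing; these may simply not occur in the data.
const notMentioned = candidates.filter(
  (name) => !mentionedAsExisting.includes(name)
);
console.log(candidates.length, notMentioned.length);
```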
@clarkepeterf, FYI, the original version of comparePredicates.js.txt used SELECT ?p WHERE { ?s ?p ?o } to compile the list of predicates. It took 14 minutes to run. 2.5 minutes can be taken off by using group by to keep more of the work on the d-nodes: select ?p { ?s ?p ?o } group by ?p
@clarkepeterf, the latest development on comparing configured predicates to the dataset's predicates is in PR https://github.com/project-lux/lux-marklogic/pull/371.
I collected the list of predicates in the 2024-10-19 dataset and compared it to the triples referenced in Backend v1.27.0's configurations. All configured predicates exist in the dataset 🎉. The intent of this ticket is to ask whether triples that are not presently in use could be configured in the backend to the benefit of users. A quick look by @azaroth42 and others may determine whether this is worth pursuing or closing.
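The comparison itself amounts to two set differences. The following is an illustrative sketch only, not the contents of comparePredicates.js.txt; the comparePredicates function name, the result field names, and the sample lists are assumptions.

```javascript
// Compare the predicates referenced by configuration to those found in the dataset.
function comparePredicates(configured, inDataset) {
  const configuredSet = new Set(configured);
  const datasetSet = new Set(inDataset);
  return {
    // Configured but absent from the data: likely a configuration or pipeline problem.
    missingFromDataset: [...configuredSet]
      .filter((p) => !datasetSet.has(p))
      .sort(),
    // Present in the data but not configured: candidates for new search terms.
    unusedInConfig: [...datasetSet]
      .filter((p) => !configuredSet.has(p))
      .sort(),
  };
}

// Illustrative sample lists, not the real configuration or dataset.
const sampleConfigured = ["http://www.cidoc-crm.org/cidoc-crm/P72_has_language"];
const sampleDataset = [
  "http://www.cidoc-crm.org/cidoc-crm/P72_has_language",
  "https://lux.collections.yale.edu/ns/workLanguage",
  "https://lux.collections.yale.edu/ns/setClassifiedAs",
];

const comparison = comparePredicates(sampleConfigured, sampleDataset);
console.log(comparison);
```

An empty missingFromDataset list corresponds to the "all configured predicates exist in the dataset" result; unusedInConfig is the list this ticket asks reviewers to look over.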
Findings
Script
comparePredicates.js.txt