Open gigamorph opened 4 months ago
From old tix:
Example Queries: Top left red bar, Top right red bar, Top left green bar, Top right green bar, Middle green bar, Bottom green bar
Expected Behavior: When applying a facet, there should always be results. In this case it should include the referenced record, rather than no results.
@jffcamp and @prowns, the expected behavior is not always achievable at scale. Facets are calculated on unfiltered results, that is, results determined via indexes alone. We can investigate specific instances but need to reset expectations.
Investigating specific instances may surface a bug or means to index the data better.
The cited record is an event that started in Feb 2017 and ended in Jun 2017:
I am able to reproduce the issue by performing a Simple Search of events matching "Small-Great Objects: Anni and Josef Albers in the Americas, Yale University Art Gallery, New Haven" then using the date facet.
When I repeat as an Advanced Search, I am able to get the search result when start date is `>=` 2017 and/or end date is `<=` 2017. I cannot get the search result using the other operators, even ones that should. For instance, start date `>=` 2000 returns the 2017 result but start date `>` 2017 does not (yet should).
I suspect a bug in the data search pattern.
Revised on 20 Mar 24.
The following screenshot illustrates a search that may not return everything it should. At least through release1.12, this search will only return Objects that have a `produced_by` or `created_by` timespan where the start of the range is before Jan 1, 2023 and the end of the range is after Dec 31, 2023. I confirmed the first search result's values for the associated fields meet these criteria: 1988 and 2989, respectively. This appears to only cover the "middle green bar" portion of the overlapping date range.
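```javascript
cts.andQuery([
  // Restrict to object records.
  cts.jsonPropertyValueQuery(
    'dataType',
    ['DigitalObject', 'HumanMadeObject'],
    ['exact']
  ),
  cts.andQuery([
    // Production range must end after 2023-12-31T23:59:59Z...
    cts.fieldRangeQuery('itemProductionEndDateFloat', '>', 1704067199, [], 1),
    // ...and start before 2023-01-01T00:00:00Z.
    cts.fieldRangeQuery('itemProductionStartDateFloat', '<', 1672531200, [], 1),
  ]),
]);
```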
@azaroth42 and @kkdavis14, for the example in the description, we have a disconnect between the query and data:
Frontend and backend interpretations: `{"startDate":"2017","_comp":">="}, {"startDate":"2017","_comp":"<="}` when the facet value is selected.
From record https://lux.collections.yale.edu/data/activity/9ed8ad50-f405-4cf9-9bb7-180048eeb078:
It's not clear to me what should change. What comes to your mind?
I think it should be: if there's no `eventInitiatedEndDateFloat`, does `eventInitiatedStartDateFloat`/`seconds_since_epoch_begin_of_the_begin` fall within the query.
I don't know how `eventInitiatedEndDateFloat` is configured but it needs to be changed: because in this case there really isn't one, it shouldn't be pulling from `end_of_the_end`.
We don't have an `end_of_the_begin` here, and it's not required in Linked.art. If there needs to be an `eventInitiatedEndDateFloat`/`seconds_since_epoch_end_of_the_begin`, where do we compute that?
In short, the query is about the start, and the second date we're using here is the end. With that said, perhaps we need to add a query for end date to events faceting.
@kkdavis14, LUX presently supports a single date-related search pattern. It is geared towards utilizing the start and end of a timespan where the operator is the deciding factor. This aligns with @azaroth42's argument/requirement that we should always deal in timespan ranges versus a single point in time. That said, I believe:
A meeting may be the best way to proceed --after I have time to investigate additional date-related issue reports. My next comment will speak to another instance of excluding an item due to operating against two points in time vs. one.
Sure thing. I am sure I do not understand the whole scope of the issue, but I know Rob is tied up with LUX for science right now so I took a crack at it. I agree we should deal in timespan ranges, if it's the correct timespan, which it doesn't look like we're doing right now.
@kkdavis14, you shared a search with me whereby the results were missing a result you expected. I modified the search to reduce the number of search results but it would include your search result had it not failed the end date portion of the search criteria.
The search is for objects containing "coffeepot" in the (primary) name that were created between 1730 and 1800. This yields 11 results and excludes https://lux.collections.yale.edu/data/object/000de659-fce8-4b8d-9b4a-c9bb3c97bc61, which was created sometime between 1790 and 1840. Because this object could have been created between 1790 and 1800, I believe this object should have been included in the search results. Unlike the previous example, the data has values for both indexes the query used. However, the query required the end of the range to be 1800 or earlier, which excludes this record as the end of its range is 1840. Something appears amiss with the "overlapping" portion of the overlapping date range search pattern. I will investigate and potentially follow up via a separate ticket given there is enough difference between the two examples.
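To make the difference concrete, here is a minimal sketch in plain JavaScript (not LUX code) contrasting a containment test, which matches the behavior described above, with an overlap test. The function names and the use of plain years in place of the epoch-second index values are assumptions made for illustration only.

```javascript
// Hypothetical helpers for illustration; years stand in for epoch-second index values.

// Containment: the record's whole range must fall inside the search range.
function isContained(record, search) {
  return record.start >= search.start && record.end <= search.end;
}

// Overlap: the two ranges share at least one point in time.
function overlaps(record, search) {
  return record.start <= search.end && record.end >= search.start;
}

const search = { start: 1730, end: 1800 };
const coffeepot = { start: 1790, end: 1840 }; // created sometime between 1790 and 1840

console.log(isContained(coffeepot, search)); // false: the record's end (1840) is after 1800
console.log(overlaps(coffeepot, search)); // true: 1790 through 1800 is shared
```

Under the overlap test the coffeepot record would be included, which is the outcome argued for above.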
This is a fun one. Below is my current understanding, some of which may contradict previous statements.
`agentStartDate`. This is confusing as there was only one date search pattern after old #471.
https://lux.collections.yale.edu/data/object/000de659-fce8-4b8d-9b4a-c9bb3c97bc61
https://lux.collections.yale.edu/data/object/92a6e57e-b0bb-4429-927f-e49fba870475
`/json[type= ('Activity', 'Period')]/timespan`.
The following is informed by conversations with and input from Rob, Kelly, and Peter.
`/json[type= ('Activity', 'Period')]/timespan/`, collapse (and rename?) the following property pairs to support a single timespan:
- `begin_of_the_begin` and `_seconds_since_epoch_begin_of_the_begin`
- `begin_of_the_end` and `_seconds_since_epoch_begin_of_the_end`
- `end_of_the_begin` and `_seconds_since_epoch_end_of_the_begin`
- `end_of_the_end` and `_seconds_since_epoch_end_of_the_end`
eventInitiatedStartDateStr
eventInitiatedStartDateFloat
eventInitiatedEndDateFloat
eventCompletedStartDateStr
eventCompletedStartDateFloat
eventCompletedEndDateFloat
`eventEndDate` facet configuration and update the `eventStartDate`’s index name to match that of the new and only event date string index.
TBD
`=` operator for overlapping.
`!=` for not overlapping.
Example of requiring items encountered between 1600 and 1610, inclusive:
```json
{
  "_scope": "item",
  "_comp": "=",
  "encounteredDate": "1600;1610"
}
```
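As a rough illustration of how such criteria could be evaluated, the plain JavaScript sketch below parses the `start;end` value into inclusive epoch-second bounds and applies the proposed `=` (overlapping) and `!=` (not overlapping) semantics. The helper names `parseYearRange` and `matches`, the UTC year boundaries, and the assumption that the value always carries two years are all hypothetical; this is not the backend's implementation.

```javascript
// Hypothetical sketch; parseYearRange and matches are illustration-only names.

// Turn "1600;1610" into inclusive epoch-second bounds (UTC).
function parseYearRange(value) {
  const [startYear, endYear] = value.split(';').map(Number);
  return {
    start: Date.UTC(startYear, 0, 1) / 1000, // first second of the start year
    end: Date.UTC(endYear + 1, 0, 1) / 1000 - 1, // last second of the end year
  };
}

// "=" means the record's timespan overlaps the search range; "!=" means it does not.
function matches(criteria, record) {
  const range = parseYearRange(criteria.encounteredDate);
  const overlapping = record.start <= range.end && record.end >= range.start;
  return criteria._comp === '=' ? overlapping : !overlapping;
}

const criteria = { _scope: 'item', _comp: '=', encounteredDate: '1600;1610' };
const record = parseYearRange('1605;1620'); // a record encountered from 1605 to 1620

console.log(matches(criteria, record)); // true
console.log(matches({ ...criteria, _comp: '!=' }, record)); // false
```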
Before pursuing all of the above changes, we should test the direction we're heading:
Teams thread checking in with Rob and Kelly on the above-proposed timespan data changes.
From the Teams thread link above, an issue was listed: https://github.com/project-lux/data-pipeline/issues/22. This issue is now closed. Is there anything else needed to move this forward with the proposed changes listed above?
cc: @prowns
> Is there anything else needed to move this forward with the proposed changes listed above?
@roamye, if inclusive of the Next Steps suggested at the bottom of the same comment the Proposed Changes are in, then no. But I would advise against just diving into this ticket's implementation. I think we need a means to comprehensively address issues with date searches and be able to use that means in the future for regression testing. I'm open to what that means is, including bringing back automated unit tests.
Problem Description: In some situations (unknown as to the scenarios in which it's not working), the date facets when applied generate zero results.
Example: A search that includes this record https://lux.collections.yale.edu/view/activity/9ed8ad50-f405-4cf9-9bb7-180048eeb078 has a date of 2017-2017. Applying that generates a search that has no results.
Expected Behavior: When applying a facet, there should always be results. In this case it should include the referenced record, rather than no results.
Link: Note that clicking back from this state results in the facets not being updated. This is lux-web#1325 (closed)
**Dependency/Blocked**
Link: https://lux.collections.yale.edu/view/results/events?q=%7B%22name%22%3A%22small-great%22%7D
Reference: https://git.yale.edu/lux-its/marklogic/issues/471#issuecomment-14626 included diagrams and modelling
Old tix: https://git.yale.edu/lux-its/marklogic/issues/1032
Screenshot: ![image](https://github.com/project-lux/lux-marklogic/assets/160149818/ae0dc52e-cc42-421f-bf68-6eecfca5129d)