Closed rettinghaus closed 4 years ago
Thanks for the pointer, @rettinghaus. We will look into this ASAP. It might take some more time, though, as @fsteeg is on vacation until August.
We had a similar issue for NWBib, where some fields had to be excluded from the "all fields" search, see https://github.com/hbz/nwbib/issues/110. Here is the PR with which @fsteeg solved this: https://github.com/hbz/lobid/pull/198
Maybe it makes sense to use a similar solution in this case.
The solution was to switch from the `all` query to a specialized `word` query, see https://github.com/hbz/nwbib/commit/330019483571ccc4450bca3db8a728fcb3d7cac6. I hope we can solve the issue in another way.
There are several ways, @acka47 plz choose:
1.) `"dateModified": { type: date}` instead of `type: text`. This prevents the date string, e.g. `2016-05-26T17:19:48.000`, from being split at index level, so a query for `2016` wouldn't match. Note that this change would also make a query like `_search?q=describedBy.dateModified:2016` return zero hits (while this works at the moment); one would have to use the exact whole string. So is this a breaking change?
2.) `"dateModified": { "include_in_all" : false}`. This prevents `dateModified` from being taken into account when building the `_all` index at index level.
+1 for 1.) as long as range queries still work. I think @hagbeck is the only one using this field for updating a search index. @hagbeck, do you have a problem with typing this as `date` in the index profile?
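To make the two options easier to compare, here is a minimal sketch of what each mapping fragment might look like, written as Python dicts. The nesting under `describedBy` is inferred from the query `describedBy.dateModified`; the function names are hypothetical, and this is not the actual content of the lobid-gnd index settings. Note that `include_in_all` only exists in Elasticsearch versions that still have the `_all` field.

```python
import json


def option_1_mapping():
    """Option 1.): type the field as date instead of text."""
    return {"dateModified": {"type": "date"}}


def option_2_mapping():
    """Option 2.): keep the field out of the _all catch-all field."""
    return {"dateModified": {"include_in_all": False}}


def combined_mapping():
    """Both options applied together (they are not mutually exclusive)."""
    field = {}
    field.update(option_1_mapping()["dateModified"])
    field.update(option_2_mapping()["dateModified"])
    # Assumed layout: dateModified is a sub-field of describedBy.
    return {
        "properties": {
            "describedBy": {
                "properties": {"dateModified": field}
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(combined_mapping(), indent=2))
```

Since the two settings touch different aspects (how the value is indexed vs. whether it feeds the catch-all field), they can be combined in one field definition.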
We are currently using YYYYMMDD formatted dates in the field based search for dateModified.
Example:
For us it would be possible to change the code easily to
if necessary.
> We are currently using YYYYMMDD formatted dates in the field based search for dateModified.
These should also work after the change, otherwise we should implement it in another way. @dr0i , please check if these queries will still work.
@acka47 no, these won't work anymore. So we go with the second approach, yes?
I got confused. As this issue is about lobid-gnd it won't affect @hagbeck. Sorry for the superfluous ping.
In lobid-gnd these kinds of searches do not work, while in lobid-resources `dateModified` is already typed as date. This means that this change will be an improvement to the GND API. So please continue with it, @dr0i.
(As a side note, the date properties from the GND ontology are all typed as `keyword`, see https://github.com/hbz/lobid-gnd/blob/a9bba80a23e26a4c812964424b6c89457e4a3103/conf/index-settings.json#L63-L94. See this commit and related issue #149 for background: https://github.com/hbz/lobid-gnd/commit/15e93bd24c3491cea4e478f5de6fc478487804ca. I think this is because not all values conform to date format.)
Oops, they ARE working (I forgot to escape the query). But those queries work in a rather unpredictable way, e.g. http://lobid.org/gnd/search?q=describedBy.dateModified%3A%3E30009.
Also note that @hagbeck refers to lobid-resources (which uses a different date format than lobid-gnd), so we are safe either way.
> those queries work in a rather unpredictable way, e.g. http://lobid.org/gnd/search?q=describedBy.dateModified%3A%3E30009.
Yes, I cannot even query by a specific day and get only results for resources modified on that day. I just tried it out when checking whether updates work. E.g. https://lobid.org/gnd/search?q=describedBy.dateModified%3A2020-07-23&size=100&format=html does not give back the entries modified on 2020-07-23.
A phrase query doesn't give back any results at all: https://lobid.org/gnd/search?q=describedBy.dateModified%3A%222020-07-23%22
This is too bad and another reason to type this as date in the index profile.
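Since a missing escape caused confusion earlier in this thread, here is a small sketch of how the field queries above can be percent-encoded before they go into the `q` parameter. It uses only Python's standard library; `search_url` and `BASE_URL` are hypothetical names for illustration, not part of any lobid client.

```python
from urllib.parse import quote

# Search endpoint as it appears in the URLs quoted in this thread.
BASE_URL = "https://lobid.org/gnd/search"


def search_url(query: str) -> str:
    """Build a search URL with the query string fully percent-encoded."""
    # safe="" makes sure ':', '>' and '"' are all escaped.
    return f"{BASE_URL}?q={quote(query, safe='')}"


# Range-style query and phrase query, as discussed above.
range_url = search_url("describedBy.dateModified:>30009")
phrase_url = search_url('describedBy.dateModified:"2020-07-23"')

if __name__ == "__main__":
    print(range_url)
    print(phrase_url)
```

The two resulting URLs have the same encoded `q` values as the ones pasted in the comments above (`%3A` for `:`, `%3E` for `>`, `%22` for `"`).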
Deployed to staging. As this new index is based on the new base dump from 2020-06-22 and all updates up to today have been received, #258 is fixed as a side effect, too.
Furthermore, https://github.com/hbz/lobid-gnd/issues/255 is resolved with this one. Three at one stroke. Wow.
Note: neither of the two solutions in https://github.com/hbz/lobid-gnd/issues/257#issuecomment-664210752 solved the issue on its own; it took both of them together.
As this issue is resolved and in production: closing.
Obviously the field `dcterms:modified` (resp. `dateModified`) is indexed. Is this intentional? I find it rather confusing when searching, especially because this field isn't shown in the HTML output.
Example: looking up "heinrich ida 2020" shows results that have nothing to do with "2020", except that they have been modified this year.
If there are no compelling reasons for this, I'd recommend removing the field from search.
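The effect described here can be illustrated with a toy model of a catch-all search field. This is a deliberately simplified sketch, not how Elasticsearch tokenizes or scores: the tokenizer just splits on non-alphanumeric characters, and the sample record, field names and function names are invented for illustration.

```python
import re


def tokens(text):
    """Very rough tokenizer: lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def catch_all_tokens(record, excluded=()):
    """Collect tokens from every field except the excluded ones."""
    collected = set()
    for field, value in record.items():
        if field not in excluded:
            collected |= tokens(value)
    return collected


def matches(record, query, excluded=()):
    """True if every query token occurs somewhere in the catch-all tokens."""
    return tokens(query) <= catch_all_tokens(record, excluded)


# Invented sample record; only the modification date contains "2020".
record = {
    "preferredName": "Heinrich, Ida",
    "dateModified": "2020-07-23T10:15:00.000",
}

if __name__ == "__main__":
    # Matches only because of the modification date.
    print(matches(record, "heinrich ida 2020"))
    # No longer matches once dateModified is kept out of the catch-all.
    print(matches(record, "heinrich ida 2020", excluded=("dateModified",)))
```

In this model, leaving `dateModified` out of the catch-all field removes the accidental "2020" hit while a plain name search still matches.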