hbz / lobid-gnd

UI and API to the Integrated Authority File (Gemeinsame Normdatei, GND)
http://lobid.org/gnd
Eclipse Public License 2.0

Use ElasticSearch validation endpoint to check queries before executing them #309

Closed: b2m closed this issue 2 years ago

b2m commented 2 years ago

Extracted issue from #304: use the Elasticsearch validation endpoint instead of, or in addition to, the custom data cleaning methods.

Elasticsearch also has a validation endpoint to check queries before executing them. I am not sure about the impact on performance, but this may be an alternative approach to consider.

Originally posted by @b2m in https://github.com/hbz/lobid-gnd/issues/304#issuecomment-1055792505
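
For readers unfamiliar with the validation endpoint mentioned above, here is a minimal sketch of how a query string could be checked against Elasticsearch's `_validate/query` API before it is executed. Python is used purely for illustration. The base URL `http://localhost:9200`, the index name `gnd`, and all function names are assumptions made for this example, not taken from the lobid-gnd code base.

```python
import json
from urllib import request as urlrequest
from urllib.parse import quote


def build_validate_request(base_url, index, query_string):
    """Build URL and JSON body for the Elasticsearch validate API.

    base_url and index are placeholders chosen for this sketch.
    explain=true asks Elasticsearch to include the reason for a failure.
    """
    url = f"{base_url.rstrip('/')}/{quote(index)}/_validate/query?explain=true"
    body = {"query": {"query_string": {"query": query_string}}}
    return url, json.dumps(body).encode("utf-8")


def parse_validation_response(payload):
    """Return (valid, errors) from a decoded validate API response."""
    valid = bool(payload.get("valid", False))
    errors = [e["error"] for e in payload.get("explanations", []) if "error" in e]
    return valid, errors


def validate_query(base_url, index, query_string, opener=urlrequest.urlopen):
    """Send the query to the validate endpoint without executing it.

    opener is injectable so the function can be exercised without a
    running cluster.
    """
    url, data = build_validate_request(base_url, index, query_string)
    req = urlrequest.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:
        return parse_validation_response(json.load(resp))
```

A call such as `validate_query("http://localhost:9200", "gnd", "+needle -jan")` would cost one extra round trip per request, which is the performance question raised above.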

That's a good point, thanks! I did a quick and dirty test and this might work. I'll need a bit of time for a proper implementation though, and I'm currently not sure when we'll have time for that.

Originally posted by @fsteeg in https://github.com/hbz/lobid-gnd/issues/304#issuecomment-1081817133

Also see #188, #190, #304 and #306.

fsteeg commented 2 years ago

Deployed to test:

Invalid queries as in #188 are still cleaned: {"query":"Deutscher Bibliothekartag (28. Bibliothekartag 1932 : 1932 : Jena)"}

Ranges as in #304 are still not cleaned: {"query":"Benedikt Papst","properties":[{"pid":"dateOfBirth","v":"[1920-01-01 TO 1950-01-01]"}]}

Other valid queries with special characters are now no longer cleaned: {"query":"+needle -jan"}
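
The behaviour described in these three cases (valid queries pass through untouched, only invalid ones are cleaned) can be sketched as follows. This is an illustration under assumptions, not the deployed implementation: `prepare_query` and `escape_reserved` are hypothetical names, and escaping reserved `query_string` characters is only one possible cleaning strategy. The project's actual cleaning method is not shown in this thread.

```python
import re

# Characters with special meaning in the query_string syntax.
# Escaping each & and | individually is a simplification for this sketch.
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_reserved(query):
    """Hypothetical cleaner: backslash-escape reserved characters.

    < and > cannot be escaped in query_string syntax, so they are removed.
    """
    without_angles = query.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", without_angles)


def prepare_query(query, is_valid, clean=escape_reserved):
    """Return the query unchanged if it validates, otherwise a cleaned copy.

    is_valid is any callable that reports whether Elasticsearch accepts the
    query, for example a wrapper around the validate endpoint.
    """
    return query if is_valid(query) else clean(query)
```

With this shape, a query such as `+needle -jan` keeps its operators as long as the validator accepts it, and a range such as `[1920-01-01 TO 1950-01-01]` is likewise left alone.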

b2m commented 2 years ago

From a user's perspective this is a very clean solution!

One possible feature would be to log and analyze failed queries in order to discover and react to misuse, unmet expectations, or incorrect usage in client-side implementations. But I do not know whether you have the infrastructure for that.

acka47 commented 2 years ago

+1

One possible feature would be to log and analyze failed queries in order to discover and react to misuse, unmet expectations, or incorrect usage in client-side implementations. But I do not know whether you have the infrastructure for that.

Thanks for suggesting this improvement! I think this should be no problem. Is this something @dr0i would set up? If we can easily set it up, we should write failed queries to a separate file. Alternatively, we could schedule a regular appointment to go through the full logs and look at the failed queries...

fsteeg commented 2 years ago

Thanks for the feedback @b2m, I've added logging in b3b04c0 to allow us to track invalid queries in the future. Will assign @dr0i for code review in #315.
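
As an illustration of the logging idea discussed above (failed queries written to their own file for later review), here is a minimal sketch using Python's standard `logging` module. It does not reproduce commit b3b04c0. The logger name `invalid_queries`, the file name `invalid-queries.log`, and both function names are assumptions made for this example.

```python
import json
import logging


def make_invalid_query_logger(handler, name="invalid_queries"):
    """Create a dedicated logger so invalid queries land in their own sink.

    propagate=False keeps these entries out of the main application log.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.WARNING)
    logger.propagate = False
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger


def log_invalid_query(logger, query, error):
    """Write one JSON record per invalid query for later analysis."""
    record = {"query": query, "error": error}
    logger.warning(json.dumps(record, ensure_ascii=False))
```

In use, the handler could be `logging.FileHandler("invalid-queries.log")`, which gives one timestamped JSON record per line that can be reviewed periodically or aggregated with standard tools.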