voc / voctoweb

voctoweb – the frontend and backend software behind media.ccc.de
GNU General Public License v3.0

Search is currently broken for finding specific names or people (search is too smart in a bad way?) #739

Status: Open. ffeldner opened this issue 7 months ago.

ffeldner commented 7 months ago

Hi,

The media.ccc.de page hyperlinks the authors of its talks. For example, on this talk, https://media.ccc.de/v/37c3-11782-smtp_smuggling_spoofing_e-mails_worldwide, the author is Timo Longin, so the listed speaker name links to https://media.ccc.de/search?p=Timo+Longin. However, the search looks for both tokens separately, so I get a waggonload of Timos and other talks.

Even if you figure out "hey, the smart solution is to search manually for the rarest token of the name, which is clearly the surname, Longin", this gets torpedoed by a "smart" search feature that I guess is there to compensate for typos.

The query https://media.ccc.de/search/?q=longin returns a waggonload of results because they contain the word login, and I also get a talk by a person called Longtin. So each search word seems to get taken apart and matched loosely, as if it were filled with single-character placeholders or something.
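The reporter's guess can be made concrete under one assumption the thread does not confirm: that the q search enables Elasticsearch fuzzy matching with fuzziness AUTO. AUTO lets terms of 3 to 5 characters differ by one edit and longer terms by two, so the six-letter longin would match login (one deletion) and longtin (one insertion). The minimal sketch below models this with an optimal-string-alignment distance (Levenshtein edits plus adjacent transpositions, which Elasticsearch also counts by default); the helper names are made up for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insert/delete/substitute plus adjacent swaps."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Edit budget that fuzziness "AUTO" (i.e. AUTO:3,6) grants a term of this length."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def fuzzy_matches(query_term: str, indexed_terms: list[str]) -> list[str]:
    """Indexed terms within the AUTO edit budget of the query term."""
    budget = auto_fuzziness(query_term)
    return [t for t in indexed_terms if osa_distance(query_term, t) <= budget]


if __name__ == "__main__":
    print(fuzzy_matches("longin", ["login", "longtin", "timo", "smuggling"]))
    # prints ['login', 'longtin']
```

With a two-edit budget, a short surname can easily collide with common words, which fits the "too smart" behaviour described above.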

This does not happen when I change the q to a p (https://media.ccc.de/search/?p=longin) and search only for Longin, because apparently the query search tries to be smart, while the person search tries to be accurate. But it would still be beneficial to let the person search look for a name containing spaces without splitting it up.

Also, using other parameters like p instead of q, by manually rewriting the URL after searching for something, is neither documented nor offered anywhere. So unless one knows that a person search exists, one will not think to use it when searching via the search field on media.ccc.de.

The preferred fix would be to implement a way to search for an entire name, and then use it for the hyperlinks on the speaker names on videos. A band-aid fix would be to link to the search/?p= person search with only the speaker's surname, to narrow the results down.
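A minimal Python sketch of the band-aid fix, building a speaker link as a ?p= person search, optionally with the surname only. The helper name speaker_search_url and the rule "the last whitespace-separated token is the surname" are assumptions for illustration, not voctoweb code.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://media.ccc.de/search"


def speaker_search_url(name: str, surname_only: bool = False) -> str:
    """Build a person-search (?p=) link for a speaker name."""
    # Naive surname guess: the last whitespace-separated token. This is wrong for
    # names like "van der ..." and pointless for single-word handles.
    term = name.split()[-1] if surname_only else name
    # urlencode() encodes spaces as "+", matching the existing speaker links.
    return f"{SEARCH_URL}?{urlencode({'p': term})}"


print(speaker_search_url("Timo Longin"))
# prints https://media.ccc.de/search?p=Timo+Longin
print(speaker_search_url("Timo Longin", surname_only=True))
# prints https://media.ccc.de/search?p=Longin
```

The surname heuristic only narrows results. Two speakers sharing a surname would still be mixed together, which is why a full-name search is the preferred fix.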

evilscientress commented 7 months ago

We just analyzed the issue a bit more, and it seems the problem lies with the Elasticsearch query.

The query uses a multi_match of type best_fields, which doesn't do phrase matching, so the query term is split at whitespace and each part can match on its own. The query should instead use the type phrase, which would try to match the full name, or a combination of both with phrase matching boosted.
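The two options can be sketched as Elasticsearch query bodies, written here as Python dicts. The field name persons and the boost value 3.0 are hypothetical; the real index mapping and tuning may differ. The combined variant keeps the token match for recall, while the phrase clause (tokens adjacent and in order) gets a boost so exact full-name hits rank first.

```python
import json


def phrase_query(name: str, fields: list[str]) -> dict:
    """Match the full name only: all tokens, adjacent and in order."""
    return {"query": {"multi_match": {"query": name, "type": "phrase", "fields": fields}}}


def combined_query(name: str, fields: list[str], phrase_boost: float = 3.0) -> dict:
    """Token match for recall, plus a boosted phrase match so full-name hits rank first."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Current behavior: any whitespace token may match on its own.
                    {"multi_match": {"query": name, "type": "best_fields", "fields": fields}},
                    # Full-name phrase match, boosted above single-token hits.
                    {"multi_match": {"query": name, "type": "phrase", "fields": fields,
                                     "boost": phrase_boost}},
                ]
            }
        }
    }


# "persons" is a hypothetical field name.
print(json.dumps(combined_query("Timo Longin", ["persons"]), indent=2))
```

The combined form still returns other Timos, only ranked lower. The phrase-only form is stricter, which matters for the speaker links discussed next.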

It's debatable, though, whether the search should split the search term at whitespace at all, because it will then return other speakers who, for example, share the same first name, which is not what users expect when clicking on a speaker's name.

rofl0r commented 7 months ago

> The query should rather use the type phrase which would try to match the full name, or a combination of both with phrase matching boosted.

Sounds good. Maybe this could be changed that way for some testing?

> It's debatable though whether the search should try to split up the search term at whitespace at all

If it is split, it should only return results where all terms match, not just any of them.
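In Elasticsearch terms, this corresponds to setting operator to "and" on the multi_match query, since the default "or" accepts any term. The sketch below builds such a body and adds a small local model of "or" versus "and" matching. The field name persons and the names other than Timo Longin are made up for illustration, and the model ignores analyzers and fuzziness.

```python
def all_terms_query(text: str, fields: list[str]) -> dict:
    """best_fields with operator "and": every term must appear in the same field."""
    return {"query": {"multi_match": {"query": text, "type": "best_fields",
                                      "fields": fields, "operator": "and"}}}


def matches(doc: str, text: str, operator: str) -> bool:
    """Local model of term matching: lowercase whitespace tokens, no fuzziness."""
    doc_terms = set(doc.lower().split())
    hits = [t in doc_terms for t in text.lower().split()]
    return all(hits) if operator == "and" else any(hits)


speakers = ["Timo Longin", "Timo Example", "Jane Longin"]  # last two are made up
print([s for s in speakers if matches(s, "Timo Longin", "or")])
# prints ['Timo Longin', 'Timo Example', 'Jane Longin']
print([s for s in speakers if matches(s, "Timo Longin", "and")])
# prints ['Timo Longin']
```

Note that "and" still ignores order and adjacency, so a record naming "Longin Timo" would match as well. Only the phrase type enforces the full name as written.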