yacy / yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
http://yacy.net

Include user ratings? #179

Open hurrgadev opened 6 years ago

hurrgadev commented 6 years ago

I have been running a YaCy peer for a few months and have been thinking about collecting user clicks and using them to influence the ordering of the search results shown in the frontend for each logged-in user. The assumption is that a clicked page has a higher relevance, at least for the community that is using this peer. Have any efforts been made in this direction already? First thoughts: add some JavaScript to catch the user's click, then store each rating with the user ID in a ratings database, or simply extend the bookmarks DB and tag entries that were generated from user clicks. When the user searches, the ratings DB or bookmarks DB would be taken into account during post-ranking. I have some Java experience and would be willing to put some effort into improving search results by making use of user ratings.
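A minimal sketch of the storage and post-ranking steps outlined above, assuming an in-memory map in place of a real ratings or bookmarks DB. The class `ClickRatings` and its methods are hypothetical names for illustration only; they are not part of YaCy, and how the click reaches `recordClick` from the frontend is left open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-user click store (illustration only, not YaCy API). */
final class ClickRatings {
    // userId -> (url -> number of recorded clicks)
    private final Map<String, Map<String, Integer>> clicks = new HashMap<>();

    /** Assumed to be called when the frontend reports a click on a result. */
    void recordClick(String userId, String url) {
        clicks.computeIfAbsent(userId, k -> new HashMap<>())
              .merge(url, 1, Integer::sum);
    }

    /** Clicks recorded for this user and URL, or 0 if none. */
    int clickCount(String userId, String url) {
        return clicks
            .getOrDefault(userId, Collections.<String, Integer>emptyMap())
            .getOrDefault(url, 0);
    }

    /**
     * Post-ranking step: URLs this user clicked more often move up.
     * List.sort is stable, so URLs with equal click counts keep the
     * order the search engine gave them.
     */
    List<String> postRank(String userId, List<String> rankedUrls) {
        List<String> out = new ArrayList<>(rankedUrls);
        out.sort(Comparator
            .comparingInt((String u) -> clickCount(userId, u))
            .reversed());
        return out;
    }
}
```

Keeping the engine's order for unclicked results means the click data only adjusts the existing ranking instead of replacing it.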

luccioman commented 6 years ago

Hi @hurrgadev,

> Have any efforts been made in this direction already?

Yes, something was indeed done in this direction some time ago, but it was reverted (see commits 61ae9d2d1187459ceb695ebc465cd7bd12905f9d and 4eb89d7f152c6b54028b21205c2bf99a6eeb302f). Maybe @Orbiter or @reger24 could give you more details on their current position about this.

reger24 commented 6 years ago

Hi, from my point of view, a click counter (without user ID) for rating purposes would be beneficial. On the other hand, see the comment on https://github.com/yacy/yacy_search_server/commit/61ae9d2d1187459ceb695ebc465cd7bd12905f9d, which presents another view and a clear position against it (and the reason for the revert).

hurrgadev commented 6 years ago

Thank you for your answers. I can very well understand the reason for the revert. However, a click counter without user ID will certainly produce better search results than none, and an individual (per-user) click counter would produce better results still. In my case, our server is trusted. Apart from username, password and clicks, no other user data would be stored. I will take a closer look at the commits. Thank you for your support.

Scarfmonster commented 6 years ago

The problem with collecting data like this is that there is no easy way to fight abuse of it if it is purely anonymous. On the one hand, a click counter may produce better results; on the other, it is open to abuse. Without logging anything, there is no way to stop someone from giving a link hundreds of “clicks”. There is also no easy way to share this data between peers while making sure that what you receive is genuine. Currently YaCy checks remote results to see whether the searched words appear, but it can't do the same with click counts.

agnelvishal commented 5 years ago

> The problem of collecting data such as this is that there is no easy way to fight with abuse of them if they are purely anonymous. On one hand click counter may produce better results, but it also is open for abuse. There is no way to stop anyone from giving a link hundreds of “clicks” without logging anything. There is also no easy way to share this data between peers while making sure what you get is genuine. Currently YaCy checks remote results to see if searched words appear, but it can't do the same with click counts.

The logarithm of the click count can be used, so that beyond a certain limit additional clicks have less and less influence on the rating. Other metrics, such as backlinks, can also be used to check whether the click count has been manipulated.
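A rough sketch of the two ideas, assuming the click count feeds into ranking as an additive boost. The class `ClickBoost`, the `maxClicksPerBacklink` parameter and the capping rule are assumptions made for illustration (one possible reading of "validate with backlinks"); none of this is existing YaCy code.

```java
/** Hypothetical scoring helper (illustration only, not YaCy API). */
final class ClickBoost {

    /**
     * log(1 + clicks): zero clicks add nothing, and every extra click
     * adds less than the one before it.
     */
    static double dampedClicks(long clicks) {
        return Math.log1p(Math.max(0L, clicks));
    }

    /**
     * Assumed plausibility check: clicks beyond
     * maxClicksPerBacklink * (backlinks + 1) are ignored, so a page
     * with few backlinks cannot gain much from a flood of clicks.
     */
    static double boost(long clicks, long backlinks, long maxClicksPerBacklink) {
        long cap = maxClicksPerBacklink * (Math.max(0L, backlinks) + 1);
        long plausible = Math.min(Math.max(0L, clicks), cap);
        return dampedClicks(plausible);
    }
}
```

With the logarithm alone, going from 100 to 1000 clicks adds less to the score than going from 0 to 100 did. The cap adds a second, independent limit tied to a signal that is harder to fake from a single client.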