thehonker / eevee

yes another irc bot yes i'm rewriting it but keeping the same name hush hush

Image results seem weird #9

Open itsrachelfish opened 3 years ago

itsrachelfish commented 3 years ago
22:38 <@rachel> ~im ice titan
22:38 < eevee> Ice titan - The RuneScape Wiki | https://arkids.net/image/cta-box/dinosaur.png

The title says that the image is from the runescape wiki, which would presumably be this:

https://runescape.wiki/w/Ice_titan (Source page)
https://runescape.wiki/images/0/09/Ice_titan.png?d4b75 (Image URL)

However the URL returned is from the ARK IDs database????

itsrachelfish commented 3 years ago

After looking at the debug logs with weazzy on IRC, it looks like these strange results are caused by the library itself, not by anything in our code that we can change.

However, there was something else I noticed in the code: All of the results displayed are randomly chosen - https://github.com/epers/eevee/blob/master/modules/search.mjs#L217

Search engines are designed to display the most relevant result first, so we should always display the first result for a new search phrase. Once a search phrase has already been searched for, it does make sense to start displaying a random result.

This means we should keep track of what search phrases have been used recently and switch behavior from "first" to "random" if they've been used within the past X amount of time. 5 minutes? 1 hour? 24 hours? I dunno, but something
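
A rough sketch of that idea, in case it helps the discussion. Everything here is hypothetical: `pickResult`, `lastSearched`, and `RECENT_WINDOW_MS` are made-up names, not anything from `modules/search.mjs`, and the one hour window is just a placeholder for "X amount of time".

```javascript
// Hypothetical sketch: none of these names come from eevee's search module.
const RECENT_WINDOW_MS = 60 * 60 * 1000; // assumed window: 1 hour

// Normalized search phrase -> timestamp (ms) of the last time it was searched.
const lastSearched = new Map();

// Return the top result for a phrase that has not been searched recently,
// and a random result for one that has. `now` and `random` are parameters
// only so the behavior can be checked deterministically.
function pickResult(phrase, results, now = Date.now(), random = Math.random) {
  if (results.length === 0) return undefined;

  const key = phrase.trim().toLowerCase();
  const last = lastSearched.get(key);
  lastSearched.set(key, now);

  const seenRecently = last !== undefined && now - last < RECENT_WINDOW_MS;
  if (!seenRecently) return results[0];

  return results[Math.floor(random() * results.length)];
}
```

Note that the map in this sketch grows without bound, so a real version would need some way to evict old phrases.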

thehonker commented 2 years ago

> This means we should keep track of what search phrases have been used recently and switch behavior from "first" to "random" if they've been used within the past X amount of time. 5 minutes? 1 hour? 24 hours? I dunno, but something

Implemented in https://github.com/epers/eevee/commit/859b5398a4b1fadabda0f6f048ea963e0f402b1c and https://github.com/epers/eevee/commit/9c64543f74c97b6b9994b2c59c2c6ebfea81208b. The history isn't persistent across restarts. It's an array with a configurable upper bound on size, and older entries fall out as new ones are pushed in.
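
For anyone reading along without opening the commits, the general shape of a size-bounded history looks roughly like this. This is an illustrative sketch only, not the code from the commits: the `RecentSearches` class and its method names are invented here, and `maxSize` stands in for the configurable bound.

```javascript
// Hypothetical sketch of a bounded history; not the code from the commits.
class RecentSearches {
  constructor(maxSize) {
    this.maxSize = maxSize; // assumed to come from the module's config
    this.phrases = [];      // oldest entry first, newest entry last
  }

  // True if the phrase is still in the history.
  has(phrase) {
    return this.phrases.includes(phrase);
  }

  // Record a phrase, dropping the oldest entries once the bound is exceeded.
  push(phrase) {
    this.phrases.push(phrase);
    while (this.phrases.length > this.maxSize) {
      this.phrases.shift();
    }
  }
}
```

Because the array lives only in memory, a restart starts with an empty history, so the first search for any phrase after a restart returns the top result again.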


The library issue appears to be that when a result has no title supplied, it picks another one (at random, or from the next result down?). It scrapes the HTML of Google image results afaik, so maybe we can fork that lib and make it return an empty string in that case, or generate a title properly from the MIME type. This is somewhat reproducible, in that in a browser the first result is for that ark.fandom.com site. Not sure where the runescape image is even coming from?

[Array] (15): [
    0 [Object] (4): {
        url (76): https://runescape.wiki/images/thumb/Ice_titan.png/1200px-Ice_titan.png?d4b75
        width (4): 1751
        height (4): 1200
        origin [Object] (2): {
            title (47): Ice Titan - Official ARK: Survival Evolved Wiki
            website (37): https://ark.fandom.com/wiki/Ice_Titan
        }
    }
  ... 
]
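
While debugging, results like the one in the dump above could be flagged by comparing the host of `url` with the host of `origin.website`. This is only a heuristic sketch and not a fix: `originMismatch` is a made-up name, the field names are taken from the dump, and plenty of legitimate results serve images from a different host (CDNs, for example), so a mismatch is a hint rather than proof of a bad title.

```javascript
// Debugging heuristic only; `originMismatch` is a hypothetical helper.
// Field names (`url`, `origin.website`) follow the dump above.
function originMismatch(result) {
  const imageHost = new URL(result.url).hostname;
  const siteHost = new URL(result.origin.website).hostname;
  return imageHost !== siteHost;
}

// The first entry from the dump above.
const sample = {
  url: 'https://runescape.wiki/images/thumb/Ice_titan.png/1200px-Ice_titan.png?d4b75',
  width: 1751,
  height: 1200,
  origin: {
    title: 'Ice Titan - Official ARK: Survival Evolved Wiki',
    website: 'https://ark.fandom.com/wiki/Ice_Titan',
  },
};

console.log(originMismatch(sample)); // true: runescape.wiki vs ark.fandom.com
```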