muldjord / skyscraper

Powerful and versatile game scraper written in C++
GNU General Public License v3.0

Try to help scrapers clear confusion with Arabic vs Roman numerals #77

Closed CookiePLMonster closed 5 years ago

CookiePLMonster commented 5 years ago

I admit I don't know whether this falls within Skyscraper's area of responsibility or whether the scrapers are entirely to blame, but I noticed that every scraper I tested (except IGDB, since I don't have an account there yet) failed to find a ROM with this name:

Double Dragon 2 (U) [!].gb

A quick check on MobyGames (https://www.mobygames.com/search/quick?q=Double+Dragon+2) reveals that the game is known as Double Dragon II, not Double Dragon 2. Indeed, renaming the file resolves the issue.

Can Skyscraper help scrapers clear up confusion like this? I imagine the same problem occurs with other games (GTA4 vs. GTA IV, for example, although that is not something Skyscraper is likely to scrape yet).

muldjord commented 5 years ago

Skyscraper does more than most scrapers when it comes to numerals. I actually do numeral checking when comparing results. So if a game is returned as "Game Name II" but the file is named "Game Name 2", it will still match, because Skyscraper knows that 2 and II are the same number. This only works if the online database API returns that result in the first place.
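The numeral check described above could look roughly like the following minimal C++ sketch. It is not Skyscraper's actual code; the names `romanToInt`, `normalizeTitle` and `titlesMatch` are hypothetical. It assumes that only uppercase Roman numerals built from I, V and X (values up to 39) need handling, turns punctuation into spaces and lowercases everything else. Note that it would also equate "Mega Man X" with "Mega Man 10", so a real implementation needs more care than this.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: value of an uppercase Roman numeral built from I, V
// and X only (enough for typical sequel numbers up to 39). Returns 0 when the
// token contains any other character. Validation is minimal ("IIII" -> 4).
int romanToInt(const std::string &token) {
  auto value = [](char c) {
    switch (c) {
    case 'I': return 1;
    case 'V': return 5;
    case 'X': return 10;
    default: return 0;
    }
  };
  int total = 0;
  for (std::size_t i = 0; i < token.size(); ++i) {
    int current = value(token[i]);
    if (current == 0) return 0;  // not a pure I/V/X token
    int next = (i + 1 < token.size()) ? value(token[i + 1]) : 0;
    // Subtractive notation: a smaller numeral before a larger one (IV, IX).
    total += (current < next) ? -current : current;
  }
  return total;
}

// Hypothetical normalisation: punctuation becomes whitespace, Roman numeral
// tokens become Arabic digits and all other words are lowercased, so that
// "Double Dragon II" and "Double Dragon 2" produce the same key.
std::string normalizeTitle(const std::string &title) {
  std::string cleaned;
  for (unsigned char c : title)
    cleaned += std::isalnum(c) ? static_cast<char>(c) : ' ';

  std::istringstream words(cleaned);
  std::string word, result;
  while (words >> word) {
    if (int n = romanToInt(word); n > 0) {
      word = std::to_string(n);
    } else {
      for (char &c : word)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (!result.empty()) result += ' ';
    result += word;
  }
  return result;
}

// Two titles match when their normalised forms are identical.
bool titlesMatch(const std::string &a, const std::string &b) {
  return normalizeTitle(a) == normalizeTitle(b);
}
```

For example, `titlesMatch("Double Dragon 2", "Double Dragon II")` returns true, while `titlesMatch("Double Dragon 2", "Double Dragon III")` returns false.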

In the case of your MobyGames example, it seems that their website search does know to find "Double Dragon II", but the API I use for the site (their official API) doesn't. I just tested it, and it gives no results for "Double Dragon 2".

Basically, my stance is that it is the job of the online databases to provide a better search engine, one that knows that II and 2 are equivalent.
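One way a search engine could treat II and 2 as alike is to retry a query with Arabic numerals rewritten as Roman ones. The sketch below is purely illustrative: the names `intToRoman` and `queryVariants` are hypothetical, and it does not describe how MobyGames or any other database actually works. It only rewrites standalone numbers from 1 to 39, so a title like "GTA4", where the digit is attached to letters, is left unchanged.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: Roman numeral for 1..39 using X, IX, V, IV and I.
std::string intToRoman(int n) {
  static const std::pair<int, const char *> table[] = {
      {10, "X"}, {9, "IX"}, {5, "V"}, {4, "IV"}, {1, "I"}};
  std::string out;
  for (const auto &[value, numeral] : table) {
    while (n >= value) {  // greedily take the largest numeral that fits
      out += numeral;
      n -= value;
    }
  }
  return out;
}

// Hypothetical sketch: returns the original query plus, if anything changed,
// a variant where every standalone number from 1 to 39 is written as a Roman
// numeral, e.g. "Double Dragon 2" -> "Double Dragon II".
std::vector<std::string> queryVariants(const std::string &query) {
  std::istringstream words(query);
  std::string word, variant;
  bool changed = false;
  while (words >> word) {
    bool allDigits = word.find_first_not_of("0123456789") == std::string::npos;
    if (allDigits && word.size() <= 2) {  // at most two digits: stoi is safe
      int n = std::stoi(word);
      if (n >= 1 && n <= 39) {
        word = intToRoman(n);
        changed = true;
      }
    }
    if (!variant.empty()) variant += ' ';
    variant += word;
  }
  std::vector<std::string> variants{query};
  if (changed) variants.push_back(variant);
  return variants;
}
```

With this, `queryVariants("Double Dragon 2")` yields both "Double Dragon 2" and "Double Dragon II", so a client could send the second query when the first returns nothing.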

CookiePLMonster commented 5 years ago

Can I then give Skyscraper a hint to override the name, so that it searches for "XYZ II" rather than "XYZ 2"?

muldjord commented 5 years ago

I'm working on that right now, so yes, I'll probably make it do that unless I run into issues.

CookiePLMonster commented 5 years ago

Magnificent! I've already found a second use case for such hinting: there's at least one more game whose title was wrongly matched by the scrapers. This will definitely be useful.

Preferably in the form of an XML file, so that it can be persistent, just like the localdb is.
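A persistent hint file like the one suggested here could, for instance, map a ROM file name to the title that should be searched for instead. The following is a hypothetical sketch: the `<hint file="..." search="..."/>` format and the functions `parseHints`, `loadHints` and `searchNameFor` are made up for illustration and are not necessarily what Skyscraper uses. It relies on `std::regex` instead of a real XML parser, so it expects the attributes in a fixed order and does not decode XML entities.

```cpp
#include <fstream>
#include <iterator>
#include <map>
#include <regex>
#include <string>

// Hypothetical hint format (illustration only):
//   <hints>
//     <hint file="Double Dragon 2 (U) [!].gb" search="Double Dragon II"/>
//   </hints>
// Deliberately minimal: attributes in this order, double-quoted, no entities.
// A real implementation should use a proper XML parser.
std::map<std::string, std::string> parseHints(const std::string &xml) {
  static const std::regex hintRe(
      R"re(<hint\s+file="([^"]*)"\s+search="([^"]*)"\s*/>)re");
  std::map<std::string, std::string> hints;
  for (std::sregex_iterator it(xml.begin(), xml.end(), hintRe), end;
       it != end; ++it)
    hints[(*it)[1].str()] = (*it)[2].str();  // file name -> search name
  return hints;
}

// Reads the whole hint file; a missing file simply yields no hints.
std::map<std::string, std::string> loadHints(const std::string &path) {
  std::ifstream in(path);
  std::string xml((std::istreambuf_iterator<char>(in)),
                  std::istreambuf_iterator<char>());
  return parseHints(xml);
}

// Returns the hinted search name for a file, or the fallback (for example the
// name derived from the file name) when no hint exists for it.
std::string searchNameFor(const std::map<std::string, std::string> &hints,
                          const std::string &fileName,
                          const std::string &fallback) {
  auto it = hints.find(fileName);
  return it != hints.end() ? it->second : fallback;
}
```

Keying on the full file name keeps each hint tied to one ROM, so an override for "Double Dragon 2 (U) [!].gb" cannot accidentally affect other files.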

muldjord commented 5 years ago

This feature is now done and tested with "Double Dragon 2 (U) [!].gb" and it works. It now finds "Double Dragon II" when using mobygames.

This will be in the upcoming 2.9.0 release.

muldjord commented 5 years ago

Closing this. Feature is implemented and will be in 2.9.0. Thank you for suggesting it.