retrohub-org / retrohub

Retrogaming library frontend, built to be a highly customizable platform.
https://retrohub-org.github.io/
MIT License

Improve Scraper Matching System #368

Open WingofaGriffin opened 4 months ago

WingofaGriffin commented 4 months ago

I've noticed that the scraper gets hung up on special characters in file names, most notably text in parentheses, brackets, and braces. This leaves things such as versions and country codes in the search text, so the scraper fails to find matches.
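To illustrate the kind of cleanup being asked for, here is a minimal sketch in Python. It is not RetroHub's actual code: the function name `clean_title` and the example file names are hypothetical, and it assumes that every parenthesized, bracketed, or braced group is metadata rather than part of the title, which will not hold for every game.

```python
import re
from pathlib import Path

# Matches one (...), [...] or {...} group with no nested group of the same
# kind inside it, plus any whitespace directly before it.
_TAG = re.compile(r"\s*(\([^()]*\)|\[[^\[\]]*\]|\{[^{}]*\})")


def clean_title(file_name: str) -> str:
    """Return a search-friendly title from a raw ROM file name (hypothetical helper)."""
    # Drop the extension. This assumes the file name has one; without it,
    # a dot inside a tag such as "(v1.1)" would be mistaken for a suffix.
    title = Path(file_name).stem

    # Remove tags repeatedly so nested groups like "(USA (Rev 1))" also vanish.
    previous = None
    while previous != title:
        previous = title
        title = _TAG.sub("", title)

    # Treat underscores as spaces and collapse runs of whitespace.
    title = title.replace("_", " ")
    return re.sub(r"\s+", " ", title).strip()


print(clean_title("Super Mario World (USA) [!].sfc"))
print(clean_title("Chrono_Trigger_{v1.1}(Europe).smc"))
```

A stricter variant could strip only known tags (region names, `Rev`, `v1.x`, dump flags) so that titles that legitimately contain parentheses are preserved.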

Additionally, may I suggest adding integration with IGDB as a database? It tends to have much less rate limiting than ScreenScraper, and even includes some entries they don't have. Applications like Daijisho use it as their primary scraper.

rsubtil commented 4 months ago

Yes, there needs to be some cleanup system when using raw file names due to that extra garbage. I'm still not sure of a reliable way to do so, but I'll investigate options and detail them here in the future.

As for IGDB, I've considered using it too, as it's way more reliable. However, the main issue is how their API works: it requires me to generate a Twitch token, which has an expiration time. Embedding this in the app would force me (and users) to update the app every ~2 months. And their suggestion of setting up a proxy service for requests is likely to incur costs I can't afford ATM.
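For illustration only, the expiration problem can be sketched as a small cache that fetches a new token shortly before the old one runs out. Everything here is an assumption: `TokenCache` and its `fetch` callback are hypothetical and the code never talks to Twitch or IGDB. A real `fetch` would still need credentials from somewhere (the app itself or a proxy), so this does not solve the problem described above; it only shows the refresh logic.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches a short-lived token and refreshes it before it expires.

    `fetch` is a hypothetical callback returning (token, lifetime_in_seconds).
    """

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._margin = margin  # refresh this many seconds before expiry
        self._clock = clock    # injectable so the logic can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch on first use, or once the token is inside the safety margin.
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token


# Usage with a stand-in fetch function (no network access involved).
cache = TokenCache(lambda: ("example-token", 3600.0))
print(cache.get())
```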

I want to eventually support more scrapers, and IGDB is indeed a good candidate, but I'll need to figure out a good solution for this problem first.