sselph / scraper

A scraper for EmulationStation written in Go using hashing
MIT License

thegamesdb API has changed? #230

Open pgiblock opened 6 years ago

pgiblock commented 6 years ago

Not sure, but I am a first-time user of this scraper. I ran into the issue "It appears that thegamesdb.net isn't up". Looking at the code, the scraper attempts to GET http://thegamesdb.net/api/GetGame.php?id=1. After following the 302, a 404 is returned. Judging from https://api.thegamesdb.net, it appears the API has changed: one now needs to hit https://api.thegamesdb.net/Games/ByGameID?id=1&apikey=<API_KEY>.

Is this a recent change on GDB's side? Are there any plans to support the new API? I'm going to modify the code locally and hardcode an API key temporarily and report back. Hopefully the endpoint paths (and the addition of an API key) are all that changed, and the scraper's parser can remain as-is.
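As an aside for anyone following along, building the new-style request URL described above could be sketched in Go like this. `buildGameByIDURL` is a hypothetical helper, not part of the scraper's actual code, and only assumes the endpoint and parameters quoted in this comment.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildGameByIDURL constructs the new-style TheGamesDB request URL
// described above. Hypothetical helper; names are illustrative.
func buildGameByIDURL(gameID, apiKey string) string {
	u, _ := url.Parse("https://api.thegamesdb.net/Games/ByGameID")
	q := url.Values{}
	q.Set("id", gameID)
	q.Set("apikey", apiKey)
	// Encode sorts parameters alphabetically (apikey before id).
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(buildGameByIDURL("1", "YOUR_API_KEY"))
	// https://api.thegamesdb.net/Games/ByGameID?apikey=YOUR_API_KEY&id=1
}
```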

sselph commented 6 years ago

Thanks for the heads-up. It looks like it has changed. I'll convert the code over this weekend.

sselph commented 6 years ago

Actually, I may have to remove support for this service. The API key they mention is per developer and is limited to something like 1,000 queries per month. I think they designed the API quota for people running a web server or something that mirrors the data, not a scraper like mine.

Ideally they would reconsider and allow users to generate their own API keys, so each user consumes an individual quota. A shared quota for an app like mine makes no sense.

pgiblock commented 6 years ago

I'm working on it now... If it is minor, then expect a pull request later today.

Edit: Blarg... just read your recent comment. This stinks, as I feel their metadata is superior. Guess I'll try the 'ss' source and see if that gives me the data I want. Either that, or leverage one of the mirrors they are trying to protect against ;-)

sselph commented 6 years ago

Ah, never mind, looks like I misread. The new documentation is not very good. The limit seems like it might be per IP, which would roughly correspond to a single user. I would just need to batch the API calls some.

At the moment there is supposedly a legacy subdomain you can add to the url to get it working again until the code has been migrated.

pgiblock commented 6 years ago

Yeah. Batching sounds ideal to get the query count down. I haven't dug into the guts of the scraper enough to know how painful of a refactor that would be.

pgiblock commented 6 years ago

Good news: It seems that simply replacing 'thegamesdb.net' with 'legacy.thegamesdb.net' is a usable stop-gap solution.
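For anyone patching locally, this stop-gap amounts to a one-line string rewrite of the old request URL. `patchLegacy` is a hypothetical helper for illustration, not code from the scraper itself.

```go
package main

import (
	"fmt"
	"strings"
)

// patchLegacy points an old-style thegamesdb.net URL at the legacy
// subdomain, as the stop-gap described above. Hypothetical helper.
func patchLegacy(u string) string {
	return strings.Replace(u, "http://thegamesdb.net/", "http://legacy.thegamesdb.net/", 1)
}

func main() {
	fmt.Println(patchLegacy("http://thegamesdb.net/api/GetGame.php?id=1"))
	// http://legacy.thegamesdb.net/api/GetGame.php?id=1
}
```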

sselph commented 6 years ago

Nice.

Yeah, the code today is my first Go code, so it wasn't great to start with and over the years has grown even less elegant. It does something roughly like the following, so it isn't laid out for batch processing against a single database.

for each rom found:
  for each DB:
    if result:
      break
    else:
      continue

It was designed more to try multiple databases in order to fill in gaps where one was missing data. A refactor would probably need to do something like:

for each DB:
  for each batch of unscraped roms:
    get_results(batch)

pgiblock commented 6 years ago

Yeah, that makes sense: unscraped roms is initially the full set, then on each iteration over the DBs it is only the set of roms left unresolved by the previous iteration. GDB might have some limit on the number of IDs allowed in a single query, so some chunking might be in order as well.
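The chunking mentioned above is a small helper on its own. `chunk` and the per-request limit used here are purely illustrative; the API's real limit, if any, wasn't known in this thread.

```go
package main

import "fmt"

// chunk splits ids into slices of at most n elements, so each API
// request stays under a hypothetical per-request ID limit.
func chunk(ids []int, n int) [][]int {
	var out [][]int
	for len(ids) > n {
		out = append(out, ids[:n])
		ids = ids[n:]
	}
	if len(ids) > 0 {
		out = append(out, ids)
	}
	return out
}

func main() {
	fmt.Println(chunk([]int{1, 2, 3, 4, 5}, 2)) // [[1 2] [3 4] [5]]
}
```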

Zer0xFF commented 6 years ago

Hi there,

I'm currently maintaining TheGamesDB's new site and API and would like to give you a quick update. The new API (and site) is a complete overhaul that carries over nothing but the database from the old site, so it won't be a simple URL change: the new API now returns JSON with changed field names and a different data layout. If you have any questions, feel free to tag me here or on the forum.

Regards,
Zer0xFF

sselph commented 6 years ago

Thanks. Once I get an API key, I'll start working on it more seriously but if you have documentation of the response formats I can go ahead and have most of it ready. I'll start looking at refactoring the code to make batching a little easier since the new API seems to encourage that.

Zer0xFF commented 6 years ago

I'm afraid that's not available yet, as there are still a few more things to implement, and they take priority over documentation.

We hope that keys will be reissued by next weekend.

symbios24 commented 6 years ago

Hi, since the API change the scraper finds very few game images per system, e.g. for NES it finds images for 200 out of 400 roms, and for Game Boy 100 out of 250. Is this going to be fixed?

symbios24 commented 6 years ago

After updating the scraper, the XML files still contain the address thegamesdb.net instead of legacy.thegamesdb.net. Is this normal?

Zer0xFF commented 6 years ago

@symbios24 the legacy subdomain is the old site with only the domain changed, so the results returned shouldn't be any different.

I changed any references I was able to find, but there is the possibility I missed some. Which endpoint is still returning thegamesdb.net?

symbios24 commented 6 years ago

So far I tried the Game Boy/NES/Atari 2600 games and they have thegamesdb.net in the XML.

symbios24 commented 6 years ago

Atari 5200 is also still returning thegamesdb.net; I assume all the Atari systems do the same.

sselph commented 6 years ago

Thanks for the report. I may have forgotten to fix a URL somewhere. I'll also see if there was some change affecting images. I would expect them all to either work or not work, so it seems weird that it is hit or miss.

symbios24 commented 6 years ago

It would be great if you could change the scraper so that PBP (PSX) files download images/pictures based on the name of the game rather than the extension of the filename.

melroy89 commented 6 years ago

It will require that this project (scraper) request an API key; see this post.

Then you can use the new API, e.g.: https://api.thegamesdb.net/#/Games/GamesByGameName