pgiblock opened this issue 6 years ago
Thanks for the heads-up. It looks like it has changed. I'll convert the code over this weekend.
Actually, I may have to remove support for this service. The API key they mention is for the developer and is limited to something like 1000 queries per month. I think they designed the API quota for people running a web server or something that mirrors the data, not a scraper like mine.
Ideally they would reconsider and allow users to generate an API key and they'd use their own individual quota. The shared quota for an app like mine makes no sense.
I'm working on it now... If it is minor, then expect a pull request later today.
Edit: Blarg... just read your recent comment. This stinks as I feel their metadata is superior. Guess I'll try the 'ss' source and see if that gives me the data I want. Either that, or leverage one of the mirrors they are trying to protect against ;-)
Ah, nm, looks like I misread. The new documentation is not very good. The limit seems like it might be per IP, so that would be roughly a single user. I would just need to batch the API calls some.
At the moment there is supposedly a legacy subdomain you can add to the URL to get it working again until the code has been migrated.
Yeah. Batching sounds ideal to get the query count down. I haven't dug into the guts of the scraper enough to know how painful of a refactor that would be.
Good news: it seems that simply replacing 'thegamesdb.net' with 'legacy.thegamesdb.net' is a usable stop-gap solution.
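For anyone patching locally, the stop-gap amounts to rewriting the host before issuing the request. A minimal sketch (the helper name is mine, not the scraper's; only the GetGame.php URL comes from this thread):

```go
package main

import (
	"fmt"
	"strings"
)

// legacyURL rewrites an old-API URL onto the legacy subdomain,
// leaving the path and query untouched.
func legacyURL(u string) string {
	return strings.Replace(u, "://thegamesdb.net/", "://legacy.thegamesdb.net/", 1)
}

func main() {
	fmt.Println(legacyURL("http://thegamesdb.net/api/GetGame.php?id=1"))
	// → http://legacy.thegamesdb.net/api/GetGame.php?id=1
}
```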
Nice.
Yeah, the code today is my first Go code, so it wasn't great to start with, and over the years it has grown even less elegant. It does something roughly like the following, so it's not laid out for batch processing against a single database:
for each rom found:
    for each DB:
        if result:
            break
        else:
            continue
It was designed more to try multiple databases, to fill in gaps that one database was missing. A refactor would probably need to do something like:
for each DB:
    for each batch of unscraped roms:
        get results(batch)
Yeah, that makes sense, where "unscraped roms" is initially the full set. Then for each iteration of DB, it is only the set of unresolved roms from the previous iteration. gdb might have some limit on the number of ids allowed in a single query, so some chunking might be in order as well.
Hi there,
I'm currently maintaining TheGamesDB's new site and API and would like to give you a quick update in that regard. The new API (and site) is a complete overhaul, carrying over nothing from the old site but the database, so it won't be a simple URL change: the new API returns JSON, with changed field names and data layout. If you have any questions, feel free to tag me here or on the forum.
Regards, Zer0xFF
Thanks. Once I get an API key, I'll start working on it more seriously but if you have documentation of the response formats I can go ahead and have most of it ready. I'll start looking at refactoring the code to make batching a little easier since the new API seems to encourage that.
I'm afraid that's not available yet, as there are still a few more things to implement, and they take priority over documentation.
And we hope that keys will be reissued by next weekend.
Hi, after the change of the API it finds very few game images per system, e.g. for NES it finds 200 out of 400 roms, and for Game Boy 100 out of 250. Is this going to be fixed?
After updating the scraper, the XML files have the same address, thegamesdb.net, instead of legacy.thegamesdb.net. Is this normal?
@symbios24 The legacy subdomain is the old site with only the domain changed, so the results returned shouldn't be any different.
I changed any references I was able to find, but there is the possibility I missed some, which endpoint is still returning thegamesdb.net?
So far I tried the Game Boy/NES/Atari 2600 games, and they have thegamesdb.net in the XML.
Also, Atari 5200 is still returning thegamesdb.net; I assume all the Atari systems do the same.
Thanks for the report. I may have forgotten to fix a URL somewhere. I'll also see if there was some change affecting images. I would expect them all to work or all to fail, so it seems weird that it is hit or miss.
If you can change the scraper for PBP (PSX) files to download images based on the name of the game rather than the extension of the filename, that would be great.
It will require that this project (scraper) request an API key; see this post.
So you can use the new API, e.g.: https://api.thegamesdb.net/#/Games/GamesByGameName
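Building a request against the new endpoint would look roughly like this. Only the host, the apikey parameter, and the by-name endpoint come from this thread; the exact path segment and the "name" query parameter are assumptions pending real documentation:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildByNameURL assembles a games-by-name query for the new API.
// The parameter names here are illustrative, not confirmed.
func buildByNameURL(apiKey, game string) string {
	q := url.Values{}
	q.Set("apikey", apiKey)
	q.Set("name", game)
	return "https://api.thegamesdb.net/Games/ByGameName?" + q.Encode()
}

func main() {
	fmt.Println(buildByNameURL("YOUR_KEY", "Metroid"))
	// → https://api.thegamesdb.net/Games/ByGameName?apikey=YOUR_KEY&name=Metroid
}
```

Note that `url.Values.Encode` also percent-escapes game titles containing spaces or punctuation, which matters for real rom names.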
Not sure, but I am a first-time user of this scraper. I ran into the issue "It appears that thegamesdb.net isn't up". Looking at the code, it appears that the scraper attempts to GET
http://thegamesdb.net/api/GetGame.php?id=1
. After following the 302, a 404 is returned. From https://api.thegamesdb.net , it appears the API has changed? Looks like one now needs to hit https://api.thegamesdb.net/Games/ByGameID?id=1&apikey=<API_KEY>
. Is this a recent change on gdb's side? Are there any plans to support the new API? I'm going to modify the code locally and hardcode an API key temporarily and report back. Hopefully the endpoint paths (and addition of an API key) are all that changed, and the scraper's parser can remain as-is.