muldjord / skyscraper

Powerful and versatile game scraper written in c++
GNU General Public License v3.0

Can't scrape some roms #288

Closed palmerj closed 3 years ago

palmerj commented 3 years ago

Describe the bug

When trying to scrape some ROMs, the game is not found on ScreenScraper even though the database does have the game record. The ROM names are in the ScreenScraper database, but the checksums do not seem to be. However, the documentation seems to state that if the ROM name is an exact match, it should still find the game.

$ sha1sum *
f4c778443c96ca25c0afcb9cf3613995b6f7db6d  maglord.zip
6bbbce094422062bd178d6007bed06dcdd0d8b78  pgm.zip
c26339266ea77cf168be8d3c88dabcbfe741b434  umk3p.zip
c71eeb4e70894dd2b24b0ccac65855ddb82fd7d3  wh1.zip
3f821f91f8cade285018ab3a1aae201a8986e165  wpksoc.zip

See output

Skyscraper -p arcade -i /path/to/roms -s screenscraper --flags unattend,skipped,videos -u xxxx:xxxx

Returns:

------------------------------------------
Running Skyscraper v3.6.6 by Lars Muldjord
------------------------------------------
Platform:           'arcade'
Scraping module:    'screenscraper'
Input folder:       '/xxx/rom_test'
Game list folder:   '/xxx/rom_test'
Covers folder:      '/xxx/covers'
Screenshots folder: '/xxxscreenshots'
Wheels folder:      '/xxx/media/wheels'
Marquees folder:    '/xxx/media/marquees'
Videos folder:      '/xxx/media/videos'
Cache folder:       'cache/arcade'

DID YOU KNOW: You can turn off these hints using the '--flags nohints' command line flag.

Fetching limits for user 'xxx', just a sec...
Setting threads to 1 as allowed for the supplied user credentials.

Reading and parsing quick id xml, please wait... Done!
Reading and parsing resource cache, please wait... Done!
Successfully parsed 27786 resources!

Looking for optional 'priorities.xml' file in cache folder... Found!
Priorities loaded successfully!

Starting scraping run on 5 files using 1 threads.
Sit back, relax and let me do the work! :)

#1/5 (T1) Pass 1 ---- Game 'maglord' not found :( ----

#1/5, (0/1)
Elapsed time   : 00:00:01
Est. time left : 00:00:07

#2/5 (T1) Pass 1 ---- Game 'pgm' not found :( ----

'screenscraper' requests remaining: 19019

#2/5, (0/2)
Elapsed time   : 00:00:02
Est. time left : 00:00:04

Request timed out, server is probably busy / overloaded...
Retrying request...

#3/5 (T1) Pass 1 ---- Game 'umk3p' not found :( ----

'screenscraper' requests remaining: 19019

#3/5, (0/3)
Elapsed time   : 00:01:05
Est. time left : 00:00:43

#4/5 (T1) Pass 1 ---- Game 'wh1' not found :( ----

'screenscraper' requests remaining: 19019

#4/5, (0/4)
Elapsed time   : 00:01:06
Est. time left : 00:00:16

#5/5 (T1) Pass 1 ---- Game 'wpksoc' not found :( ----

'screenscraper' requests remaining: 19016

#5/5, (0/5)
Elapsed time   : 00:01:08
Est. time left : 00:00:00

---- Resource gathering run completed! YAY! ----
Writing quick id xml, please wait... Done!
Writing 27786 (0 new) resources to cache, please wait... Done!

---- And here are some neat stats :) ----
Total completion time: 00:01:08

Total number of games: 5
Successfully processed games: 0
Skipped games: 5 (Filenames saved to '/home/USER/.skyscraper/skipped-arcade-screenscraper.txt')

muldjord commented 3 years ago

Hi, that's unfortunately not something I control on my end. It used to work with just the filename matching, so perhaps they changed how their API works internally. My API call is made the way they have requested it. If that doesn't give back a result, there's not much I can do about it, so it's probably more a question for them.

muldjord commented 3 years ago

Looking a bit further at your output, I think this might simply be because it is the arcade platform. That platform is all over the place since it's made up of about 40 subplatforms. And for that reason alone, ScreenScraper won't provide a result, since it finds it in several of those subplatforms. One of the rules inside their API, for a filename match, is that it can only exist in the platform that is requested. But since arcade has many subplatforms, this breaks that. So you will probably notice that this will work for platforms such as nes or megadrive since they have no subplatforms.

Bottom line: This won't work with the arcade platform. It will probably still work for other platforms.

palmerj commented 3 years ago

OK, many thanks. Maybe improve the docs to mention this?

What are the workarounds for now? Get a zip file that matches the checksum?

muldjord commented 3 years ago

I might document it, thanks for the suggestion.

The workaround is either to get checksum matching roms, or to use a custom query as described here. Look at the examples with -s screenscraper using md5 (or change that to sha1 if you please).
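
For illustration, the checksum part of such a query can be computed locally. This is a minimal sketch, assuming the query string takes the form `md5=<hex>` or `sha1=<hex>` as in the linked examples; the helper name `checksum_query` is hypothetical and not part of Skyscraper. Note that the checksum only helps if ScreenScraper knows it, so hashing a ROM whose checksum is missing from their database (as in the report above) will not produce a match.

```python
import hashlib


def checksum_query(rom_path, algorithm="md5"):
    """Return a query string such as 'md5=<hex digest>' for one ROM file.

    Hypothetical helper: the 'md5=' / 'sha1=' format is an assumption
    based on the custom query examples referenced in the thread.
    """
    if algorithm not in ("md5", "sha1"):
        raise ValueError("algorithm must be 'md5' or 'sha1'")
    digest = hashlib.new(algorithm)
    # Read in 1 MiB chunks so large ROM archives are not loaded into memory at once.
    with open(rom_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return f"{algorithm}={digest.hexdigest()}"
```

Calling `checksum_query("wh1.zip", "sha1")` returns the same digest that `sha1sum wh1.zip` prints, prefixed with `sha1=`.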

muldjord commented 3 years ago

Now documented here.

palmerj commented 3 years ago

> Now documented here.

Thank you very much :-)

> The workaround is either to get checksum matching roms, or to use a custom query as described here. Look at the examples with -s screenscraper using md5 (or change that to sha1 if you please).

Thank you very much. Is there any way to set the queries per rom in a config file so it can be part of a normal run? I guess I can just create a script which adds individual roms...

muldjord commented 3 years ago

> Thank you very much. Is there any way to set the queries per rom in a config file so it can be part of a normal run? I guess I can just create a script which adds individual roms...

You're welcome. Unfortunately not - you will have to do it one at a time.
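
The "one at a time" approach can still be scripted, as suggested in the question. The following is a rough sketch only: it assumes Skyscraper's `--query` option accepts a string such as `sha1=<hex>` followed by the ROM path, as in the custom query examples referenced earlier, and it reuses the `-p`, `-s` and `-u` options from the command in the original report. The function name `build_commands` and the mapping of ROM paths to queries are hypothetical; the queries themselves have to be supplied by hand with checksums ScreenScraper actually knows.

```python
import shlex


def build_commands(queries, platform="arcade", module="screenscraper", credentials=None):
    """Build one Skyscraper invocation per ROM.

    `queries` maps a ROM path to its custom query string. The '--query'
    usage is an assumption based on the documentation linked in the thread.
    """
    commands = []
    for rom_path, query in queries.items():
        cmd = ["Skyscraper", "-p", platform, "-s", module]
        if credentials:
            cmd += ["-u", credentials]
        cmd += ["--query", query, rom_path]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Placeholder entry: replace the path and checksum with real values.
    example = {"/path/to/roms/maglord.zip": "sha1=<known-good checksum>"}
    for cmd in build_commands(example):
        # Print the command lines for review instead of running them.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them before passing each list to `subprocess.run`, one ROM per call.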