sselph / scraper

A scraper for EmulationStation written in Go using hashing
MIT License
449 stars 88 forks

Getting .hack//link in all GBA games. #214

Closed by DelScipio 6 years ago

DelScipio commented 6 years ago

Many people are reporting the same issue with different consoles:

https://retropie.org.uk/forum/topic/16059/sselph-scrapper-getting-strange-results

https://www.reddit.com/r/RetroPie/comments/7tl7cd/sselph_scraper_help_strange_results/

Any way to fix it?

cosmo0 commented 6 years ago

Same here, not just for GBA games, but for seemingly random games in most systems.

The scraper has become nearly unusable at this point: you have to check each gamelist after scraping...

I assume it's a problem in the ScreenScraper source. ".hack//LINK" is probably the first entry in the whole database, and it gets picked by default.

ghost commented 6 years ago

Same for some N64 games.

jamcrackers commented 6 years ago

Same here for a lot of PSX games.

BrainSlugs83 commented 6 years ago

Running on latest (as of 2/7/18), I've got like a couple hundred NES and SNES games that all get marked as the PSP game ".hack//LINK" as well.

Like, not only is it matching the wrong system, but the file name isn't even close to that name (for example, "Lethal Weapon.zip" gets scraped like this).

It would be nice if the scraper could at least verify the data it's putting in is for the correct console. :)

And if the scraper can't find a match, could it just leave that ROM alone and not scrape for it?

I'm running version 4.3 of RetroPie, with the scraper built from source last night (2/7/18), with the following options:
Thumbnails Only: Disabled
Arcade Source: ArcadeItalia
Console Source: ScreenScraper
ROM Names: theGamesDB
Gamelist: Overwrite
Use rom folder: Enabled
Download Videos: Enabled
Download Marquees: Disabled
Max: 400x400

BrainSlugs83 commented 6 years ago

Looking at the XML that gets generated -- I wonder if this may be related to the "Use rom folder" option being enabled, check this out:

<game id="65505" source="screenscraper.fr">
    <path>./10 Yard Fight.zip</path>
    <name>.hack-Link</name>
    <desc>The first game in the .hack series for PSP (and the planned final game for the franchise), .hack//LINK logs player into a new version of its virtual landscape called The World R:X (the &amp;quot;R&amp;quot; stands for &amp;quot;Revision&amp;quot;)...[/truncated]</desc>
    <image>./images/10 Yard Fight-image.jpg</image>
    <rating>0.85</rating>
    <releasedate>20100304T000000</releasedate>
    <developer>Bandai Namco</developer>
    <publisher>CyberConnect2</publisher>
    <genre>Role playing games</genre>
</game>

Notice that the path starts with "./" (because it's in the same folder as gamelist.xml) -- I wonder if it's passing that into the search engine by accident? -- I'm going to try with that option disabled when I get home.

EDIT: Of course that didn't work. sigh -- It would have been too easy. >.<

cosmo0 commented 6 years ago

For my part, I wonder whether it's actually a bug in the ScreenScraper API.

I would file a bug there, but I'd rather be sure first that it's their API and not just this scraper.

Can someone try to scrape the same failing files with Universal XML Scraper and/or Skrape? (I'm on Mac so I can't use them.)

BrainSlugs83 commented 6 years ago

I'm going to try a different source next. -- If that doesn't work, I'll give one of those tools a try.

BrainSlugs83 commented 6 years ago

Changing sources worked, but no video feeds, and it leaves a lot to be desired. -- I'll try one of those tools today.

So I guess it's partially a problem with the ScreenScraper source? -- Not sure on whose end though. However, the tool itself should at least be validating that the game it's scraping is for the correct system... (Ever try to plug a PSP game into an NES? Doesn't work that well in practice.)

cosmo0 commented 6 years ago

Thanks! I'll file a bug at Screenscraper.

BrainSlugs83 commented 6 years ago

Looks like UXS works and doesn't have this issue (even when using Screenscraper) -- I think it's a problem with this scraper when using that source.

Universal-Rom-Tools commented 6 years ago

Hi, I can't test right now, but could someone test UXS in "filename" search mode (not "CRC+Filename")? Maybe the API returns something wrong for a bad filename... (in which case it should do the same on sselph's scraper and UXS)

But I don't think anything has changed in the API for a "long" time ;) (the only change could be in the new API V2, and I'm not sure, but I don't think anything has changed there for several months either ;) )

sselph commented 6 years ago

I haven't had a chance to test this, but I search based only on the hash (I don't think I even provide the name) and expect the API to tell me when the ROM doesn't exist. So either there has been a logic change, or this has always been an issue and someone finally submitted a game that triggered it. From the description, it seems like everything is just getting the first available game. When I get some time, I'll see if I can either treat this response as a "not found" or go through the returned list of ROMs and double-check that the hash I asked for is actually in the response.

Was the v2 API ever released? I wrote code for it a long time ago but never heard that it was released. If so, that may help, since it returned better error codes and was more efficient in general.
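The double-check described above (only accept a result if the requested hash appears in the returned ROM list) could look roughly like the sketch below. This is not the scraper's actual code: the `ROM` and `Game` types, `verifyHash`, and `ErrNotFound` are hypothetical names made up for illustration, and the real ScreenScraper response has its own field layout.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ROM and Game are hypothetical stand-ins for the parts of a
// ScreenScraper response that matter here; the real fields may differ.
type ROM struct {
	SHA1 string
}

type Game struct {
	Name string
	ROMs []ROM
}

// ErrNotFound signals that the response does not list the requested hash.
var ErrNotFound = errors.New("requested hash not present in response")

// verifyHash accepts the game only if one of its ROMs carries the SHA1
// that was asked for. Hex digests are compared case-insensitively.
func verifyHash(g Game, wantSHA1 string) error {
	if wantSHA1 == "" {
		// Never let an empty request match a ROM with no recorded SHA1.
		return ErrNotFound
	}
	for _, r := range g.ROMs {
		if strings.EqualFold(r.SHA1, wantSHA1) {
			return nil
		}
	}
	return ErrNotFound
}

func main() {
	// A game with no SHA1 on record, like the entry discussed in this thread.
	g := Game{Name: ".hack//LINK", ROMs: []ROM{{SHA1: ""}}}
	// Placeholder digest, not the hash of any real ROM.
	err := verifyHash(g, "0123456789abcdef0123456789abcdef01234567")
	fmt.Println(g.Name, "accepted:", err == nil)
}
```

With a check like this, a response that comes back for the wrong game is treated the same as "not found", so the ROM is left alone instead of being labelled with unrelated metadata.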


cosmo0 commented 6 years ago

From what I gather on the forum, it looks like the V2 has indeed been released, but it's not very clear.

They're investigating on their side too: https://screenscraper.fr/forumsujet.php?frub=12&fsuj=550

Universal-Rom-Tools commented 6 years ago

Hi sselph ;) The V2 isn't "officially" released ;) but some people already use it, as it is already pretty stable and won't change much ;) (there was a small fix to the JSON version lately, but nothing else has changed for a while ;) )

I ran some tests against the API (V2 and V1) with the game "Lethal Weapon.zip", and everything the API returns looks OK...

I also checked the game ".hack//link" on PSP... It seems this game has no SHA1 referenced on ScreenScraper... Maybe when a SHA1 isn't found, the API returns this game (that is, no SHA1 found, so it takes the first game with no SHA1)?

(And yes @cosmo0 ;) it's me ;) )

cosmo0 commented 6 years ago

I also checked the game ".hack//link" on PSP... It seems this game has no SHA1 referenced on ScreenScraper... Maybe when a SHA1 isn't found, the API returns this game (that is, no SHA1 found, so it takes the first game with no SHA1)?

It's possible but it's far from the only game without SHA1 hash, so that would be surprising.

sselph commented 6 years ago

That seems like a likely cause. It may also be a bug in my script; maybe for some reason I'm not sending a SHA1 value? In either case, I'll see about handling this situation better. I like the idea of double-checking that the response has the ROM I asked for. I think I already do this to get the correct region information, so if the expected ROM doesn't exist in the response, I should throw the result out.


stoz commented 6 years ago

It's possible but it's far from the only game without SHA1 hash, so that would be surprising.

.hack//link is however probably the first (or last, depending on how the sorting works) game without an SHA1 hash in alphabetical order.

sselph commented 6 years ago

Sorry for taking so long to take a look. I added an update to double check the results from ScreenScraper to make sure they are sane. In testing I also noticed that an empty file will result in a match from SS so I'm filtering that client side so that we don't match it.

cosmo0 commented 6 years ago

Thanks! It seems to have fixed the issue: I ran a scrape again, and I haven't gotten any ".hack//Link" entries.

qphoria commented 5 years ago

This is fixed in 1.4.6 for the name and description... but it still downloads a generic game image that has the box art for ".hack//link" as the game art. It seems like someone's practical joke, but it actually uses this generic image for all games that don't have entries if you are using the "-add_not_found" switch, so that's still not desirable. It would be best if it could just use a generic "No image available" image or something similar for games that aren't found.