sselph / scraper

A scraper for EmulationStation written in Go using hashing
MIT License

Sselph Scraper won't scrape PSX games #205

Open brettoblaster opened 6 years ago

brettoblaster commented 6 years ago

It seems to go through the process but when I go to browse through the names, there's no box art, videos, or marquee. Every other system has been fully scraped properly, but this one is giving me trouble and keeps saying "hash not found." Any ideas? Using a Pi3.

sselph commented 6 years ago

This system isn't very complete. If you happen to have bin/cue files that match the source I used then it will work. If you have .bin or something else then it won't work. This may be related to #210. Currently I only match games in my .csv.

sahaq commented 6 years ago

I wouldn't say PSX is gaining momentum on PBP format at all. I honestly see CHD overtaking, as it's not a lossy compression method, and saves more space overall.

sahaq commented 6 years ago

Current builds of mednafen-beetlepsx support CHD, load times are great, and the total file size gain is great. The only issue is manually creating playlists, which can be done but is mildly annoying.

tcamargo commented 6 years ago

RetroArch also supports chd, at least for PSX and Sega CD. I can add -extra_ext "chd" but it only works with OpenVGDB. If possible, I prefer to use screenscraper for 3D box, video, and wheel.

tcamargo commented 6 years ago

Just understood how SS works. I am checking if it is possible to add chd files as rom dumps in SS. Wondering if it would be possible to reconstruct the cue file from the chd to have a good hash...

dimmthewitted commented 6 years ago

Stop me if I'm wrong, but Steven's scraper doesn't match most PBP roms because the database (hash.csv?) only contains matches on hashes. Does it also search on names or is it just hashes?

I like PBP because I can easily create them to combine discs into a single manageable file. I also enjoy the compression this yields.

I do agree with Icelander that if the search excluded regional tags in the name, or anything in brackets, name search would work better.

I don't know Go, Steven, but if the utility used T-SQL, I would UPPER the name and pull out only the alphanumeric characters, using a PATINDEX pattern such as '%[^a-zA-Z0-9]%' or a regular expression.
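The name clean-up proposed above can be sketched in a few lines. This is only an illustration of the idea, not code from the scraper: `normalize_name` is a hypothetical helper, and it assumes the file name ends in an extension and that region tags appear in parentheses or square brackets.

```python
import os
import re

# Hypothetical helper for illustration; the scraper itself matches on hashes.
_BRACKETED = re.compile(r"\([^)]*\)|\[[^\]]*\]")  # "(USA)", "[!]", "(Disc 1)", ...
_NON_ALNUM = re.compile(r"[^A-Z0-9]")


def normalize_name(filename: str) -> str:
    """Reduce a ROM file name to an upper-case alphanumeric key."""
    stem, _ext = os.path.splitext(filename)  # assumes the name has an extension
    stem = _BRACKETED.sub("", stem)          # drop region tags and anything bracketed
    return _NON_ALNUM.sub("", stem.upper())  # keep only A-Z and 0-9


if __name__ == "__main__":
    print(normalize_name("Final Fantasy VII (USA) (Disc 1).pbp"))
    print(normalize_name("final_fantasy-vii [!].PBP"))
```

Both sample names reduce to the same key, so two differently tagged copies of a game would compare equal. Note that this also discards disc numbers, so multi-disc sets would collapse to one key.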

Here is what I propose: I can make MD5 hashes of all the PlayStation BIN files that I have for inclusion in the hash database, if this would be helpful and might provide additional matching support.

sselph commented 6 years ago

As you mention, I just parse the cue/bin files, hash them, and match those hashes against data I downloaded from redump. I don't handle chd or any other combined format, because I haven't tried to understand the format well enough to know whether I can take the chd data and get the hash data redump provides. I don't own any psp or psx games so can't attest to any hashes being correct.

If you have a definite hash-to-game mapping or some algorithm to extract the bin data from the chd I can probably add it no problem.

dimmthewitted commented 6 years ago

Used an MD5 checker for a hash list of PBP files.

Used RHash to hash a folder of PBPs into an output file: rhash -M *.PBP > GamesHash.txt

PBP hash list.txt

I'll see if I can get a tested hash.csv where matching works for re-inclusion later.

dimmthewitted commented 6 years ago

hash.csv didn't pick up my hashes.

Check the attached hash.csv addition. I tried slipping it in at 59721, and it goes from list ID 5000 to 5090.

Found most of the gamelist IDs with an Excel VLOOKUP from higher up in the list, but there are some blanks.

I see I need to trim my column of a trailing space, but more importantly my hashes are 32 characters and yours are 41 characters.

Steven, have you used RHash to generate the hashes? Does the scraper use MD5 or SHA?

Just the additions: hash test 1.csv.txt

sselph commented 6 years ago

I use a SHA-1 hash, which is the reason for the difference. Sorry I missed the mention of MD5 in the original message.
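For anyone checking their own list before merging it: an MD5 digest is 32 hex characters and a SHA-1 digest is 40, so the length of a cleaned-up value tells the two apart. A minimal sketch follows; `digest_kind` and `sha1_of_file` are hypothetical helper names, and hashing the whole file with plain SHA-1 is an assumption that mirrors what a generic hashing tool does.

```python
import hashlib

# Hex digest lengths: MD5 is 128 bits (32 hex chars), SHA-1 is 160 bits (40 hex chars).
HEX_LENGTHS = {32: "md5", 40: "sha1"}
_HEX_DIGITS = set("0123456789abcdefABCDEF")


def digest_kind(value: str) -> str:
    """Classify a hash string by length after trimming stray whitespace."""
    cleaned = value.strip()
    if not cleaned or any(c not in _HEX_DIGITS for c in cleaned):
        return "unknown"
    return HEX_LENGTHS.get(len(cleaned), "unknown")


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 hex digest of a whole file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `digest_kind` over a column of hashes before adding them to a CSV catches both the wrong algorithm and a trailing space in one pass.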

I haven't used rhash but looking at the documentation for that program it looks like you need to do rhash -H -M *.PBP > GamesHash.txt

dimmthewitted commented 6 years ago

rhash -H *.PBP > GamesHash.txt

Got 40-character SHA hashes, same as the hash.csv.

Added to the hash.csv (renamed to post the file here): hash.csv.txt

Copied the updated hash.csv to the PSX rom directory. Manually ran the scraper from the PSX rom directory.

/RetroPie/roms/psx $ sudo /opt/retropie/supplementary/scraper/scraper -hash_file hash.csv

2018/04/17 19:54:30 arcade srcs:
2018/04/17 19:54:30 console srcs: gdb

hash.csv is deleted. What am I missing to manually run the scraper correctly?

sselph commented 6 years ago

By default it only scans files with extensions it recognizes but you can pass -extra_ext=".pbp" to tell it to hash other extensions with a basic sha1.
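As a rough picture of what an extension filter with an extra-extension option might look like, here is a small sketch. It is not the scraper's code: `DEFAULT_EXTS` is a made-up stand-in for the real list of recognized extensions, `files_to_hash` is a hypothetical name, and accepting a comma-separated value with or without the leading dot is an assumption.

```python
import os

# Illustrative only: the scraper's real list of recognized extensions is not
# shown in this thread.
DEFAULT_EXTS = {".bin", ".cue"}


def files_to_hash(directory: str, extra_ext: str = "") -> list:
    """List files whose extension is recognized or named in extra_ext."""
    exts = set(DEFAULT_EXTS)
    for ext in extra_ext.split(","):
        ext = ext.strip().lower()
        if ext:
            # Accept both "pbp" and ".pbp".
            exts.add(ext if ext.startswith(".") else "." + ext)

    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        suffix = os.path.splitext(name)[1].lower()  # compare case-insensitively
        if os.path.isfile(path) and suffix in exts:
            matches.append(path)
    return matches
```

With a filter like this, a file such as `Game.PBP` is skipped silently unless `.pbp` is passed in, which would explain a run that produces no per-file output.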

dimmthewitted commented 6 years ago

So rhash -H for SHA1 hashes works. I hashed a BIN and it came up with the same hash found in the hash.csv.

When I try to run with the pbp extension, I get no output still using my updated hash.csv.

/RetroPie/roms/psx $ sudo /opt/retropie/supplementary/scraper/scraper -extra_ext=".pbp" -hash_file hash.csv

2018/04/17 19:54:30 arcade srcs:
2018/04/17 19:54:30 console srcs: gdb

/RetroPie/roms/psx $

(No INFO or other output; it just skips past my csv additions. hash.csv is not being deleted. The previous report must have been because I did a CTRL-Break.)

Is there any other scraper argument I can run to get additional logging? I will look at this more closely again later.

dimmthewitted commented 6 years ago

Never mind, it is working fine now. Where do you want me to post the hash.csv file? It would be nice if PBP files could be added to the known extensions.

I will continue to add to the hash.csv over the next couple of months and grow it beyond the initial hundred I added.

symbios24 commented 6 years ago

Any PBP File support?

RobotLimeLtd commented 6 years ago

It works for me, but only for simple pairs of 1 .cue and 1 .bin; it isn't working for multi-track (i.e. multiple .bin) games. Is there a workaround for this?

symbios24 commented 6 years ago

maybe next lifetime

sselph commented 6 years ago

Sorry @RobotLimeLtd I missed your message. It is supposed to work for multitrack bin/cue files. It reads the cue file, which is basically a text file with a list of bin files, and uses that to get a list of files to hash. If the cue or any bin matches, I take that. I suspect I could do better if I adjusted for case, but it is also possible someone has renamed the files. For example, they were originally "game-1.bin" and were renamed to something prettier like "game (track 1).bin", so I may want to look for similar base names too.
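The flow described above (read the cue, collect the bin files it lists, hash each one, accept the first match) can be sketched roughly as follows, including the case adjustment mentioned as a possible improvement. This is a sketch under assumptions, not the scraper's implementation: all function names are hypothetical, `known` stands in for the contents of hash.csv, and a plain SHA-1 over each whole file is assumed because the thread does not describe how bin files are actually hashed.

```python
import hashlib
import os
import re

# Matches cue lines such as: FILE "game (track 1).bin" BINARY
_FILE_LINE = re.compile(r'^\s*FILE\s+(?:"([^"]+)"|(\S+))\s+\S+\s*$', re.IGNORECASE)


def cue_referenced_files(cue_path):
    """Return the file names listed in FILE lines of a cue sheet."""
    names = []
    with open(cue_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = _FILE_LINE.match(line)
            if match:
                names.append(match.group(1) or match.group(2))
    return names


def resolve_case_insensitive(directory, name):
    """Find a directory entry equal to name when case is ignored."""
    wanted = name.lower()
    for entry in os.listdir(directory):
        if entry.lower() == wanted:
            return os.path.join(directory, entry)
    return None


def sha1_of_file(path, chunk_size=1 << 20):
    """Plain SHA-1 of the whole file (an assumption, see the text above)."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_game(cue_path, known):
    """Return the first known entry matching the cue or any bin it lists."""
    directory = os.path.dirname(os.path.abspath(cue_path))
    candidates = [cue_path]
    for name in cue_referenced_files(cue_path):
        path = resolve_case_insensitive(directory, os.path.basename(name))
        if path is not None:  # renamed or missing tracks are skipped
            candidates.append(path)
    for path in candidates:
        digest = sha1_of_file(path)
        if digest in known:
            return known[digest]
    return None
```

The case-insensitive lookup handles a cue that says `game.bin` next to a file named `Game.BIN`. It does not handle renamed tracks; that would need a looser comparison of base names.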

I created #100 for this but as you've seen I haven't been super active. I've been making some time to start refactoring things. Hopefully that will get me going again.

jLynx commented 1 year ago

Any update around being able to scrape for .CHD files?