sselph / scraper

A scraper for EmulationStation written in Go using hashing
MIT License
448 stars, 88 forks

Update hashes for all systems #20

Open sselph opened 9 years ago

sselph commented 9 years ago

Need to update the dataset with all the new games in the GDB.

ghost commented 8 years ago

Here is a CSV I made for all Nintendo Game Boy Hashes which are not currently in the scraper:

https://drive.google.com/open?id=1TE_j4a9TcV1HNrcdvFeOtf7iNrNds71GMgtv_g5PshY

ghost commented 8 years ago

Here is a CSV I made for all Sega 32X Hashes which are not currently in the scraper:

https://drive.google.com/open?id=12NmsgYz1w6b4upG6vbrlztqMylSMKCWx9xA4ny5sMKs

Notes: Can you delete all of the other Sega 32X CDs associated under the system 33 designation? Most of them are linked to the Sega CD TGDB ID and not to the Sega 32X CD TGDB ID. Supreme Warrior and Corpse Killer specifically. Both are linked to Sega CD.

We also should find a way to distinguish Disc 1, Disc 2, Disc 3, etc. for all discs after scraping. It would be nice to be able to tell each disc apart through the front end.

robertybob commented 8 years ago

@stevetb re: Disc titles, maybe this flag is the answer?

-use_filename

If true, use the filename minus the extension as the game title in xml.
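As a rough illustration of what "the filename minus the extension" gives for multi-disc games, here is a minimal Go sketch. The helper name `titleFromFilename` and the example paths are hypothetical and are not taken from the scraper's source; the actual flag may behave differently in edge cases.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// titleFromFilename returns the last path element without its extension.
// Hypothetical helper for illustration only, not the scraper's own code.
func titleFromFilename(path string) string {
	base := filepath.Base(path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

func main() {
	// Made-up file names: each disc keeps its own "(Disc N)" suffix,
	// so the titles stay distinguishable in the front end.
	for _, p := range []string{
		"roms/segacd/Example Game (USA) (Disc 1).cue",
		"roms/segacd/Example Game (USA) (Disc 2).cue",
	} {
		fmt.Println(titleFromFilename(p))
	}
}
```

Because the disc number is part of the filename, each disc ends up with a distinct title instead of the shared title from the database.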

ghost commented 8 years ago

@sselph I would like to start pushing you a lot of hashes that do not currently exist in the scraper. I have put a lot on here already over this past week but would like to pay back to the community and scrape even more. Please let me know what the fastest method is to get these to you so we can quickly add them to the scraper database. This will immediately help users. I will work on my end to do most of the work for you.

I would also like to have quite a few more platforms added to your scraper. If you can add these platforms I will link the unknown hashes to the TGDB database.

Here are the Platforms of interest:

Please contact me so we can work out a way to add more roms & platforms with me doing as much heavy lifting as possible.

robertybob commented 8 years ago

@stevetb Dreamcast is already supported (?) https://github.com/sselph/scraper/issues/34

ghost commented 8 years ago

@robertybob Sorry I meant, Sega SG-1000. I have corrected above in my comment.

sselph commented 8 years ago

@stevetb Thanks for all the hash data. I'll start taking a look at these. I need to investigate the systems you linked, ie what are the extensions and is there any special encoding, etc.

sselph commented 8 years ago

Added in all your csv files except the 32X. I want to spend some time sorting out the 32X CD from Sega CD and it is getting late.

ghost commented 8 years ago

@sselph I will be out of town for a week. When I return I will start the process of scraping all the roms & variations of roms I can find. Redump will be a guide as well to ensure I capture everything. Thank you again for your help and your excellent scraper!!!!!

ghost commented 8 years ago

Here is a CSV I made for all Sega SG-1000 Hashes which are not currently in the scraper:

https://drive.google.com/open?id=1h2ljwPz7F7_ucF8TQ5KeLeoe3nMHab_Iqj5Q1hLS0j8

Note: Only No-Intro Romset Hashes

ghost commented 8 years ago

Here is a CSV I made for all SNES Hashes from the No-Intro Romset which are not currently in the scraper:

https://drive.google.com/open?id=1_CfJxrSNASApzhXwUQ2QTQBu6KgFS7ohlJCCZngiX9M

Note: I could not find the hashes for the two below games. They are incorrectly matched against TGDB

ghost commented 8 years ago

Updated SNES.csv file with the two missing hashes:

https://drive.google.com/open?id=1bAlg3rX94vdgqWdY3XPh3dMZMQ2sL8ZlADZnb903ono

ghost commented 8 years ago

@sselph

Here is the complete No-Intro Romset for Neo Geo Pocket:

https://drive.google.com/open?id=1PVq4BAkwDM_Qet-ErjOt_oQKJn8_ymFvouwY6Csz7PY

BenWlson commented 8 years ago

Here are some TurboGrafx CD updates. These are based on CUE files for the compressed audio (ogg) sets out there. There is also a rom or two from SNES and Genesis updated in there too.

https://www.dropbox.com/s/a5vpcl9uqx8a62r/HASH%20Updates%205-19-2016.csv?dl=0

BenWlson commented 8 years ago

Here are some more ScummVM games. Mostly Sierra games plus a couple others. The hashes are all identical, but the filenames are there.

https://www.dropbox.com/s/pkreabdakw1q1cs/Missing%20ScummVM.csv?dl=0

BenWlson commented 8 years ago

The US version of the SNES Final Fantasy III improperly grabs the data for the Japanese version (Final Fantasy VI).

MD5: 544311e104805e926083acf29ec664da
SHA-1: 057ada1c641e3e0b3ca34e6e4f4eb1b05a87143a
final fantasy iii (usa) (rev a).sfc
http://thegamesdb.net/game/83/

ghost commented 8 years ago

@sselph Please update the following when you have time. All SNES for No-Intro should be complete once these last 11 are added.

https://drive.google.com/open?id=1bAlg3rX94vdgqWdY3XPh3dMZMQ2sL8ZlADZnb903ono

Note: Star Fox - Super Weekend (USA) and NFL Quarterback Club (USA) are both updates to incorrect TGDB pointers

BenWlson commented 8 years ago

Lots of missing data for various handheld platforms. Lots of dumb kid games, but it will make things more complete. It also has NeoGeo Pocket Color which I think someone else recently sent data for, but I have the US set in there too.

https://www.dropbox.com/s/0efcbze21afn3a8/handheld.csv?dl=0

sselph commented 8 years ago

@BenWlson @stevetb I think I have added all the information y'all provided.

ghost commented 8 years ago

@sselph so close, hahaha. I still need these two titles updated. Here are the correct hashes and TGDB links, both of which need to be updated in your hash.csv

https://drive.google.com/open?id=1hl9CBD1nmbGknm57E9KMGX6YIeqfNPymtFbC_NhMjFU

sselph commented 8 years ago

Something else must be going on. Those hashes appear in the csv.

cat hash.csv.gz | zgrep 078c3f6ae65c243fe3e330c699a75df536a5c20a
078c3f6ae65c243fe3e330c699a75df536a5c20a,5707,6,NFL Quarterback Club (USA)

cat hash.csv.gz | zgrep 2ba5f446dcb56d1164e28b337ae7b4833278b6d9
2ba5f446dcb56d1164e28b337ae7b4833278b6d9,26300,6,Star Fox - Super Weekend (USA)

ghost commented 8 years ago

@sselph I cleared out my downloaded images and gamelists and ran it again. This fixed both of the above. Thank you and sorry to pester you on these......especially when it is my error, doh!

Thank you sselph!

BenWlson commented 8 years ago

https://www.dropbox.com/s/1fz78rs78estn2l/More%20Hashes%205-23-16.csv?dl=0

Here's a few that seem to be missed from what I submitted earlier.

sselph commented 8 years ago

@BenWlson Sorry about that. It should be fixed now.

BenWlson commented 8 years ago

https://www.dropbox.com/s/w72wbd0ykcq06gq/atari2600.csv?dl=0

Some Atari 2600 updates.

ghost commented 8 years ago

@BenWlson We need to clean up the rom names for that Atari 2600 set. Basically take out anything in between brackets and remove the file extension. I'd also remove the Extra and Error columns. For the TGDB column, I would remove the http address and instead just put down the ID (Example = 33333). Lastly, you would make another column for Platform so that sselph does not have to look that up and add it in.
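The cleanup steps described above could be scripted along these lines. This is a sketch under assumptions: "brackets" is taken to mean square brackets only (parenthesised region tags are kept), the helper names `cleanName` and `gameID` are hypothetical, and the sample filename is made up for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

var (
	// bracketed matches a square-bracket tag plus any whitespace before it.
	bracketed = regexp.MustCompile(`\s*\[[^\]]*\]`)
	// trailingID matches the digits at the end of a link, allowing a final slash.
	trailingID = regexp.MustCompile(`(\d+)/?$`)
)

// cleanName drops the file extension and any [bracketed] tags.
// Assumption: parentheses such as "(USA)" are meant to be kept.
func cleanName(filename string) string {
	name := strings.TrimSuffix(filename, filepath.Ext(filename))
	name = bracketed.ReplaceAllString(name, "")
	return strings.TrimSpace(name)
}

// gameID reduces a thegamesdb.net game link to its numeric ID.
// It returns "" when the link does not end in digits.
func gameID(link string) string {
	m := trailingID.FindStringSubmatch(strings.TrimSpace(link))
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Hypothetical input row: a dumped filename and a full TGDB link.
	fmt.Println(cleanName("Example Game (USA) [!].bin"))
	fmt.Println(gameID("http://thegamesdb.net/game/33333/"))
}
```

A platform column would still have to be filled in by hand or from the folder name, since neither the filename nor the link carries it.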

sselph commented 8 years ago

Those things are easy enough for me to fix, and if the name isn't a no-intro name I just leave that field blank and it picks up the name from thegamesdb.

sselph commented 8 years ago

Forgot to say these have been uploaded.

ghost commented 8 years ago

Just finished the No-Intro set on my end for Sega Genesis and Sega Master System

  1. Sega Genesis is perfect! Good job!
  2. Sega Master System: I just need to add one Australian ROM. Here is the .CSV:

https://drive.google.com/open?id=1GZeScDgMLsijU7mRrFz029cVHDKwHfh6I8t4tBOaOrA

Thank you and great job @sselph !

sselph commented 8 years ago

Done

BenWlson commented 8 years ago

The US version of the Alisia Dragoon Sega Genesis rom currently pulls the wrong version:

Alisia Dragoon (USA).md
SHA-1: 15B6244385DB4B449B7C189C13DB7B9C1427C688
should be: http://thegamesdb.net/game/4243

sselph commented 8 years ago

Done

ghost commented 8 years ago

@sselph Two titles (Proto) which I need to complete USA N64 No-Intro Set.

https://drive.google.com/open?id=18F35J2RV1iIZZT4RnwJcDCK7L3UgtD7rjxDs47B_M58

You've done a great job with N64, very complete. Thank you!

sselph commented 8 years ago

Done

BenWlson commented 8 years ago

When I scrape my Genesis USA romset, the majority of the roms end up with US Genesis images, but a small handful come back with MegaDrive images.

Here is the list of the wrong region Genesis roms with hashes and links to the US gamesdb version: https://www.dropbox.com/s/9xikmeu6ar06xvc/Wrong%20Region.csv?dl=0

There are a couple roms that appear to be USA/Europe or World versions, so I guess they could technically go either way. Whether or not to change them would be up to you, @sselph.

BenWlson commented 8 years ago

Here are some updates for Sega CD. They are based on CUE files for a compressed audio (OGG) romset that's out there. I have most of the US romset with the exception of the FMV and shooter games. I may do those later.

https://www.dropbox.com/s/c0voyk8vcaan2h9/segacd.csv?dl=0

ghost commented 8 years ago

@sselph Here are the last No-Intro Roms I needed to complete NES.

https://drive.google.com/open?id=1jYvG6mYSZBIIaqC2Ibv0y4YKaDNGHK3WMeLXn8GCTDU

Thank you again!

sselph commented 8 years ago

@BenWlson Thanks, updated the MD entries. I have chosen to use the US version for USA/Europe since USA accounts for over half of the worldwide sales of genesis/megadrive platforms. Also added the sega CD entries.

@stevetb Added/Updated the NES entries.

ghost commented 8 years ago

@sselph Thank you!

BenWlson commented 8 years ago

007 - The World Is Not Enough (USA, Europe).gbc for GameBoy Color incorrectly pulls from the N64 007.

Here's the correct one: SHA-1: 43552FD2F4464F42D9A5AA7CDF79A012C2BD9DC4 http://thegamesdb.net/game/20734

ghost commented 8 years ago

Here should be the last titles for the Sega Game Gear for No-Intro.

https://drive.google.com/open?id=1acEedM2Mx_ePLLvaYaefGSnebdfKYjljNVXR_Ubdq_s

sselph commented 8 years ago

@BenWlson I couldn't find that hash in my DB and the 007 for the GBC is already linked to that ID. The only things that link to http://thegamesdb.net/game/238 are the 2 versions of 007 for the N64. Maybe this was fixed recently and the image that was downloaded is stale?

@stevetb Thanks! Added the missing GG.

BenWlson commented 8 years ago

Somehow I gave the wrong hash for the improper GBC 007 game. I just double-checked it, and it's getting an N64 image.

71D84B4065CF2C36B5E337BC2C56D8384418529F http://thegamesdb.net/game/20734

another missing hash: EBF766B37CE893579E76CC9367711DF8479269CD http://thegamesdb.net/game/25769/

sselph commented 8 years ago

@BenWlson I'm not seeing that hash either. It is also upper case so seems to come from an external tool. GBC should be a standard shasum but could you try using my shasum utility?

https://storage.googleapis.com/stevenselph.appspot.com/shasum_linux_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_linux_amd64.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_rpi.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_rpi2.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_windows_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_windows_amd64.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_mac_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_mac_amd64.zip

basically use it like shasum filename.gbc

robertybob commented 8 years ago

Game Boy:

Probotector (EU) should link to 26550
Probotector 2 (EU) should link to 26552

Currently they pull the info from the US versions (Operation C and Contra: The Alien Wars)

sselph commented 8 years ago

thanks @robertybob. Sorry for the delay but fixed these.

ghost commented 8 years ago

@sselph Tonight I went through the USA and European Romset for GameBoy. I found 10 titles which were not previously added. Here are all of the missing ones I found:

https://drive.google.com/open?id=1iDPBZoaJfruEIyhF3WXceH6NhiDGoJjiYrWlQDEEfcc

robertybob commented 8 years ago

SNES:

Game - Hash - GDB ID
Apocalypse II.smc - 95640a8ecff7a5380c71fd2a4915f22341870769 - 37466
Congo - The Movie - The Secret of Zinj.smc - 32efc8993c8add3c6647fa87712c634b71158787 - 28570
NCAA Football.smc - f8786b52ebbfd72a2fff236a5d1cf09262e7d048 - 5702
Network Q Rally.smc - 94169aaeb1557a25b1e03a181a28b4b29ece6f5f - 37467

paradadf commented 8 years ago

Alpha Beam with Ernie.bin for Atari 2600 pulls incorrect metadata from PS3

Here is the correction:

Game - Hash - GDB ID
Alpha Beam with Ernie.bin - a1f660827ce291f19719a5672f2c5d277d903b03 - 31122

nschloe commented 8 years ago

A bunch of missing entries:

Game,Error,Hash,Extra,thegamesdbID
Bust-A-Move.smc,hash not found,0a34f76c5684bfc6a867476546dad55ddfef5d76,"",2040
Final Fantasy V (Japan).smc,hash not found,a9a77b07cd6c1b98a0186e676c0e3724ba61a94b,"",1762
Final Fantasy VI (Japan) [En by RPGOne v1.2b] [All Bug Fixes].sfc,hash not found,2773801e44947f78e444705aaa9d301e2be6ba36,"",34358
Chrono Trigger (USA) [Hack by Kajar Laboratories Demo 2] (~Chrono Trigger - Crimson Echos).sfc,hash not found,e9a3c2bfa44f864a386ef1dd85cfff909a95181b,"",1255
Secret of Evermore (USA) [Hack by FuSoYa v1.02] (2 Player Edition).sfc,hash not found,69675970540ac9b21a38975b010df5abeba510e3,"",1311
Super Famicom Wars (Japan) (NP).sfc,hash not found,24279ca4b598f4caa0cf4d7fa0a423f9e51bb6f7,hash found but no GDB ID,26347
Seiken Densetsu 3 (Japan) [En by LNF+Neill Corlett+SoM2Freak v1.01].sfc,hash not found,4a8d8bd431959d42e2ba4d953bfd11d042216a34,"",5827