sselph / scraper

A scraper for EmulationStation written in Go using hashing
MIT License
448 stars, 88 forks

Update hashes for all systems #20

Open sselph opened 9 years ago

sselph commented 9 years ago

Need to update the dataset with all the new games in the GDB.

sselph commented 9 years ago

Finished GG, GB, GBC, GBA

robertybob commented 9 years ago

I found more GG games (24) that I somehow missed from thegamesdb. I've now added them to the website, and this spreadsheet has the GameDB IDs for them:

https://dl.dropboxusercontent.com/u/48342677/NEW%20GameGear%20IDs.csv

sselph commented 9 years ago

Thanks. I added them all to the dataset and found some duplicate issues in the dataset and corrected those.

robertybob commented 9 years ago

Thanks !!

sselph commented 9 years ago

Updated NES.

sselph commented 9 years ago

finished SNES

robertybob commented 9 years ago

Hi @sselph , a few Genesis games that are currently missing (based on the ROMs I scraped)

https://dl.dropboxusercontent.com/u/48342677/Missing%20genesis.csv

edit

Also, if I can be a little bit picky, my Genesis 'Earthworm Jim (USA)' ROM links to ID 2894, which is the Mega Drive version (they're identical except for the image), rather than the Genesis ID of 4353.

robertybob commented 9 years ago

My missing Master System ROMs

https://dl.dropboxusercontent.com/u/48342677/Missing%20MasterSystem.csv

sselph commented 9 years ago

Thanks, I've added your entries.

robertybob commented 9 years ago

Hey. These two games result in duplicate entries (ID 611) - one should be ID 611 (vs. Kingpin), one should be 2620 (animated series).

robertybob commented 9 years ago

Hi @sselph , here are all of the GBA games I had that were missing an entry in TGDB. There are now IDs for all of them:

https://db.tt/tEW7B4X8

Can you let me know when your database has been updated? :)

sselph commented 9 years ago

Sorry for the delay, I added your csv.

BlackrosesXI commented 9 years ago

I didn't realize there is a thread for that before I made mine. Support for Nintendo DS would be really appreciated. I have little knowledge in coding, but if it's a matter of time, I have time to spare. Thanks.

Pacolo commented 9 years ago

Hi, I added this game to TGDB: http://thegamesdb.net/search/?string=Super+Soukoban&function=Search but the scraper can't find it.

Of course, the game is in No-Intro set: http://datomatic.no-intro.org/?page=show_record&s=49&n=2865

sselph commented 9 years ago

@Pacolo I added this game.

robertybob commented 9 years ago

@sselph New GBA games to add to your list :) https://drive.google.com/file/d/0ByWwZdQX1FQmVEE2RGJsaUdxbTQ/view?usp=sharing

Could you let me know when they're added so I can re-scrape? Many thanks!

sselph commented 9 years ago

@robertybob Added.

robertybob commented 8 years ago

Hi @sselph , just 5 Turbografx games missing from your scraper

https://www.dropbox.com/s/13lf004v2fiwiv5/Turbografx.csv?dl=0

sselph commented 8 years ago

Done

robertybob commented 8 years ago

Hi again @sselph . Thank you for adding those TG16 games. I've now started importing PS1 games into my Pi. I just scraped 63 games, 35 downloaded images and details entered into the gamelist, yet running your reporting tool gives a figure of just over 50. Not sure what's going on there.

Either way, here's my hashes and TGDB IDs for inclusion onto your database :)

https://www.dropbox.com/s/3pavet9jjc4vyrq/PS1%20Missing.csv?dl=0

Thanks again! :+1:

sselph commented 8 years ago

The script checks the cue, then the bins. The cues are just text files, so it is possible they matched while the bins were slightly different. The reporter tool doesn't look at cue files, so it isn't printing those. I can add these bins in.
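A rough sketch of the "cue first, then bins" order described above, in Go. It only assumes the common cue sheet convention of `FILE "name.bin" BINARY` lines; the function names `binFiles` and `lookupID`, the map of known hashes, and the placeholder hash strings are all hypothetical and are not taken from the scraper's code.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// binFiles returns the file names listed on FILE lines of a cue sheet,
// for example: FILE "game.bin" BINARY
func binFiles(cue string) []string {
	var files []string
	sc := bufio.NewScanner(strings.NewReader(cue))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if len(line) < 5 || !strings.EqualFold(line[:5], "FILE ") {
			continue
		}
		rest := strings.TrimSpace(line[5:])
		if strings.HasPrefix(rest, "\"") {
			// Quoted name: take everything up to the closing quote.
			if end := strings.Index(rest[1:], "\""); end >= 0 {
				files = append(files, rest[1:1+end])
			}
			continue
		}
		// Unquoted name: take the first whitespace-separated field.
		if fields := strings.Fields(rest); len(fields) > 0 {
			files = append(files, fields[0])
		}
	}
	return files
}

// lookupID tries the cue hash first and falls back to each bin hash.
// known maps a hash to a game ID (hypothetical structure).
func lookupID(known map[string]string, cueHash string, binHashes []string) (string, bool) {
	if id, ok := known[cueHash]; ok {
		return id, true
	}
	for _, h := range binHashes {
		if id, ok := known[h]; ok {
			return id, true
		}
	}
	return "", false
}

func main() {
	cue := "FILE \"Game (Track 1).bin\" BINARY\n  TRACK 01 MODE2/2352\n    INDEX 01 00:00:00\n"
	fmt.Println(binFiles(cue))

	// Placeholder hashes and ID, for illustration only.
	known := map[string]string{"bin-hash": "12345"}
	id, ok := lookupID(known, "cue-hash", []string{"bin-hash"})
	fmt.Println(id, ok)
}
```

With this order, a game can still match through its bin even when the cue hash is unknown, which would explain a report tool and a scraper disagreeing if only one of them looks at cue files.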

robertybob commented 8 years ago

Ah ok, that's where I'm going wrong: it's not finding the games because I haven't got any .cue files set up yet (?)

Also, I've got a few .img files. IIRC these aren't supported by your scraper yet, are they?

robertybob commented 8 years ago

So basically without .bin support within the reporting tool, there's no way for me to gain the hashes of the games definitely not being picked up by the Scraper?

adrianmoisey commented 8 years ago

Do you take unlicensed games? 6cf18228cfb66d48b3642069979d4a5103cb8528,26500,7,Somari

robertybob commented 8 years ago

This scraper uses data from thegamesdb.net - if a game is on that site then this scraper should pick it up.

I should note, however, that unlicensed games and hacks are frowned upon on TGDB.net. If you add that game, it may well be deleted.

adrianmoisey commented 8 years ago

> This scraper uses data from thegamesdb.net - if a game is on that site then this scraper should pick it up.

The game I added already has an entry on thegamesdb: http://thegamesdb.net/game/26500

It looks like quite a few Unlicensed games already exist in the csv:

~/.sselph-scraper$ grep \(Unl\) hash.csv | wc -l
601

> I should note, however, that unlicensed games and hacks are frowned upon on TGDB.net. If you add that game, it may well be deleted.

I didn't add it; someone else did.

sselph commented 8 years ago

I don't care if it is unlicensed, but since the system is NES, the hash from just the regular shasum probably isn't correct. Do you mind using a version I just created that uses the same hashing algorithms that the scraper uses?

https://storage.googleapis.com/stevenselph.appspot.com/shasum_linux_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_linux_amd64.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_rpi.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_rpi2.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_windows_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_windows_amd64.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_mac_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_mac_amd64.zip

It is used the same way as the regular shasum: shasum file.nes

adrianmoisey commented 8 years ago

> Do you mind using a version I just created that uses the same hashing algorithms that the scraper uses?

Ah, I didn't realise that you used your own hash. Sorry about that. How's this: bb618e17cd21eaa0185de3a3bf0028dcbba6a0c3,26500,7,Somari (Unl)

sselph commented 8 years ago

Thanks, I've added it. I use a sha1 hash but I parse the header of .nes files to find the rom data.

adrianmoisey commented 8 years ago

Three more:

0c4c9aa8002aece613bf74c2bcee12da79fe5ddc,26490,7,Mortal Kombat II Special (Unl)
9d8366640849c3b22aa0d97770b58572a04ce442,26482,7,Mortal Kombat II (Unl)
232e54da6faf7bac4a29769fa0379570d83ec32e,26491,7,Mortal Kombat III (Unl)

sselph commented 8 years ago

Sorry for the delay, done.

AlexHolly commented 8 years ago

snes http://pastebin.com/dWjyD0NT

robertybob commented 8 years ago

@AlexHolly Out of interest, are these games on thegamesdb.net?

AlexHolly commented 8 years ago

Yes, the IDs are from thegamesdb.net. Is there something wrong?

robertybob commented 8 years ago

Hi @sselph :) Got some missing PSX games:

FIFA Football 2003 (E) : TGDB ID = 23566
FIFA Football 2004 (Europe) : TGDB ID = 11705
PaRappa the Rapper - The Hip Hop Hero (E) : TGDB ID = 779

sselph commented 8 years ago

@robertybob @AlexHolly Wow, I'm extremely sorry this took so long. I must have read the email on my phone and then forgotten to actually log in later to work on this. The hashes have been added.

robertybob commented 8 years ago

@sselph no worries! hope you didn't mind me starting the wiki by the way :)

MetalManTN commented 8 years ago

Here are some missing Lynx games. This is my first contribution, so I apologize if it isn't correct: https://www.dropbox.com/s/h4o7ge2gifmtl10/file?dl=0

sselph commented 8 years ago

Thanks @MetalManTN. I prefer it if you do all the work for me :) and open the file up as a csv in your favorite spreadsheet program, add a column, and give me thegamesdb ID for each game. I went ahead and looked these up, and it seems like someone has added them to thegamesdb, so they are all there now.

sselph commented 8 years ago

@robertybob I noticed that. Thanks!

MetalManTN commented 8 years ago

@sselph My apologies. I would prefer that I do all the work for you too (I am well aware that you are incredibly busy with all the requests), I just didn't realize I needed to do anything with that output file before posting it. Any and all future contributions from me will be in the correct format with IDs. Thank you for all that you've done for us. I love the scraper.

Jcw87 commented 8 years ago

Got one for NES

f663d004bea0fe0518fb8b2e3a9070e1ef1d39f4,27281,7,Space Invaders (Japan)

Jcw87 commented 8 years ago

This one is already in the list, but it has the wrong game ID

ce7580059e8b41cb4a1e734c9b35ce3774bf777a,9245,22,Combat - Tank-Plus

It should be 4887
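Rows posted in this thread follow a four-column shape, so a correction like this one is easy to check mechanically. Below is a minimal sketch in Go that loads such rows into a map keyed by hash and applies the ID change. The column meanings (hash, game ID, a numeric system ID, name) are inferred from the rows posted here, and the `entry` type and `parseHashes` function are hypothetical, not the scraper's own code.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// entry mirrors one row as posted in this thread: hash,gameID,systemID,name.
// The field names are hypothetical.
type entry struct {
	GameID   string
	SystemID string
	Name     string
}

// parseHashes reads CSV rows into a map keyed by the lower-cased hash.
// Rows that do not have exactly four fields cause an error.
func parseHashes(data string) (map[string]entry, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = 4
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	m := make(map[string]entry, len(rows))
	for _, row := range rows {
		m[strings.ToLower(row[0])] = entry{GameID: row[1], SystemID: row[2], Name: row[3]}
	}
	return m, nil
}

func main() {
	data := "ce7580059e8b41cb4a1e734c9b35ce3774bf777a,9245,22,Combat - Tank-Plus\n"
	m, err := parseHashes(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Apply the correction requested above: same hash, new game ID.
	h := "ce7580059e8b41cb4a1e734c9b35ce3774bf777a"
	e := m[h]
	e.GameID = "4887"
	m[h] = e
	fmt.Println(m[h].GameID, m[h].Name)
}
```

Keying the map by hash also makes duplicate rows visible, since a second row with the same hash simply overwrites the first.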

sselph commented 8 years ago

@Jcw87 sorry for the delay but I made the changes. Also sorry about the messed up Combat match. I must have had a bad copy/paste or something.

ghost commented 8 years ago

This is my first submission of hash to TheGamesDB ID mappings for your scraper. Please let me know if I need to change my CSV in any way to make it work better for you. Here are all the N64 hashes and their associated TGDB IDs:

https://drive.google.com/open?id=1E8anjd2FFlNsRGIAauOhSrKmUaEPj6m-TEoN2JL3hDM

ghost commented 8 years ago

Here is a CSV I made for all NES Hashes which are not currently in the scraper:

https://drive.google.com/open?id=1SaZXgRIdtKK7dRB1zbaQVtsVsSblOvT5E1sHsyBAbx8

ghost commented 8 years ago

Here is a CSV I made for all SNES Hashes which are not currently in the scraper:

https://drive.google.com/open?id=1GBMKFuKTHmhfoJjz5dJtB4o872FmiHoPJsiZz79vLrE

ghost commented 8 years ago

Here is a CSV I made for all Sega Master System Hashes which are not currently in the scraper:

https://drive.google.com/open?id=1RI6igRam-nR1emets0wsDnwqGdiP_rHLY0ynUyscUxo

ghost commented 8 years ago

Here is a CSV I made for all Sega Game Gear Hashes which are not currently in the scraper:

https://drive.google.com/open?id=1zjycsS4WRReu7db7AKUYnYONJP7AabdmNAExVgCBmac

ghost commented 8 years ago

Here is a CSV I made for all Sega Genesis Hashes which are not currently in the scraper:

https://drive.google.com/open?id=1nQbn4QdVX47hMurFUGde7-9rqE6iBG0zmY7b_d2zBQ4