sselph / scraper

A scraper for EmulationStation written in Go using hashing
MIT License
448 stars, 88 forks

Add Commodore 64 support #35

Open seriema opened 9 years ago

seriema commented 9 years ago

Thanks for a great scraper! I hope it can become part of EmulationStation once it supports the same number of emulators.

Commodore 64 has thousands of games, and running the regular scraper is extremely slow, so having support for it in your scraper would be a huge gain.

sselph commented 9 years ago

Thanks, I'm glad you like it. I created a C++ port that has been checked into the unstable branch of ES, so it will be there eventually.

There are a little fewer than 500 games in thegamesdb, so even if they all match up I won't have thousands, but I can give it a shot. The format of the C64 dat file is different from the other systems I've been working with, so I'll need to write something to parse the files, which will take a little longer.

seriema commented 9 years ago

There are a lot of "30 in one"-type games, so maybe most of those thousands aren't interesting to scrape (or even keep).

Can I help in any way?

sselph commented 9 years ago

I've been using XML-formatted data for other systems, but for some reason I can't get that format for C64, so I'll have to carve out some time to parse the ClrMAME Pro file. Once that is done, I can create a file to start mapping the ROM hash information to thegamesdb information. You could assist in that process, but it might be a while until I get to that point.

If you are up to the challenge, what I am creating is a CSV with the following columns: sha1 hash, thegamesdb ID, no-intro name.

For example, a row from the master CSV: 63627dba22be2b357c0e370e68dc5af56eeb0a24,160,007 - GoldenEye (Europe)

The hashes are from http://datomatic.no-intro.org/?page=download and IDs from http://thegamesdb.net
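A row in this format can be read back with a sketch like the one below. It assumes the file is plain comma-separated text with no quoting, exactly as in the example row. Because No-Intro names can themselves contain commas (the Transformers row later in this thread is one), it splits on the first two commas only instead of using a general CSV reader. `Mapping` and `parseMapping` are hypothetical names, not the scraper's own loader.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Mapping is one row of the hash mapping file: sha1 hash, thegamesdb ID,
// no-intro name.
type Mapping struct {
	SHA1   string
	TGDBID int
	Name   string
}

// parseMapping splits a row on its first two commas only, so a name such as
// "Transformers, The (Europe)" keeps its embedded comma.
func parseMapping(line string) (Mapping, error) {
	parts := strings.SplitN(strings.TrimSpace(line), ",", 3)
	if len(parts) != 3 {
		return Mapping{}, fmt.Errorf("want 3 fields, got %d in %q", len(parts), line)
	}
	id, err := strconv.Atoi(strings.TrimSpace(parts[1]))
	if err != nil {
		return Mapping{}, fmt.Errorf("bad thegamesdb ID %q: %w", parts[1], err)
	}
	return Mapping{
		SHA1:   strings.ToLower(strings.TrimSpace(parts[0])),
		TGDBID: id,
		Name:   strings.TrimSpace(parts[2]),
	}, nil
}

func main() {
	rows := []string{
		"63627dba22be2b357c0e370e68dc5af56eeb0a24,160,007 - GoldenEye (Europe)",
		"28a2d07ad046be8ebff691802edbbc8333731b03,25528,Transformers, The (Europe)",
	}
	for _, row := range rows {
		m, err := parseMapping(row)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> thegamesdb %d (%s)\n", m.SHA1, m.TGDBID, m.Name)
	}
}
```

Splitting from the left works because the hash and the ID never contain commas; only the last column can.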

seriema commented 9 years ago

I'd love to give it a shot!

Just trying to understand what you need. An example for C64 would be: 52523BB41021F208EEFD87C19F7CC0A1, 25528, Transformers (sources: Datomatic, GamesDB)

The hash seems short, so it might be the wrong one. Do you already automate this part, or do you need to hash the files yourself and then map them to GamesDB? Could we maybe have a chat on IRC or Skype or somewhere?

sselph commented 9 years ago

Yeah, that hash is the MD5 and I've been using SHA-1. So for Transformers I would have: 28a2d07ad046be8ebff691802edbbc8333731b03,25528,Transformers, The (Europe)

I have it partially automated as long as I can parse the hash file, so I went ahead, wrote a parser, and created this sheet. It has 2 tabs, No-Intro and TGDB. On the No-Intro tab I have the hash, the original name, and a normalized name that hopefully matches something in the TGDB tab.

My process: I go through the TGDB tab, and for every red line I look to see if I can find the corresponding information in the No-Intro tab, then paste the TGDB name into column C of all the corresponding rows. The issues are usually things like "the transformers" vs "transformers" or "x men" vs "x-men".

So, for example, if No-Intro has "the terminator" and TGDB has "terminator", I would paste "terminator" into C4185 and things should turn green.
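The normalized-name column is what lets most rows match without manual work. The exact rules used for the sheet are not spelled out here, so the following is a hypothetical Go sketch of one possible normalization covering the cases mentioned above: a leading "The" or a trailing ", The", hyphen vs space, and region tags in parentheses. `normalize` and the two patterns are illustrative names; anything they do not cover would still need the manual step described above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// parenTags matches No-Intro tags such as "(Europe)" or "(Rev 1)".
	parenTags = regexp.MustCompile(`\s*\([^)]*\)`)
	// nonAlnum matches runs of anything other than lowercase letters and digits.
	nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)
)

// normalize builds a matching key: it drops parenthesised tags, lowercases,
// strips a trailing ", the" or a leading "the ", and turns punctuation such
// as "-" into single spaces.
func normalize(name string) string {
	s := parenTags.ReplaceAllString(name, "")
	s = strings.ToLower(strings.TrimSpace(s))
	s = strings.TrimSuffix(s, ", the")
	s = strings.TrimPrefix(s, "the ")
	s = nonAlnum.ReplaceAllString(s, " ")
	return strings.TrimSpace(s)
}

func main() {
	pairs := [][2]string{
		{"Transformers, The (Europe)", "Transformers"},
		{"The Terminator", "Terminator"},
		{"X-Men", "X Men"},
	}
	for _, p := range pairs {
		a, b := normalize(p[0]), normalize(p[1])
		fmt.Printf("%q vs %q -> %q / %q, match=%v\n", p[0], p[1], a, b, a == b)
	}
}
```

Applying the same function to both tabs matters more than the specific rules: a key only helps if No-Intro and TGDB names are reduced the same way.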

https://docs.google.com/spreadsheets/d/1AKLrkNURMhqU_CkA1mUnzBHPLG3zEGiAswbzoUJwFe8/edit?usp=sharing

The easiest way to chat would be via Google Chat; my email address is on my GitHub page.

robertybob commented 8 years ago

Could you share the spreadsheet with me so I can help with this? I get a lot of enjoyment out of adding new games to thegamesdb.net :+1:

sselph commented 8 years ago

This has been delayed because I have been lazy. We had everything for the more raw tape format, but there are other formats that I have the hashes for. I was looking more into those formats when I got distracted and haven't gotten back to it. I'll try to make some time to figure out where I left off.

robertybob commented 8 years ago

Ah ok, I can add the games missing from TGDB in that case for when the hashes are sorted :)

omateos commented 7 years ago

Hi, your project is amazing! When I try to scrape C64 games, the hash appears but nothing happens. What is the status of this? Is it my version?

ghost commented 7 years ago

The Commodore 64 set is huge, and no one that I recall has taken this on yet. I would really encourage you to take your hashes and match them up to TheGamesDB IDs. Then submit these to sselph and he will add them for you; it will also benefit other users who follow. The romset everyone is using is No-Intro, so make sure you are using that set for your hashes.

Redemp commented 7 years ago

Or we could go one step further and rename the whole GameBase v15 set: match it to the latest HyperSpin database for v15 to rename the romset, then scan the files.

grugs commented 6 years ago

This works to scrape C64 ROMs for use with Recalbox:

~/go/bin/scraper -console_src ss -console_img mix3,b,s -extra_ext D64,G64,T64,TAP -image_dir downloaded_images -image_path downloaded_images -img_format png -no_thumb=true -max_width=375 -workers=4 -append