hugo19941994 / megasd-db-generator

MegaSD DB Scripts
MIT License

Sega CD CRC calculation automation #2

Closed: hugo19941994 closed this issue 4 years ago

hugo19941994 commented 4 years ago

The MegaSD CRCs for Sega CD games are generated from only the start of the first binary file (the 2KB data area of its first sectors), whereas Redump DATs contain the CRC of the whole binary file.

Any way to automate the CRC generation (without uploading CD images😛)?

bodgit commented 4 years ago

I can write a tool that runs over a Redump set of Sega CD images and outputs the MegaSD CRC for each one, as I already have the code to generate it. Same for PC Engine & SSDS3, as they use a similar approach (but a different algorithm).

If it spits out a JSON lookup table keyed on the Redump game name with the CRC as the value, something like:

{
  "Cliffhanger (USA)": "ABC12345",
  "Dragon's Lair (Japan) (En,Ja,Fr,De,It)": "DEF67890",
  ...
}

Would that be useful?
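A minimal sketch of how such a table could be consumed, assuming the JSON shape shown above (Redump game name mapped to a CRC string). The function names are hypothetical, and normalising CRCs to 8 upper-case hex digits is an assumption made so they compare cleanly with DAT values.

```python
import json
from typing import Dict, Optional

def load_megasd_crcs(json_text: str) -> Dict[str, str]:
    """Parse a hypothetical {redump_name: crc} table.

    CRCs are normalised to 8 upper-case hex digits so that values written
    with or without leading zeros compare equal.
    """
    table = json.loads(json_text)
    return {name: crc.upper().zfill(8) for name, crc in table.items()}

def lookup_crc(table: Dict[str, str], redump_name: str) -> Optional[str]:
    # None means the game is not in the table yet
    return table.get(redump_name)
```

Keying on the full Redump name (region and languages included) keeps every release distinct, which a name-only key would not.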

hugo19941994 commented 4 years ago

Thanks for offering your help! But this issue is more about the automated pipeline, which runs every day to pick up new ROMs and/or IGDB data (only for MegaSD for now), rather than generating the MegaSD (or SSDS3) hashes.

GitHub Actions is used daily to generate a new, hidden release. This pipeline downloads the newest No-Intro DATs available for the SG-1000, Master System, Genesis and 32X. For Sega CD it uses the DAT file stored in the repo. If I see an increase in the number of games matched, I set the visibility of that release to public.

The code already has a way to download the latest Redump Sega CD DAT automatically, but AFAIK there is no way to generate the MegaSD CRCs without the original bin files. The code to generate the MegaSD CRCs for cue/bin files is also already in the scripts (I've yet to upload the SSDS3 one).

I think I could make GitHub Actions cache any generated CRC, download the new Redump DAT each day and, if it finds a missing hash, try to download the corresponding zip file from somewhere, generate the hash and discard the downloaded file. It's the only solution I have come up with. It should work with the SSDS3 too.
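The caching idea can be sketched as a loop that only touches games missing from the cache. This is a minimal sketch under stated assumptions: the cache is a plain name-to-CRC dict, `fetch_zip` is a hypothetical downloader (the actual download source is left open above), and `compute_crc` could be a `cal_crc`-style function such as the one later in this thread.

```python
import os
import tempfile
from typing import Callable, Dict, Iterable, Optional

def fill_missing_crcs(
    redump_names: Iterable[str],
    cache: Dict[str, str],
    fetch_zip: Callable[[str, str], bool],
    compute_crc: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Hash only the games not already cached; return the new entries.

    fetch_zip(name, dest_path) is hypothetical: it should write the game's
    zip to dest_path and return False when the zip cannot be found.
    `cache` is updated in place so it can be persisted between runs.
    """
    added: Dict[str, str] = {}
    for name in redump_names:
        if name in cache:
            continue  # hashed on a previous run, nothing to download
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "image.zip")
            if not fetch_zip(name, path):
                continue
            crc = compute_crc(path)
        # Leaving the with-block deletes the downloaded zip
        if crc is not None:
            cache[name] = crc
            added[name] = crc
    return added
```

Because only cache misses trigger a download, a daily run that finds no new Redump entries downloads nothing.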

Here's the thing. I'm slowly uploading Saturn screenshots for the MODE to TerraOnion's DB. They have said that the website might be expanded one day to generate the SSDS3 and MegaSD screenshots. So I'm not sure if I should invest much more time making these scripts better...

Apart from the automated Redump updates, I think the other two outstanding issues are the missing hacks (which could be downloaded from the SmokeMonster packs) and having separate screenshots for each version/region of a game (I strip everything inside parentheses, such as the region, from the ROM names to make matching easier). The region issue is especially noticeable in the SSDS3 packs.
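The name normalisation described above can be sketched with a single regular expression. This is an assumption about how the stripping might be done, not the repository's code; `strip_tags` is a hypothetical name, and only round-bracket groups are removed.

```python
import re

# Matches one "(...)" group plus any whitespace before it
_PAREN_GROUP = re.compile(r"\s*\([^)]*\)")

def strip_tags(rom_name: str) -> str:
    """Drop every parenthesised group (region, languages, demo, ...) from a ROM name."""
    return _PAREN_GROUP.sub("", rom_name).strip()
```

This also shows the region problem: `strip_tags("Cliffhanger (USA)")` and `strip_tags("Cliffhanger (Europe)")` both become `"Cliffhanger"`, so every region of a game ends up sharing one screenshot.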

Sorry for the long-winded answer 😅

bodgit commented 4 years ago

Yeah, I was just wondering whether it would help to avoid downloading the actual files, given they're quite big (and might violate GitHub's T&Cs), and just cheat with a lookup table of the answers.

I've just finished refactoring my Golang code so that the same code now works for both MegaSD and SSDS3, so I'm now going to try your SSDS3 files (and make sure I've not broken the MegaSD support in the process).

I figure the website for supporting the MODE might eventually support the other cartridges, although I imagine it will take some time as it will need firmware updates, so don't give up just yet 😉

hugo19941994 commented 4 years ago

Is there a way to generate the MegaSD CRCs without downloading the actual bin files? I guess you mean a tool to generate them offline, but that's essentially what we have stored here. I've used an XML file just so that I can parse the values the same way as the No-Intro DATs (instead of using a JSON file).
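No-Intro DATs use the Logiqx XML layout: a `<datafile>` root containing `<game name="...">` elements, each with one or more `<rom ... crc="..."/>` children. A minimal sketch of reading name/CRC pairs from such a file with the standard library's `xml.etree.ElementTree`, assuming the custom Sega CD DAT follows the same layout; `read_dat_crcs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict

def read_dat_crcs(xml_text: str) -> Dict[str, str]:
    """Map game name -> CRC for a Logiqx-style DAT.

    Assumes the value of interest is the crc attribute of the first <rom>
    that has one; later <rom> entries of the same game are ignored.
    """
    root = ET.fromstring(xml_text)
    crcs: Dict[str, str] = {}
    for game in root.iter("game"):
        for rom in game.iter("rom"):
            crc = rom.get("crc")
            if crc:
                crcs[game.get("name")] = crc.upper()
                break
    return crcs
```

Reading both kinds of file through one function is the benefit of storing the custom values as XML rather than JSON.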

I'm still interested to know how you calculate the CRCs though. I wrote the bare minimum to just get the values I needed:

import os
import posixpath
import re
import zlib
from zipfile import ZipFile

SECTOR_SIZE = 2352      # raw sector: 16-byte sync/header + 2048 data + 288 EDC/ECC
FRAMES_PER_SECOND = 75  # CD sectors (frames) per second

def cal_crc(fileName):
    with ZipFile(fileName) as archive:
        for member in archive.namelist():
            if os.path.splitext(member)[1].lower() != '.cue':
                continue

            # Find the FILE entry holding the first MODE1/2352 track and
            # that track's INDEX 01 position (pregap) inside the file
            current_file = None
            bin_file = None
            pregap = 0
            with archive.open(member) as cueFile:
                for raw in cueFile:
                    line = raw.decode('utf-8', errors='replace').strip()
                    file_match = re.match(r'^FILE\s+"(.+)"\s+\S+$', line)
                    if file_match:
                        current_file = file_match.group(1)
                    elif "MODE1/2352" in line and bin_file is None:
                        bin_file = current_file
                    elif bin_file is not None and line.startswith("INDEX 01"):
                        # INDEX 01 MM:SS:FF (75 frames per second)
                        minutes, seconds, frames = (int(x) for x in line.split()[2].split(":"))
                        pregap = SECTOR_SIZE * ((minutes * 60 + seconds) * FRAMES_PER_SECOND + frames)
                        break

            if bin_file is None:
                return None

            # FILE paths in a cue sheet are relative to the cue sheet itself
            bin_path = posixpath.join(posixpath.dirname(member), bin_file)
            with archive.open(bin_path) as binFile:
                # Skip the pregap, if any
                binFile.read(pregap)
                crc = 0
                # CRC the 2048-byte data area of the first 20 sectors,
                # skipping each sector's 16-byte header and 288-byte EDC/ECC
                for _ in range(20):
                    binFile.read(16)
                    crc = zlib.crc32(binFile.read(2048), crc)
                    binFile.read(288)
            # Zero-pad to 8 hex digits, as CRCs are written in DAT files
            return "%08X" % (crc & 0xFFFFFFFF)
    return None
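The pregap arithmetic in the script converts a cue sheet `INDEX` timestamp (`MM:SS:FF`, where FF counts frames at 75 per second and one frame is one 2352-byte raw sector) into a byte offset in the bin file. Pulled out as a standalone helper (the name `msf_to_offset` is hypothetical), it reads:

```python
SECTOR_SIZE = 2352      # bytes per raw MODE1/2352 sector
FRAMES_PER_SECOND = 75  # sectors per second of CD time

def msf_to_offset(msf: str) -> int:
    """Convert a cue sheet 'MM:SS:FF' timestamp to a byte offset into a raw .bin."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    sectors = (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames
    return sectors * SECTOR_SIZE

print(msf_to_offset("00:02:00"))  # 352800 (150 sectors)
```

So a track whose `INDEX 01` is `00:02:00` has its data start 150 sectors, or 352,800 bytes, into the file.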
hugo19941994 commented 4 years ago

Closed the issue as the tool can now update the custom DAT file with the MegaSD CRC values from a local folder with zip files or an HTTP endpoint.
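Reading the zips "from a local folder or an HTTP endpoint" can be sketched as one lookup that returns something `zipfile.ZipFile` accepts, so the result can be handed straight to a `cal_crc`-style function. Assumptions: `open_game_zip` is a hypothetical name, zips are named `<Redump game name>.zip` (as in the log below), and the endpoint serves them at `<base URL>/<URL-quoted file name>`.

```python
import io
import os
import urllib.parse
import urllib.request
from typing import Optional, Union

def open_game_zip(source: str, game_name: str) -> Optional[Union[str, io.BytesIO]]:
    """Locate '<game_name>.zip' in a local folder or under an HTTP(S) base URL.

    Returns a path or an in-memory buffer accepted by zipfile.ZipFile,
    or None when the zip cannot be found. The URL layout is an assumption.
    """
    file_name = game_name + ".zip"
    if source.startswith(("http://", "https://")):
        url = source.rstrip("/") + "/" + urllib.parse.quote(file_name)
        try:
            with urllib.request.urlopen(url) as response:
                # Keep the download in memory so there is nothing to delete afterwards
                return io.BytesIO(response.read())
        except OSError:  # URLError and HTTPError are OSError subclasses
            return None
    path = os.path.join(source, file_name)
    return path if os.path.isfile(path) else None
```

Holding HTTP downloads in memory fits the "generate the hash and discard the file" approach, at the cost of RAM for large images.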

Example execution

(megasd-db-generator) workspace/megasd-db-generator [wip-redump-downloader●] » ./generator/main.py --update-custom-dat .
2020-05-17 13:04:49,958 - MegaSD DB Generator

['.']
2020-05-17 13:04:49,959 - Downloading Redump Sega CD DAT
Adding: Silpheed (Japan) (Demo)
Attempting to open ./Silpheed (Japan) (Demo).zip
CRC: A1C71747
Silpheed (Japan) (Demo) has CRC A1C71747
Adding: Night Trap (Japan) (Demo)
Attempting to open ./Night Trap (Japan) (Demo).zip
CRC: 33619C4A
Night Trap (Japan) (Demo) has CRC 33619C4A
Adding: Record of Lodoss War (Japan) (Demo)
Attempting to open ./Record of Lodoss War (Japan) (Demo).zip
CRC: 84C11643
Record of Lodoss War (Japan) (Demo) has CRC 84C11643
Adding: Keiou Yuugekitai (Japan) (Demo)
Attempting to open ./Keiou Yuugekitai (Japan) (Demo).zip
CRC: 1BB70600
Keiou Yuugekitai (Japan) (Demo) has CRC 1BB70600
Adding: Urusei Yatsura - Dear My Friends (Japan) (Demo)
Attempting to open ./Urusei Yatsura - Dear My Friends (Japan) (Demo).zip
CRC: CCA12E1E
Urusei Yatsura - Dear My Friends (Japan) (Demo) has CRC CCA12E1E

The CI pipeline should automatically pick up any new Redump games and, when necessary, open a PR with the new CRC values.

Some new homebrew game CRCs were added too. New games that are not in the Redump DB can be added to the EXTRA DAT file.

bodgit commented 4 years ago

> I'm still interested to know how you calculate the CRCs though. I wrote the bare minimum to just get the values I needed:

Your code looks pretty much the same as what I have BTW, taking into account Python vs Golang, although I used someone else's library to do the CUE parsing.