libretro / RetroArch

Cross-platform, sophisticated frontend for the libretro API. Licensed GPLv3.
http://www.libretro.com

(Database) NES Header Skip #7289

Open RobLoach opened 6 years ago

RobLoach commented 6 years ago

The CRCs in the No-Intro NES DAT are computed on files without headers. As a result, the database scanner misses a lot of NES files if your ROMs still have their headers in place.

@rzumer and @leiradel, do you know if it would be possible to catch .nes files and compute the CRC of the file without the header, so that the No-Intro NES entries would match?

They have their header information in this XML file: http://datomatic.no-intro.org/stuff/header_nes.zip

i30817 commented 6 years ago

This should be possible. I'm pretty much forced to do the same thing in rhdndat to calculate md5sums that match:

https://github.com/i30817/rhdndat/blob/master/rhdndat/__main__.py#L563

Though my 'detection' isn't actually automatic: it's controlled by the user passing a dat with the right property, so it's useless for your purposes.

But aren't you 'always' supposed to skip 16 bytes for all No-Intro NES ROMs? That's certainly how I'm using it, and I haven't seen mismatches (though I haven't gone looking for them either). It might be as simple as calculating the CRC of .nes files by always skipping the first 16 bytes.

If this doesn't happen already i'm a bit perplexed how the scanner even scanned my no-intro collection...

I had just assumed that you had the version of the No-Intro dat that includes the header and hashed the whole file, since that is much less aberrant (and works better for hacks, whose CRC is of the whole file).

In short, I think you shouldn't do this, and should instead find a version of the dat file that has whole-ROM checksums. I know they exist, and if they didn't you could easily create them, because the whole NES No-Intro set is only about 500 MB. That avoids special cases and, as I said, is better for hacks.

I think the only real reasons No-Intro doesn't remove all the damn 16-byte headers in a single final release are the hacks and the fact that some emulators need them.

leiradel commented 6 years ago

It should be possible. NES headers start with {'N', 'E', 'S', 0x1a} and take 16 bytes, so it should be easy to detect and skip them during the scan.
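
A minimal sketch of the check described above, in Python rather than RetroArch's own C. This is not the scanner's actual code: the names `NES_MAGIC`, `has_nes_header` and `headerless_crc32` are made up for illustration, and it assumes the whole ROM is already in memory. It only strips the 16-byte header when the magic bytes are present, so a file that never had a header is hashed as-is.

```python
import zlib

NES_MAGIC = b"NES\x1a"    # {'N', 'E', 'S', 0x1a}, as described in the comment above
NES_HEADER_SIZE = 16      # the header occupies the first 16 bytes of the file


def has_nes_header(data: bytes) -> bool:
    """Return True if the file starts with the 16-byte NES header."""
    return len(data) >= NES_HEADER_SIZE and data[:4] == NES_MAGIC


def headerless_crc32(data: bytes) -> int:
    """CRC32 of the ROM with the header skipped, if one is present."""
    payload = data[NES_HEADER_SIZE:] if has_nes_header(data) else data
    return zlib.crc32(payload) & 0xFFFFFFFF
```

With this, a headered dump and its headerless counterpart produce the same CRC, which is what matching against the headerless No-Intro entries would need.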

i30817 commented 6 years ago

The point I'm making is that this would make all hack hashes wrong, which probably means you shouldn't do this, because hack hashes are harder to regenerate. (That my auto-generated PRs for NES and SNES hacks get rejected certainly doesn't help, hint hint. I could modify rhdndat to output the hashes without the header for the NES.)

leiradel commented 6 years ago

I'm not judging if it's appropriate or not to hash the ROM without the header, just stating that it's possible and easy to implement should it be necessary.

i30817 commented 6 years ago

Ok, I basically have no complaints if I'm allowed to merge a PR in libretro-database that replaces the current (very old and incomplete) database of NES hacks with a version that skips the header, so new and updated hacks keep working. I'd say 'replace the SNES one too' while I'm at it, but I can easily separate that.

Since I don't keep old versions of the hacks around, or even download all of them, just a majority (it would explode the hack numbers, and I only check hacks on romhacking.net), some hardpatched hacks might be 'missing', but this shouldn't matter for users if the playlist is already created, iirc.

But let's wait to see what @RobLoach says about this; there might be a dealbreaker for this idea in the database or in the metadata connection between hacks and the dat.

RobLoach commented 6 years ago

> this would make all hack hashes wrong

We could likely only check headerless CRCs as a fallback.
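
One way the fallback could look, sketched in Python under stated assumptions: `lookup_rom` and the `crc_to_name` mapping are hypothetical stand-ins for the scanner and its database, not RetroArch APIs. The whole-file CRC is tried first, so entries hashed with the header (hacks, for example) still match, and the headerless CRC is only tried when that fails.

```python
import zlib
from typing import Mapping, Optional

NES_MAGIC = b"NES\x1a"
NES_HEADER_SIZE = 16


def lookup_rom(data: bytes, crc_to_name: Mapping[int, str]) -> Optional[str]:
    """Look up a ROM by whole-file CRC, falling back to the headerless CRC."""
    # First attempt: hash the file exactly as it is on disk.
    full_crc = zlib.crc32(data) & 0xFFFFFFFF
    name = crc_to_name.get(full_crc)
    if name is not None:
        return name

    # Fallback: only if the file really carries a header, hash what follows it.
    if len(data) >= NES_HEADER_SIZE and data[:4] == NES_MAGIC:
        bare_crc = zlib.crc32(data[NES_HEADER_SIZE:]) & 0xFFFFFFFF
        return crc_to_name.get(bare_crc)

    return None
```

The order matters for the concern raised above: a hack whose database entry was hashed over the whole file is found on the first attempt and never reaches the fallback.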

> If this doesn't happen already i'm a bit perplexed how the scanner even scanned my no-intro collection...

We stuck some headered ones in here: https://github.com/libretro/libretro-database/blob/master/dat/Nintendo%20-%20Nintendo%20Entertainment%20System.dat

Those aren't guaranteed to match all collections.

i30817 commented 6 years ago

BTW, this isn't directly relevant, since it was already said that the NES header can be checked directly, but I suspect some of the NES ROMs in No-Intro, such as some of the prototypes, have no header now or never had one. So the check always needs to be done on the file itself, rather than 'always skip 16 bytes' as I'm doing in rhdndat (I need to fix that).

leiradel commented 6 years ago

Shouldn't the dat file contain duplicated entries for each ROM, one for the hash with the header, and another without?

RobLoach commented 6 years ago

> Shouldn't the dat file contain duplicated entries for each ROM, one for the hash with the header, and another without?

It currently does, yes. Not having to duplicate them would save us some entries (and possibly outdated names), however.