mamedev / mame

MAME
https://www.mamedev.org/

[chdman] some Dreamcast CHDs are missing some data after going back to CUE/BIN #11903

Closed alucryd closed 10 months ago

alucryd commented 10 months ago

MAME version

0.261

System information

Arch Linux

INI configuration details

No response

Emulated system/software

No response

Incorrect behaviour

Some Redump dumps of Dreamcast games are missing data after being converted to CHD, then back to CUE/BIN.

For instance, WWF Royal Rumble (USA) is 352800 bytes smaller after going back and forth. Haven't yet tried to see if the BIN is still usable despite the missing piece.

Other games like Ecco the Dolphin - Defender of the Future (USA) (En,Fr,De,Es) are fine. Can't spot any obvious difference between a working game and a non-working one, at least not by looking at the CUE sheet alone.

Expected behaviour

CHDs converted back to CUE/BIN should be a perfect match to the original files.

Steps to reproduce

Additional details

No response

alucryd commented 10 months ago

Looks like the missing data is 352800 bytes of zero padding at the start of track 2. Hope this helps.

Edit: Confirmed, adding the missing zeroes at the start of track 2 restores the file to be an exact match to the source.
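
For readers who want to reproduce the repair described above, here is a minimal sketch. It relies on two facts: raw CD sectors in a BIN file are 2352 bytes, and 352800 bytes is exactly 150 such sectors. The function name `restore_zero_pregap` is hypothetical and not part of chdman, and the sketch only applies when the stripped region really was all zeroes in the source dump.

```python
SECTOR_BYTES = 2352    # raw CD sector size in a BIN file
PREGAP_SECTORS = 150   # 352800 / 2352
PREGAP_BYTES = SECTOR_BYTES * PREGAP_SECTORS


def restore_zero_pregap(track_data: bytes) -> bytes:
    """Prepend the zero-filled region to the start of a track.

    Hypothetical helper: it assumes the missing bytes were all zero,
    as observed for track 2 in this report.
    """
    return bytes(PREGAP_BYTES) + track_data


if __name__ == "__main__":
    print(PREGAP_BYTES)  # 352800
```

Comparing a hash of the repaired file with the hash of the source is still needed to confirm the match.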

987123879113 commented 10 months ago

I checked to make sure this wasn't a more widespread issue, but it seems to be limited to the GD-ROM-related code, which I have no knowledge of or experience with.

This stood out to me at a quick glance though. The header for parse_gdicue specifically states it changes the layout of Redump cue/bin to match a TOSEC gdi. This could be what you're seeing. https://github.com/mamedev/mame/blob/7aca06fb13d4e017ee3c16b4f70742cf4311cc86/src/lib/util/cdrom.cpp#L2591-L2592

https://github.com/mamedev/mame/blob/7aca06fb13d4e017ee3c16b4f70742cf4311cc86/src/lib/util/cdrom.cpp#L2917-L2954

aguyfromuranus commented 10 months ago

Don't know if this is related, but I get a similar issue with sa1 and sa2: the hashes of the extracted split bins (using binmerge --split) don't match. Apparently these same bins are hash-perfect when using Redump's gdi sheet instead of the cue as createcd input, so it clearly has something to do with cue specifically.

alucryd commented 10 months ago

Thanks for the quick reply, it's a feature then. That's unfortunate in my case because the reverted files won't verify against the Redump database so I will need to "repair" the files after reversing them. I'll try to reach out to Redump so I can get more input before working on something.

tjanas commented 10 months ago

CHD does not support preservation of ISRC values that may be present in a cuesheet, such as those present in the audio tracks for Ghost Blade.

http://redump.org/disc/70116/

As a result, CHD isn’t truly a lossless preservation of CD-based media. Furthermore, I believe the game itself depends on these ISRC values as a means of copy-protection.

CHD is also lacking in that it may not fully preserve CD tracks that may contain multiple indexes within a single track, and other attributes that may be represented in a cuesheet.

tjanas commented 10 months ago

https://github.com/flyinghead/flycast/issues/906

tjanas commented 10 months ago

https://github.com/orgs/mamedev/discussions/77

rb6502 commented 10 months ago

@tjanas None of those things have anything to do with what is being discussed here.

alucryd commented 10 months ago

I was under the impression that CHD was a lossless format (at least for what is supported in v5), but that deliberate stripping says otherwise indeed.

Redump confirmed they were intentionally keeping the 150 sectors gap for the sake of accuracy, and because they sometimes include actual data, not just zeroes.
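
Since the gap is not always zeroes, it can help to check a track before assuming that zero-filling would restore it. The following is a rough sketch with a hypothetical helper name (`pregap_has_data`). It assumes the 150-sector gap sits at the very start of the track's BIN file and that sectors are 2352 raw bytes.

```python
def pregap_has_data(track_data: bytes,
                    sectors: int = 150,
                    sector_bytes: int = 2352) -> bool:
    """Return True if the leading gap contains any non-zero byte.

    Assumption: the gap occupies the first `sectors` sectors of the file.
    """
    gap = track_data[: sectors * sector_bytes]
    return any(gap)


if __name__ == "__main__":
    silent = bytes(150 * 2352) + b"\xff"
    print(pregap_has_data(silent))  # False
```

A track for which this returns True cannot be rebuilt from zeroes alone, which is the case Redump mentions.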

It would be nice if chdman could preserve them as well. I'm interested in the rationale behind the TOSEC preference; losing data intentionally sounds counter-intuitive. Maybe the fact that the gap can contain data wasn't known at the time of writing.

tjanas commented 10 months ago

TOSEC doesn't contain MIL-CD based games in any of its dats as far as I am aware. TOSEC predates Redump and has looser standards than Redump. Also, the scope of Redump is limited to video game optical media, while TOSEC also includes magnetic media, digital-only media, etc.

The issue with CHD is not limited to Dreamcast discs but others as well (such as audio CDs). It also has a challenge with Atari Jaguar CDs (those are multisession discs where the data is mastered as redbook audio tracks). CHD does not preserve the multisession structure or even the DCP flags from the cuesheet.

Example: http://redump.org/disc/74613/

Using chdman from mame0261 with the above redump bin/cue:

REM SESSION 01
FILE "Space Ace (USA) (Track 1).bin" BINARY
  TRACK 01 AUDIO
    FLAGS DCP
    INDEX 01 00:00:00
REM SESSION 02
FILE "Space Ace (USA) (Track 2).bin" BINARY
  TRACK 02 AUDIO
    FLAGS DCP
    INDEX 01 00:00:00
FILE "Space Ace (USA) (Track 3).bin" BINARY
  TRACK 03 AUDIO
    FLAGS DCP
    INDEX 00 00:00:00
    INDEX 01 00:01:74
FILE "Space Ace (USA) (Track 4).bin" BINARY
  TRACK 04 AUDIO
    FLAGS DCP
    INDEX 00 00:00:00
    INDEX 01 00:01:74
FILE "Space Ace (USA) (Track 5).bin" BINARY
  TRACK 05 AUDIO
    FLAGS DCP
    INDEX 00 00:00:00
    INDEX 01 00:01:74

Cuesheet generated from CHD extractcd:

FILE "Space Ace (USA).bin" BINARY
  TRACK 01 AUDIO
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    INDEX 01 00:46:38
  TRACK 03 AUDIO
    INDEX 00 00:52:36
    INDEX 01 00:54:35
  TRACK 04 AUDIO
    INDEX 00 47:42:36
    INDEX 01 47:44:35
  TRACK 05 AUDIO
    INDEX 00 47:50:33
    INDEX 01 47:52:32
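
The attributes that differ between the two cuesheets above can also be listed mechanically. This is a small sketch, not anything chdman provides: `cue_attributes` is a hypothetical helper that only understands the `REM SESSION`, `TRACK` and `FLAGS` lines seen in these examples.

```python
import re


def cue_attributes(cue_text: str):
    """Collect REM SESSION numbers and per-track FLAGS from a cue sheet."""
    sessions = []
    flags = {}
    track = None
    for raw in cue_text.splitlines():
        line = raw.strip()
        m = re.match(r"REM SESSION (\d+)", line)
        if m:
            sessions.append(int(m.group(1)))
            continue
        m = re.match(r"TRACK (\d+) ", line)
        if m:
            track = int(m.group(1))
            flags[track] = []
            continue
        if line.startswith("FLAGS ") and track is not None:
            flags[track] = line.split()[1:]
    return sessions, flags


if __name__ == "__main__":
    original = 'REM SESSION 01\nFILE "a.bin" BINARY\n  TRACK 01 AUDIO\n    FLAGS DCP\n'
    print(cue_attributes(original))
```

Run on the two Space Ace sheets, the original should report sessions 1 and 2 with DCP on every track, while the extracted sheet should report no sessions and no flags.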

For the sake of preservation and accuracy, keeping an original redump-verified bin/cue is recommended vs. only keeping a derived CHD.

To be clear, Redump bin/cue isn't perfect for all game-play purposes; it is simply the best available CD disc image format that is deterministic with reproducible dumps for datting. Some PC-based discs that use copy-protection methods like SecuROM v4, StarForce v3.x, etc. may need mds/mdf dumps with highly accurate DPM capture, which is beyond the scope of both CHD and Redump bin/cue. Also, it is not perfect for CD+G audio CDs, since the CD+G subchannel instructions are not captured by that format (something like CloneCD ccd/img/sub would be more appropriate, but it is currently impossible to have a deterministic sub dump).

rb6502 commented 10 months ago

I will say again louder: as the person who created the optical media support in CHDMAN, it was not intended to be a 100% archival format, just that you can roundtrip the data for common arcade CDs, including Naomi (when not converting to a different format like importing GDI and exporting bin/cue or something).

The plan for CHDv6 was that CHD would be a compressed wrapper around the AARU (formerly DiscImageChef) universal format and that we would immediately gain multisession and a lot of other support. Unfortunately the "libaaru" library that would enable that is (completely understandably) not Claunia's priority so it seems like that's not happening.

So PRs to improve the current situation would be great.

tjanas commented 10 months ago

Not sure if Redumper has library functionality that would be better suited than AARU for optical discs?

https://github.com/superg/redumper

rb6502 commented 10 months ago

I don't immediately see any kind of library API in Redumper, and we are trying to avoid GPLv3.

alucryd commented 10 months ago

Managed to patch chdman so that it recreates matching files, will submit a PR tomorrow. Hopefully it's acceptable even if it's not TOSEC compliant.

alucryd commented 10 months ago

There you go: https://github.com/mamedev/mame/pull/11913

I don't speak C++ but the changes were straightforward. Verified working on a couple Redump CUE/BIN.

TheRealGusBus commented 9 months ago

It seems chdman now throws an error when compressing bin/cues with more than ~3 tracks. Verified on the Redump versions of "4x4 Evo (USA)" and "102 Dalmatians - Puppies to the Rescue (UK)"

TomTurbine commented 8 months ago

Not sure if you should close this one just yet.

Ready 2 Rumble Boxing (USA) (RE)
Tee Off (USA)

They do not decompress into the same things that went back in.

987123879113 commented 8 months ago

@TomTurbine Can you elaborate? I just tested using chdman from latest master and everything works as expected. The same data that went in comes back out as can be seen in the SHA-1 sums of the extracted .bin compared to a file made up of the combined .bins of the separate tracks.

Ready 2 Rumble Boxing (USA) (RE):

>./chdman createcd -i Ready\ 2\ Rumble\ Boxing\ \(USA\)\ \(RE\).cue -o ready2rumble_usa_re.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.263 (mame0252-3947-gc2c61bf29c3)
Output CHD:   ready2rumble_usa_re.chd
Input file:   Ready 2 Rumble Boxing (USA) (RE).cue
Input tracks: 7
Input length: 122:02:00
Compression:  cdlz (CD LZMA), cdzl (CD Deflate), cdfl (CD FLAC)
Logical size: 1,344,343,680
Compression complete ... final ratio = 28.0%

>./chdman extractcd -i ready2rumble_usa_re.chd -o ready2rumble_usa_re.cue
chdman - MAME Compressed Hunks of Data (CHD) manager 0.263 (mame0252-3947-gc2c61bf29c3)
Output TOC:   ready2rumble_usa_re.cue
Output Data:  ready2rumble_usa_re.bin
Input CHD:    ready2rumble_usa_re.chd
Warning: extracting GD-ROM CHDs as bin/cue is not fully supported and will result in an unusable CD-ROM cue file.
Extraction complete

>sha1sum -b Ready\ 2\ Rumble\ Boxing\ \(USA\)\ \(RE\).cue ready2rumble_usa_re_source.bin ready2rumble_usa_re.bin
4bcb43cb73c46077a0fb9a410cd38a49590b2ccb *Ready 2 Rumble Boxing (USA) (RE).cue
40b92e81e906c9e6f382fa6d3471eb2fecc480f2 *ready2rumble_usa_re_source.bin
40b92e81e906c9e6f382fa6d3471eb2fecc480f2 *ready2rumble_usa_re.bin

Tee Off (USA):

>./chdman createcd -i Tee\ Off\ \(USA\).cue -o teeoff.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.263 (mame0252-3947-gc2c61bf29c3)
Output CHD:   teeoff.chd
Input file:   Tee Off (USA).cue
Input tracks: 3
Input length: 122:02:00
Compression:  cdlz (CD LZMA), cdzl (CD Deflate), cdfl (CD FLAC)
Logical size: 1,344,333,888
Compression complete ... final ratio = 27.2%

>./chdman extractcd -i teeoff.chd -o teeoff.cue
chdman - MAME Compressed Hunks of Data (CHD) manager 0.263 (mame0252-3947-gc2c61bf29c3)
Output TOC:   teeoff.cue
Output Data:  teeoff.bin
Input CHD:    teeoff.chd
Warning: extracting GD-ROM CHDs as bin/cue is not fully supported and will result in an unusable CD-ROM cue file.
Extraction complete

>sha1sum -b Tee\ Off\ \(USA\).cue teeoff_source.bin teeoff.bin
a86d22eb15a1157770ea310329d7dcc0ea40a7d4 *Tee Off (USA).cue
410f583bf39d89a6be0eab59f9dbc3d07f6c1e66 *teeoff_source.bin
410f583bf39d89a6be0eab59f9dbc3d07f6c1e66 *teeoff.bin
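
The comparison in the logs above (the extracted single .bin against the separate track .bins joined in order) can also be done without building a combined file on disk. This is a sketch only: `sha1_of_files` is a hypothetical helper, and the file names passed to it are whatever your own dump uses.

```python
import hashlib


def sha1_of_files(paths, chunk_size=1 << 20):
    """Return the SHA-1 of the given files as if they were concatenated in order."""
    digest = hashlib.sha1()
    for path in paths:
        with open(path, "rb") as handle:
            # Read in chunks so large track files are not loaded whole.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

If `sha1_of_files([track1, track2, ...])` equals `sha1_of_files([extracted_bin])`, the round trip kept every byte, which is what the matching sums above show.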

TomTurbine commented 8 months ago

Not sure what to say; I can try again later, maybe. I just know I compressed them from the Redump-verified version to CHD. I tried using the GDI files, then I tried to verify with NKit and it would fail for those 2 games, so I extracted back to a cue file and that failed too. I used the GDI files because I know CHDMan was giving issues using the straight cue sheet not too long ago.

Don't know the technical details, just know I tried that. Sorry I can't be of much more help.

Looking at your log, you used the cue sheet, I take it the issues with CHDMan and Dreamcast using the cue sheets instead of GDI files have been resolved?

987123879113 commented 8 months ago

Extracting Dreamcast games back out to .bin/.cue is still broken (chdman gives a warning since it doesn't generate a valid Dreamcast cue, so it has no chance of verifying at all), but chdman no longer discards data from the Redump input when creating the CHDs, so it's possible to restore it in the future.

I covered some of this in my comment on my PR: https://github.com/mamedev/mame/pull/12087#issuecomment-1975098822

If extracting the Dreamcast CHDs back into a format that can be verified against Redump is important to you, then I don't recommend using CHDs to store your Dreamcast games until Dreamcast .cue exporting is properly implemented. If you create a CHD using a Redump .gdi and then extract back into .gdi, you should get fully matched data (the .gdi will have different formatting, but it should have all of the same information as the original Redump .gdi). I wouldn't recommend using Redump .gdis to create CHDs though, because it's impossible to tell them apart from TOSEC .gdis, so the data can't be rearranged internally into the format MAME/chdman expects for GD-ROMs, and I wouldn't be surprised if Redump .gdi CHDs don't work in emulators.

tjanas commented 8 months ago

Redump uses bin/cue. TOSEC uses gdi.

angelosa commented 8 months ago

Gotta love attempts at verifying software while not even populating SW list as per https://github.com/mamedev/mame/issues/12154 . If MAME driver cannot boot a raw .bin/.cue (*) then I'm mildly curious to check why assuming there's a "known working on another emulator", if not then I'm not even sure why MAME should care at all.

(*) which I'm not sure why you should, it has been intended to be used with .gdi specifically for DC, and converting a .cue to .gdi doesn't require rocket science.