superg / redumper

Low level CD dumper utility
GNU General Public License v3.0

Feature Request - DVD mode - Raw dumping support #40

Open ehw opened 1 year ago

ehw commented 1 year ago

So this was expected.

Right now, redumper can do raw dumps of CD-based media with drives that support reading data as audio. DVDs sacrifice transparency for consistency: drives don't expose all the data burned on a DVD to anything but themselves. DVDs carry EDC, ECC, and sector header information that's never exposed outside the drive. This is a limitation of drive firmware, so every drive is supposed to be equally incapable of exposing the extra data.

However, it's not impossible to see this data. As you do research, though, you find that the methods of accessing it are all over the place and vary from drive to drive, if the drive can even be tricked into exposing the data in the first place.

Current methods of reading this data typically abuse drive-specific diagnostic SCSI/MMC commands (like READ BUFFER), so that the contents of memory bleed through and can be read from the buffer on the drive. Some diagnostic commands are so specific that they can even stream the data raw, like a Plextor can, with very few tricks (like those GDR LG-Hitachi drives used for Wii/GC dumping). But every drive can have its own reading method, and different methods can be deployed with those commands to make dumping more efficient.

The best example of software that uses these commands is Friidump, open source software built specifically for dumping Wii/GC discs that can also be used for raw DVD dumps. I highly recommend looking at its source code and documentation, as some research went into finding drives that work.

There are some advantages to reading raw. The big one is the EDC/ECC data, which I believe is present on every single DVD. The issue with drives is that since they handle error correction and detection themselves, you're at the mercy of their ability to report when errors actually happen. Some can even miss errors, and you wouldn't know about it. Having the ECC/EDC data means you can verify it at the user level, which means double-checking that dumps are actually correct. Nobody anticipated the ECC/EDC data being seen or manipulated outside the drive, so the hidden data should be clean and safe to utilize.

It's difficult to explain the intricacies of raw DVD dumping for a lot of reasons. I definitely recommend checking out the source code to Friidump. DIC (DiscImageCreator) implements raw dumping as well.

ehw commented 1 year ago

Some additional documentation in case this gets lost.

https://web.archive.org/web/20080601213824/http://www.kev.nu/360/dvd.html

Certain drives support the Hitachi debug command E7. I'm not sure exactly what it dumps, but it has been the quickest method to dump Wii/GC with a PC drive. This is the command Rawdump uses as well. I'm not sure if this is technically 'raw', but these would be good drives to support.

I've been thinking about how redumper could add support code-wise. Since each drive has its own methods and techniques for dumping, and what those methods return differs from drive to drive and vendor to vendor, it might be best to categorize drives based on what the drive itself returns. That way, methods can be separated based on the returns. For instance, the categories could look like this:

1.) Drives that support 2064 scrambled dumping (no ECC)
2.) Drives that support 2384 scrambled dumping (ECC and EDC)
3.) Drives that support raw 2064 unscrambled reads (no ECC)
4.) Drives that support raw 2384 unscrambled reads (ECC and EDC)
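A minimal Python sketch of how those four categories might be modeled, assuming a drive is described only by the sector size it returns (2064 or 2384) and whether the main data comes back scrambled. `RawReadCategory`, `CATEGORIES` and `categorize` are hypothetical names, not existing redumper code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawReadCategory:
    """One of the four proposed drive categories (hypothetical model)."""
    sector_size: int  # bytes per sector as returned by the drive: 2064 or 2384
    scrambled: bool   # True if the main data still needs descrambling

    @property
    def has_ecc(self) -> bool:
        # Per the list, only the 2384-byte returns carry ECC (PI/PO) bytes.
        return self.sector_size == 2384


CATEGORIES = {
    1: RawReadCategory(2064, scrambled=True),
    2: RawReadCategory(2384, scrambled=True),
    3: RawReadCategory(2064, scrambled=False),
    4: RawReadCategory(2384, scrambled=False),
}


def categorize(sector_size: int, scrambled: bool) -> int:
    """Map what a drive returns to a category number, or raise if unknown."""
    for number, cat in CATEGORIES.items():
        if cat.sector_size == sector_size and cat.scrambled == scrambled:
            return number
    raise ValueError(f"unsupported return: {sector_size} bytes, scrambled={scrambled}")
```

Keying on what the drive returns rather than on vendor lets every drive in a category share one decode path.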

RibShark commented 1 year ago

ASUS/LG drives support raw 2384-byte reads via the same F1 06 command used for CDs (alternatively 3C 03 02 XX XX XX YY YY YY 00, which also works on CDs).
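For reference, a sketch of assembling that 3C CDB in Python. It assumes XX XX XX is the 24-bit buffer offset and YY YY YY the 24-bit allocation length, which is how the standard READ BUFFER(10) CDB lays out bytes 3 to 8; what mode 03 and buffer ID 02 actually select on ASUS/LG firmware is vendor behavior not modeled here. Actually sending the CDB (SG_IO on Linux, SPTI on Windows) is out of scope, and `read_buffer_cdb` is a hypothetical helper.

```python
def read_buffer_cdb(mode: int, buffer_id: int, offset: int, length: int) -> bytes:
    """Build a 10-byte READ BUFFER (3Ch) CDB.

    Byte 0 opcode, byte 1 mode (low 5 bits), byte 2 buffer ID,
    bytes 3-5 buffer offset (big-endian), bytes 6-8 allocation
    length (big-endian), byte 9 control.
    """
    if not 0 <= mode <= 0x1F:
        raise ValueError("mode must fit in 5 bits")
    if not 0 <= buffer_id <= 0xFF:
        raise ValueError("buffer ID must fit in one byte")
    if not (0 <= offset < 1 << 24 and 0 <= length < 1 << 24):
        raise ValueError("offset and length must fit in 24 bits")
    return (bytes([0x3C, mode, buffer_id])
            + offset.to_bytes(3, "big")
            + length.to_bytes(3, "big")
            + bytes([0x00]))


# One 2384-byte sector from the start of buffer 02 in mode 03:
print(read_buffer_cdb(0x03, 0x02, 0, 2384).hex(" "))  # 3c 03 02 00 00 00 00 09 50 00
```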

ehw commented 1 year ago

Some things I've noticed since dumping raw with DIC:

1.) As previously mentioned, not every drive returns the same data. Even drives that return the same amount of data (usually 2384 bytes) can return contents that differ from one another. For instance, a Lite-On drive's 2384-byte sectors may have some non-crucial bytes that differ from a dump made with an ASUS/LG drive.

2.) Data returned onto the cache or buffer has already been processed by the drive, meaning that it isn't exactly raw. Raw would imply the same, as-is data that was written on the disc. Each sector contains 2400+ bytes on the disc, but so far no method returns all of them.

3.) Because you're reading data only one level below the normal READ(12) command, you're still subject to the drive's firmware limitations. This isn't truly raw/controllable like Plextor's D8 command or the BE command, where you can read the raw data off a CD by reading it as audio. When the drive returns a SCSI error for a sector, it will not put that sector's data onto the cache at all. It might be drive dependent, but from what I've seen, no drive will return data from a read that resulted in a SCSI error of any kind. Instead, you just get a bunch of zeros in place of the missing sector.

4.) One possible exception to #3 is the drives that support the proprietary E7 command. E7 works differently: it's a command used specifically by Hitachi/LG drives to dump the contents of the MN103's RAM onto the cache/buffer, allowing it to be read. It's a debugging command that was left in by mistake. As currently used by existing applications, this command only dumps the data stored in a specific section of RAM where the sectors have already been processed. In theory, however, the drive might have all the bytes that make up a DVD sector stored in different parts of RAM. So the trick would be to do a complete RAM dump during a sector read and figure out the offsets to everything. Then you just dump the data stored at those offsets to create a true raw sector.

5.) Friidump contains a really nice program called BruteForce3C. Unfortunately it's not open source, but it's very simple. Since most drives use the READ BUFFER (3C) command to create rawish dumps, there might be many more drives that support using 3C to read more data off the disc. BruteForce3C brute-forces the command's parameters to find a combination that returns non-zero data for the first sector of the disc. This can easily be implemented at the redumper level by brute-forcing unknown drives for a 3C parameter configuration that returns non-zero data for the first sector of a given DVD. You basically just need to look for the starting PSN (in the first 4 bytes of LBA 0), which should be 0x030000. If you get data containing that, you've found a command that returns rawish data. However, you would then have to determine what data is included with each sector and how it's formatted (is it 2064 bytes or 2384? Is it scrambled or unscrambled? etc.).
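A rough Python sketch of that brute-force idea, assuming the drive places the most recently read sector at buffer offset 0 and that each cached sector starts with a 4-byte ID field (sector information byte, then a big-endian 24-bit PSN). `send` is a hypothetical callback that issues READ BUFFER with the given mode/buffer ID right after the drive has read LBA 0; it is not a real redumper or Friidump API. Once a hit is found, the distance to the ID field carrying PSN 0x030001 gives a first guess at the sector size in the cache.

```python
from typing import Callable, Iterator, Optional, Tuple

FIRST_DATA_PSN = 0x030000  # PSN of LBA 0 in the data zone


def psn_at(buf: bytes, offset: int) -> int:
    """24-bit PSN from the ID field at `offset` (byte 0 is sector info)."""
    return int.from_bytes(buf[offset + 1:offset + 4], "big")


def looks_like_lba0(buf: bytes) -> bool:
    """True if the buffer starts with an ID field carrying PSN 0x030000."""
    return len(buf) >= 4 and psn_at(buf, 0) == FIRST_DATA_PSN


def guess_stride(buf: bytes, max_stride: int = 4096) -> Optional[int]:
    """Distance from the first ID field to the first ID field holding PSN + 1."""
    if len(buf) < 4:
        return None
    target = psn_at(buf, 0) + 1
    for stride in range(4, min(max_stride, len(buf) - 3)):
        if psn_at(buf, stride) == target:
            return stride
    return None


# Hypothetical transport: send(mode, buffer_id, offset, length) -> data,
# raising OSError when the drive rejects the CDB.
Sender = Callable[[int, int, int, int], bytes]


def brute_force(send: Sender, length: int = 0x10000) -> Iterator[Tuple[int, int, Optional[int]]]:
    """Yield (mode, buffer_id, stride) for every pair whose data starts at LBA 0."""
    for mode in range(0x20):
        for buffer_id in range(0x100):
            try:
                data = send(mode, buffer_id, 0, length)
            except OSError:
                continue  # drive rejected this parameter combination
            if looks_like_lba0(data):
                yield mode, buffer_id, guess_stride(data)
```

The stride search is naive: in real data some other byte run could match 0x030001 first, so a hit should be confirmed against a few more sectors before trusting it.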

The biggest nut buster for these raw dump reads is the fact that you cannot overcome firmware limitations. So for instance, if a drive really doesn't want to read into an Xbox disc, it's not going to. All it really does at the end of the day is just give you more, verifiable information about the dump itself.

ehw commented 9 months ago

So a few things to note since last time. I've been doing more research and am currently compiling a list of drives that support various data reads using READ BUFFER and other SCSI commands that do more or less the same thing.

1.) So the E7 and F1 opcodes themselves are not memory dumping commands; rather, they are vendor-specific debugging opcodes that take a passphrase in the CDB, along with other CDB values, to determine the opcode's functionality. The E7 opcode is specific to Hitachi and has been supported in various forms since their very first DVD drives, but the subcommands paired with E7 can vary from drive to drive, so not every E7 subcommand value for memory dumping will carry over between drives. Likewise, F1 is MediaTek's debugging opcode. It can vary as well, but not as much as Hitachi's, since MediaTek hasn't been in the game long enough to change things up, though they can add and remove subcommand functionality. From what I've seen, however, you don't really need E7 or F1 at all if you know how to use 3C, since 3C does the same thing. F1 might make it possible to read more memory regions, but you can always read the head of DRAM, which is where the raw sector data is stored anyway.

2.) I developed a small script/program that bruteforces values of 3C/E7/F1 to find the command that returns raw data for DVDs. I've been getting submissions from various drives and you can definitely see that there is a pattern in the data that is returned, but there are a number of variations.

http://forum.redump.org/topic/51851/dumping-dvds-raw-an-ongoing-investigation-we-need-your-help/

see table here: https://docs.google.com/spreadsheets/d/1pu3oXHRJ_qlyXrsHUyXOzD5mNp7dU8rgrfVuRBLyQFA/edit?pli=1#gid=0

Raw sectors returned on the cache can be classified into types based on their size in the cache and what can ultimately be gathered from them. Here are the types reserved so far:

So ultimately, the best drives are the ones that return at least 2366 bytes of verifiable data onto the cache. This includes the sync information, EDC value, and PO/PI data used for correction. Brownie points if the drive returns uncorrected data from L-EC related SCSI errors onto the cache, as drives like this could theoretically be used to dump Wii/GC discs and discs with (un)intentional errors. Here's a list:

Type 1 - 2064 bytes - Not scrambled, no PO/PI data. EDC and sync only.
Type 1a - 2064 bytes - Not scrambled, no PO/PI data. EDC and sync only. Offset by some number of bytes in the cache for unknown reasons.
Type 2 - 2236 bytes - Scrambled, EDC, sync, and PO bytes only. No PI for some reason.
Type 3 - 2304 bytes - Scrambled, EDC, sync, but no PO/PI bytes it seems. "00 FF FF FF FF FF FF FF FF FF" interleaved every 181 bytes, 11/12 times per group of 2304 bytes.
Type 4 - 2384 bytes - Scrambled, EDC, sync, PO/PI data included. Includes 18 extra bytes of discardable, drive-specific garbage at the end of every group of 2384 bytes.
Type 4a - 2384 bytes - Scrambled, EDC, sync, PO/PI data included. Includes 18 extra bytes of discardable, drive-specific garbage at the end of every group of 2384 bytes. Segmented: data is accessed with different READ BUFFER parameters. Observed with only one drive so far (LG GH20LS10).
Type 5 - 2392 bytes - Scrambled, EDC, sync, PO/PI data included. Includes 26 seemingly pointless bytes added in the PI/PO areas in groups of 2 bytes. Not part of the raw sector spec.
Type 6 - 2816 bytes - Scrambled, EDC, sync, PO/PI data included. We theorize the extra 450 bytes of data per group of 2816 bytes might be drive-specific debugging data, but they might contain some useful information about the disc at this sector that's not part of the raw sector data spec.
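These sizes line up with the ECMA-267 sector layout, which may help when classifying new drives: a 2064-byte data frame is 12 rows of 172 bytes, adding the interleaved PO row gives 13 × 172 = 2236 (Type 2), and appending 10 PI bytes to each of those 13 rows gives a 2366-byte recording sector (13 × 182). A quick Python check of the leftover bytes per type, with the type sizes taken from the list:

```python
ROW_DATA = 172  # row width without PI
ROW_PI = 10     # PI parity bytes appended to each row
DATA_ROWS = 12  # rows in a 2064-byte data frame
PO_ROWS = 1     # PO row interleaved into each recording sector

DATA_FRAME = DATA_ROWS * ROW_DATA                               # 2064
WITH_PO = (DATA_ROWS + PO_ROWS) * ROW_DATA                      # 2236
RECORDING_SECTOR = (DATA_ROWS + PO_ROWS) * (ROW_DATA + ROW_PI)  # 2366

# Cache strides of the types that include both PO and PI.
FULL_ECC_TYPES = {"4/4a": 2384, "5": 2392, "6": 2816}


def extra_bytes(stride: int) -> int:
    """Bytes per cached sector beyond one 2366-byte recording sector."""
    if stride < RECORDING_SECTOR:
        raise ValueError("stride cannot hold a full recording sector")
    return stride - RECORDING_SECTOR


for name, stride in FULL_ECC_TYPES.items():
    print(f"Type {name}: {stride} = {RECORDING_SECTOR} + {extra_bytes(stride)}")
# Type 4/4a: 2384 = 2366 + 18
# Type 5: 2392 = 2366 + 26
# Type 6: 2816 = 2366 + 450
```

The leftovers match the 18, 26, and 450 extra bytes described for Types 4, 5, and 6.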

Whether or not the drive returns uncorrected bad-sector data to the cache is up to the drive firmware itself; some do and some don't. There isn't really a way to determine ahead of time whether you'll get data onto the cache when L-EC related errors occur. If the drive returns a group of all 00 bytes on a read error, sized according to the drive's type, that's your cue that it doesn't.
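Spotting that cue is easy to automate. A minimal sketch, assuming the cache read has already been split at the drive's per-type stride (2236, 2384, etc.); `zeroed_sectors` is a hypothetical helper:

```python
from typing import List


def zeroed_sectors(cache: bytes, stride: int) -> List[int]:
    """Indices of stride-sized chunks of a cache read that are entirely 0x00."""
    count = len(cache) // stride
    return [i for i in range(count)
            if not any(cache[i * stride:(i + 1) * stride])]
```

If a sector that errored shows up here, the drive most likely does not pass uncorrected data through to the cache.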

More research is still ongoing as there are probably more types out there, but the types that were discovered so far don't appear to be associated with a given vendor and seem oddly specific to the firmware itself.

ehw commented 5 months ago

Heh I just realized, maybe it would help my case if I actually mentioned what the benefits of having this data would entail.

1.) Gain disc-generated ECC/EDC data. All sectors on every DVD contain error detection and error correction data. A normal user never sees this, so spilling this data over to user-land means that redumper can use it to verify dumps without needing to make a 2048-byte-sector dump twice.
2.) Wii/GC support, and possibly more with trap discs/disc swapping. I have a theory: most drives don't like looking at Wii/GC discs because they can't descramble and interpret the lead-in data where the start and end PSN are held. Those freak LG drives from back in the day are an exception to this because it's presumed that they assume most of the lead-in data for DVDs, since it's 'technically' standard (for instance, the start PSN is always 030000). So if you were to supply a drive with a fake TOC for, say, a GameCube-sized disc, you could give any drive all the information it needs to start reading the disc with READ BUFFER by swapping from a trap disc that uses a fake TOC. We actually experimented with this with RVT-R/NR discs, and it works! We didn't try retail, but I'm sure it'll work too.
3.) Refines can be smarter, based on individual PI rows or PO columns per sector/block. With C2 you can look within a sector for correction statuses on individual bytes (in redumper's case, samples), but you can't do that kind of correction on something that's already technically 'corrected' and output. With access to the ECC data, you could make refines for DVD-based media more focused on the data held within a sector.
4.) Sectors have headers which contain 'sync'-like bytes to help keep track of where you are on the physical disc. They can give you an indicator of layer breaks and other information about the sector itself.
5.) Incomplete data can be better than no data. Some drives will return uncorrected data onto the cache, such as the drives that can 'read' GC/Wii discs. In most circumstances, when drives encounter errors they won't return anything to the user. However, that doesn't mean that all the sectors in a block are bad, or that all the rows that make up a sector are bad. You can descramble a sector with some errors and still get data from it. It's better than all 00, right?
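As a sketch of item 3, here's a per-row PI check in Python. It assumes ECMA-267's PI code as we understand it: each 182-byte row (172 data bytes + 10 PI bytes) is a Reed-Solomon codeword over GF(2^8) with field polynomial x^8 + x^4 + x^3 + x^2 + 1 and generator roots α^0 through α^9, with the first byte as the highest-degree coefficient. A row with any non-zero syndrome holds at least one bad byte, so a refine could target just the sectors with failing rows. `pi_encode` exists only to build a valid test row, and none of these names come from redumper.

```python
from typing import List

PRIM = 0x11D    # x^8 + x^4 + x^3 + x^2 + 1
ALPHA = 0x02    # primitive element
PI_PARITY = 10  # PI bytes per row
ROW_LEN = 182   # 172 data bytes + 10 PI bytes


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8): shift-and-add, reducing by PRIM."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def syndromes(row: bytes, nsym: int = PI_PARITY) -> List[int]:
    """Evaluate the row polynomial at alpha^0 .. alpha^(nsym-1) (Horner's rule)."""
    out = []
    for k in range(nsym):
        x = gf_pow(ALPHA, k)
        s = 0
        for byte in row:
            s = gf_mul(s, x) ^ byte
        out.append(s)
    return out


def row_ok(row: bytes) -> bool:
    """True if a 182-byte row is a valid PI codeword (all syndromes zero)."""
    return len(row) == ROW_LEN and not any(syndromes(row))


def bad_rows(sector: bytes) -> List[int]:
    """Indices of the rows in a 2366-byte recording sector that fail PI."""
    return [r for r in range(len(sector) // ROW_LEN)
            if not row_ok(sector[r * ROW_LEN:(r + 1) * ROW_LEN])]


def pi_encode(data: bytes, nsym: int = PI_PARITY) -> bytes:
    """Systematic RS encode: append nsym parity bytes (for building test rows)."""
    gen = [1]
    for k in range(nsym):
        root = gf_pow(ALPHA, k)
        # gen(x) * (x + root): shifted copy XOR scaled copy
        gen = [a ^ gf_mul(b, root) for a, b in zip(gen + [0], [0] + gen)]
    msg = list(data) + [0] * nsym
    for i in range(len(data)):
        coef = msg[i]
        if coef:
            for j in range(1, len(gen)):
                msg[i + j] ^= gf_mul(gen[j], coef)
    return bytes(data) + bytes(msg[len(data):])
```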

superg commented 5 months ago

Yes I agree with everything, having DVD RAW is good and I'm all in for implementing it. I recall last time we discussed it, it was about reading it from cache? Only the cache part I don't like.

ehw commented 5 months ago

It's not as bad as you think.

It makes sense to worry about reading from cache for CDs, as there's a lot to consider that makes them a pain in the ass even when read with D8. With CDs you deal with tracks, formless sectors, audio sectors, and just a lot of nonstandard nonsense. The biggest problem is that not every sector is required to have EDC/ECC, or even a header for that matter. So you depend a lot on the drive and firmware to ensure that it's being read consistently and that errors are at least reported consistently.

Not so with DVDs; thankfully, someone thought to standardize. Every single sector, from the lead-in through the data zone to the lead-out, has everything you need: header, error detection, error correction. Even the sync/ID part of the header has its own 2-byte error detection code (IED) to verify the PSN and the sector ID byte! Regions are also specified by set PSN ranges, so the PSN alone can tell you where you are. The EDC covers both the user data (computed before it's scrambled) AND the entire header. The header is even covered by two sets of ECC, one that applies row-wise within the sector (parity inner) and one that applies column-wise across the block (parity outer). Every DVD has these.
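A minimal sketch of checking the header's ID/IED pair, assuming ECMA-267's definition as we understand it: the IED is two Reed-Solomon check bytes over the 4-byte ID (sector information byte + 24-bit PSN), computed in GF(2^8) with field polynomial x^8 + x^4 + x^3 + x^2 + 1 and generator (x + α^0)(x + α^1). The function names are hypothetical.

```python
PRIM = 0x11D  # GF(2^8) field polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8): shift-and-add, reducing by PRIM."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return result


# (x + alpha^0)(x + alpha^1) with alpha = 0x02 expands to x^2 + 3x + 2.
IED_GEN = (1, 0x03, 0x02)


def compute_ied(id_field: bytes) -> bytes:
    """Remainder of ID(x) * x^2 divided by IED_GEN; ID byte 0 is the highest degree."""
    if len(id_field) != 4:
        raise ValueError("the ID field is 4 bytes")
    msg = list(id_field) + [0, 0]
    for i in range(4):
        coef = msg[i]
        if coef:
            for j in (1, 2):
                msg[i + j] ^= gf_mul(IED_GEN[j], coef)
    return bytes(msg[4:])


def header_ok(frame: bytes) -> bool:
    """Check bytes 0-3 (ID) of a data frame against bytes 4-5 (IED)."""
    return compute_ied(frame[:4]) == frame[4:6]


def parse_id(frame: bytes) -> tuple:
    """Return (sector information byte, 24-bit PSN)."""
    return frame[0], int.from_bytes(frame[1:4], "big")
```

Because the ID field is not scrambled, this check works on cache data straight from the drive, before any descrambling.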

So as long as at least the EDC is present, you can tell whether the sector you read is good or not. Even if for some freakish reason the sector is corrupt, whether due to the drive's RAM, the bus, USB, host RAM, the HDD, or the program that issued the command, the EDC will tell you if something got messed up.
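A minimal EDC check in Python, assuming the ECMA-267 definition as we understand it: a 32-bit code with generator x^32 + x^31 + x^4 + 1, computed MSB-first with a zero initial value over the first 2060 bytes of the 2064-byte data frame (ID, IED, CPR_MAI, and the 2048 main data bytes) and stored big-endian in the last 4 bytes. It expects a frame whose main data has already been descrambled, since the EDC is computed before scrambling; descrambling isn't shown, and `compute_edc`/`edc_ok` are hypothetical names.

```python
EDC_POLY = 0x80000011  # x^32 + x^31 + x^4 + 1, with the x^32 term implicit


def compute_edc(data: bytes) -> int:
    """MSB-first CRC with EDC_POLY, zero initial value and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ EDC_POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def edc_ok(frame: bytes) -> bool:
    """Compare the EDC of bytes 0-2059 of a descrambled frame with bytes 2060-2063."""
    if len(frame) != 2064:
        raise ValueError("expected a 2064-byte data frame")
    return compute_edc(frame[:2060]) == int.from_bytes(frame[2060:], "big")
```

A handy property of this form (zero initial value, no final XOR) is that running `compute_edc` over all 2064 bytes of a good frame yields 0.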

The only thing to worry about is that not all drives give you everything. From our extensive research, you'll at least have EDC. But ECC seems to be the issue, as every drive/vendor does something different when it comes to storing it in memory. The other concern is which stage of the drive firmware's data preparation the sector data in memory reflects, i.e. whether it's captured before or after a process that might modify the data. However, across the 200 drives we tested, it all seems to be captured after the EDC is checked but before ECC is applied in an attempt to correct it.

Hope this helps. Ask Sazpaimon if you want to know more as he played with this data A LOT, lol.

ehw commented 2 months ago

One tiny note that I want to make here so I don't forget.

A common sector size returned onto the cache is 2236 (Type 2). I mentioned previously that drives returning this size to the cache don't return PI channel data. I was wrong: they do, but the PI data appears immediately AFTER all of the 2236-byte sector/block data that has been read. The data is stored in sequential order by sector, with two/four non-zero bytes sandwiched between each PI 'row'. I'm not sure if this data is read from the disc; however, there is documentation suggesting that a sector holds more bytes than ECMA/MMC actually document, and there are patents that mention the existence of "dummy data". This one in particular calls PO "c2" and PI "c1". This person worked for Panasonic.

https://patents.google.com/patent/JP4418431B2/en

But anyway, this means you can actually get the full ECC bytes for every sector/block even with a 'Type 2' drive. You just have to know the offset where they're stored in DRAM.
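A sketch of reassembling full recording sectors from a Type 2 read, assuming: the read covers `sectors` consecutive sectors of 2236 bytes each (13 rows of 172 bytes, PO row last); their PI fields follow in the same sector and row order; and each 10-byte PI field is followed by a fixed number of filler bytes (`gap`, 2 or 4 per the observation about bytes sandwiched between PI rows). The real layout and offsets are drive specific, and all names here are hypothetical.

```python
from typing import List

ROW_DATA = 172
ROW_PI = 10
ROWS_PER_SECTOR = 13                       # 12 data rows + 1 PO row
TYPE2_SECTOR = ROWS_PER_SECTOR * ROW_DATA  # 2236


def pi_fields(tail: bytes, sector: int, gap: int) -> List[bytes]:
    """The 13 PI fields of one sector, read from the area after all row data."""
    step = ROW_PI + gap
    base = sector * ROWS_PER_SECTOR * step
    return [tail[base + r * step:base + r * step + ROW_PI]
            for r in range(ROWS_PER_SECTOR)]


def rebuild_recording_sector(rows: bytes, pi: List[bytes]) -> bytes:
    """Append each row's PI: 13 x (172 + 10) = 2366 bytes."""
    if len(rows) != TYPE2_SECTOR or len(pi) != ROWS_PER_SECTOR:
        raise ValueError("need 2236 bytes of rows and 13 PI fields")
    if any(len(p) != ROW_PI for p in pi):
        raise ValueError("PI area is shorter than expected")
    return b"".join(rows[r * ROW_DATA:(r + 1) * ROW_DATA] + pi[r]
                    for r in range(ROWS_PER_SECTOR))


def split_type2_read(buf: bytes, sectors: int, gap: int) -> List[bytes]:
    """Turn one Type 2 cache read into a list of 2366-byte recording sectors."""
    tail = buf[sectors * TYPE2_SECTOR:]
    return [rebuild_recording_sector(
                buf[s * TYPE2_SECTOR:(s + 1) * TYPE2_SECTOR],
                pi_fields(tail, s, gap))
            for s in range(sectors)]
```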

ehw commented 1 month ago

Two more side notes: