jivanpal / drat

Utility for performing data recovery and analysis of APFS partitions/containers.
GNU General Public License v3.0

Help wanted for an is_cksum_valid failure case #32

Closed akira-okumura closed 2 years ago

akira-okumura commented 2 years ago

First of all, thank you very much for developing this helpful tool. It is the only open-source solution for APFS disk failures.

Due to a sudden power cut at my home, one of my external APFS HDDs stopped working. It is not a physical failure; it looks like something in the partition information got corrupted.

After posting a question on Ask Different (https://apple.stackexchange.com/questions/437386/apfs-partition-cannot-be-read), I found your software, which is able to recover files from my HDD. I have not tried recovering the whole ~5 TB of data, but at least one movie file was successfully recovered by the recover command.

The details of my disk failure are described at the link above. In addition, I tried the inspect command and got the following error.

$ sudo ./drat inspect --container /dev/disk2s2
(snip)
--------------------------------------------------------------------------------
Ephemeral OID:                      0xc9b3c
Logical block address on disk:      0x1aad
Object type:                        B-tree (non-root) node
Object subtype:                     Space manager free-space queue
Object size:                        4096 bytes
Associated volume OID (virtual):    0
--------------------------------------------------------------------------------
- There are 27 checkpoint-mappings in this checkpoint.

Reading the Ephemeral objects used by this checkpoint ... OK.
Validating the Ephemeral objects ... FAILED.
An Ephemeral object used by this checkpoint is malformed. Going back to look at the previous checkpoint instead.
END: Handling of this case has not yet been implemented.

It looks like this message is emitted at line 271 of src/commands/inspect.c, so I suspect the checksum recorded on my disk got corrupted by the aforementioned power failure. This checkpoint has 27 checkpoint-mappings, and the above message is shown after the 27th one.

I commented out return 0; at line 278 in the same file to see whether any other problems would be reported. Fortunately, I did not see any other obvious issues in the output.

$ sudo ./drat inspect --container /dev/disk2s2 | grep OK
Opening `/dev/disk2s2` in read-only mode ... OK.
Reading block 0 ... validating ... OK.
Loading the checkpoint descriptor area into memory ... OK.
Loading the corresponding checkpoint ... OK.
Reading the Ephemeral objects used by this checkpoint ... OK.
OK.
Loading the container object map ... OK.
Validating the container object map ... OK.
Reading the root node of the container object map B-tree ... OK.
Validating the root node of the container object map B-tree ... OK.
Reading the APFS volume superblocks ... OK.
Validating the APFS volume superblocks ... OK.
Reading the volume object map ... OK.
Validating the volume object map ... OK.
Reading the root node of the volume object map B-tree ... OK.
Validating the root node of the volume object map B-tree ... OK.
Reading ... validating ... OK.
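
Filtering with grep OK hides any step that did not succeed. A minimal sketch of a tally that keeps both outcomes, assuming only that each step's line ends in "OK." or "FAILED." as in the output above. The helper name summarize_inspect_output is hypothetical and is not part of Drat:

```python
def summarize_inspect_output(text: str) -> dict:
    """Split saved `drat inspect` output into OK and FAILED step lines.

    Assumption: a step line ends with "OK." or "FAILED.", as seen in the
    output quoted in this thread. Other lines are ignored.
    """
    summary = {"ok": [], "failed": []}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith("FAILED."):
            summary["failed"].append(stripped)
        elif stripped.endswith("OK."):
            summary["ok"].append(stripped)
    return summary


# Three lines copied from the output in this thread.
sample = """Reading the Ephemeral objects used by this checkpoint ... OK.
Validating the Ephemeral objects ... FAILED.
Loading the container object map ... OK.
"""

result = summarize_inspect_output(sample)
print(len(result["ok"]), "OK,", len(result["failed"]), "FAILED")
for line in result["failed"]:
    print("  ", line)
```

To use it on a real run, redirect the inspect output to a file and pass the file's contents to the function.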

If my disk failure is caused simply by the broken checksum, I would like to fix the checksum somehow. Is there an easy way to do this with Drat or any other tool? I would like to mount the disk again so I can easily save my files (although I have backups of >99% of them).
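
For context on what a checksum check of this kind involves, here is a minimal sketch. It assumes, based on Apple's published File System Reference as I understand it, that each APFS object stores a Fletcher-64 checksum in its first 8 bytes (little-endian), computed over the rest of the block as 32-bit little-endian words. The function names are hypothetical; this is not Drat's is_cksum_valid implementation, and it operates on an in-memory copy of a block only:

```python
import struct

MOD = 0xFFFFFFFF  # modulus used by this Fletcher-64 variant (assumption, see above)


def fletcher64(data: bytes) -> int:
    """Fletcher-64 over 32-bit little-endian words of `data`."""
    if len(data) % 4 != 0:
        raise ValueError("data length must be a multiple of 4 bytes")
    sum1 = sum2 = 0
    for (word,) in struct.iter_unpack("<I", data):
        sum1 = (sum1 + word) % MOD
        sum2 = (sum2 + sum1) % MOD
    c1 = MOD - ((sum1 + sum2) % MOD)
    c2 = MOD - ((sum1 + c1) % MOD)
    return (c2 << 32) | c1


def block_cksum_matches(block: bytes) -> bool:
    """Compare the stored checksum (first 8 bytes) with a recomputed one."""
    (stored,) = struct.unpack_from("<Q", block, 0)
    return stored == fletcher64(block[8:])


def with_recomputed_cksum(block: bytes) -> bytes:
    """Return a copy of `block` whose first 8 bytes hold a fresh checksum."""
    return struct.pack("<Q", fletcher64(block[8:])) + block[8:]
```

Recomputing a checksum only makes a block self-consistent. If the block's contents were damaged by the power cut, a fresh checksum would hide the damage, not repair it, so this is useful for diagnosis, not as a fix.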

jivanpal commented 2 years ago

Hello, the current inspect behaviour is undesirable in that it stops inspecting the drive after discovering an invalid checksum, rather than continuing inspection. This will be changed in the next version, similarly to how you have worked around this already. The state of the Ephemeral objects is not important for data recovery, but it is for mounting the filesystem. I don't expect you will be able to easily repair the filesystem in order to mount it (Drat does not have such functionality yet), so just recovering any discoverable data from it is the way to go.

We also do not currently inspect the B-trees, such as that of the omap (object map). However, the fsck_apfs output in your Ask Different post shows that a corrupt omap is the problem, and there are likely other corrupt data structures that fsck_apfs isn't reporting, because it also halts as soon as it discovers any error. I advise you to use drat list to see whether Drat can discover your files, and drat recover to see whether you can then recover them. Even if drat list does not reveal them, they may still be discoverable by exploring past transactions or scanning the entire drive (as Disk Drill does), but Drat does not yet have this functionality.

Best of luck!

akira-okumura commented 2 years ago

Thank you very much. How does drat handle hard links in the recover command? My photo libraries (Aperture and Photos.app) use lots of hard links internally. I guess it simply recovers hard-linked files one by one, without regard to their being hard links, and so duplicates the file data.

jivanpal commented 2 years ago

Hard links are just dentries that point to an existing inode, so they are handled identically to regular files, as for all intents and purposes they are regular files. As such, without greater context (namely having a mapping from inode numbers to all paths that point to that inode) there is no convenient way to tell whether a given path is a hardlink that was created after the original file was created, or whether it is the original path. In particular, the user could provide a set of paths that are suspected to point to the same inode, this suspicion can be verified or refuted, and the file data recovered accordingly, but this isn't usually a useful feature to have.
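
To illustrate the "mapping from inode numbers to all paths" mentioned above, here is a minimal sketch that builds such a mapping on an ordinary mounted filesystem using Python's standard os.walk and os.lstat. It is not how Drat works and it cannot read a damaged APFS container; Drat would have to derive the same mapping from the filesystem's own records. The function names are hypothetical:

```python
import os
from collections import defaultdict


def group_paths_by_inode(root: str) -> dict:
    """Map (device, inode number) to every file path under `root` that uses it."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symbolic links
            groups[(st.st_dev, st.st_ino)].append(path)
    return dict(groups)


def multilinked_groups(groups: dict) -> dict:
    """Keep only the inodes that are reachable via more than one path."""
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

With a mapping like this, file data needs to be copied once per inode, and the remaining paths can be recreated as hard links. Note that nothing in the mapping says which path is the "original", which matches the point made above.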

The best we could do currently, without that greater context (which indexing the filesystem will permit, and is on the roadmap), is to say "We can see that this path points to an inode that has multiple paths, but we don't know if this is the original path, nor if you have already recovered the file data using one of the other paths. Would you still like to recover the data?" If this would be useful to you, it would be simple for me to add.

akira-okumura commented 2 years ago

If you could easily implement the feature, it would be very helpful for Aperture and Photos users (like me) who migrated Aperture libraries to Photos. The original Aperture libraries are not deleted after migrating to Photos; instead, the library contents (RAW and JPEG files) are kept in both libraries as hard links to avoid doubling the library size.

But answering Yes/No questions 100,000 times (i.e., number of photos) is not very realistic...

jivanpal commented 2 years ago

> But answering Yes/No questions 100,000 times (i.e., number of photos) is not very realistic.

Precisely my point. Probably better to wait until we can index the whole filesystem and thus handle hardlinks in a smarter way.

Would an "always answer yes" flag be useful to you, rather than a prompt?

akira-okumura commented 2 years ago

Yes, "always answer yes" will work in my case. All the hard links I have should originate from the Aperture-to-Photos migration.

jivanpal commented 2 years ago

Thanks, I will try to add this tomorrow evening.

jivanpal commented 2 years ago

@akira-okumura You can now edit your script that runs drat recover, adding the option --skip-multilinked-inodes. This will recover file data for inodes that only have one filepath, but create empty files for any inodes that have multiple filepaths.
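
The behaviour described for --skip-multilinked-inodes can be pictured as a simple per-path decision. The following is a sketch of that policy only, not Drat's code. It assumes each entry carries the number of filepaths that refer to its inode; the function name and the example paths are hypothetical:

```python
def plan_recovery(entries):
    """Decide what happens to each path under the described policy.

    `entries` is an iterable of (path, link_count) pairs, where link_count
    is the number of filepaths referring to that path's inode (assumed known).
    Paths whose inode has exactly one filepath get their data recovered;
    all others get an empty placeholder file.
    """
    plan = {"recover": [], "placeholder": []}
    for path, link_count in entries:
        key = "recover" if link_count == 1 else "placeholder"
        plan[key].append(path)
    return plan


# Hypothetical example paths, for illustration only.
example = [
    ("/Movies/clip.mov", 1),
    ("/Aperture/Masters/IMG_0001.CR2", 2),
    ("/Photos/originals/IMG_0001.CR2", 2),
]
plan = plan_recovery(example)
print("recover:", plan["recover"])
print("placeholder:", plan["placeholder"])
```

Keeping the list of placeholder paths makes it possible to revisit those files later, for example by recovering the data through one path per inode.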