keithgh1 closed this issue 1 month ago
hey @merlin555, with the .ADF you provided in issue #1 , can you please provide a screenshot or a list of the directory names present within your catalog?
@keithgh1
Does it have to be from Quarterback, or can it also be from the finished virtual directory? Unfortunately, taking a scrolling screenshot doesn't work with the software I use with WinUAE (ShareX and FastStone Capture). If you can wait a bit, I'll send you the complete list as soon as I have copied all the files into the correct folder.
Here are a few from Quarterback:
No, this is great. I just need some text strings to search on and look for. Thanks @merlin555 .
@keithgh1
No problem. Just ask me if I can be of further help.
So I've been putting quite a bit of effort in lately to figure out how the catalog works. One major issue was that the directory names, unlike the filenames, were not plainly present in the backup file. It's hard to parse and process what you can't see.
With help from Reddit, we've determined that encryption is used on the catalog, even when it's not enabled for the files themselves. I've written working Python to successfully decrypt the catalog, so now directory names and filenames are visible!
For instance, @merlin555 , I can see Dragon's Harddisk plain as day. :)
The decryption key, which is a sad 8 bits, is stored in plaintext at offset 0xD at the beginning of the file. The source file Monitor.c in QB contains the necessary method and decryption table.
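A minimal sketch of what that decryption might look like. Only the key's location (a single plaintext byte at offset 0xD) comes from the thread; the substitution table and the XOR combining step below are placeholders standing in for the real ones in Monitor.c.

```python
# Hypothetical sketch: the 8-bit key is stored in plaintext at offset 0x0D,
# but DECRYPT_TABLE and the XOR step are assumptions -- the real table and
# combining method live in Quarterback's Monitor.c.
DECRYPT_TABLE = bytes(range(256))  # stand-in identity table, not the real one

def read_key(catalog: bytes) -> int:
    """Pull the single-byte key from its fixed header offset."""
    return catalog[0x0D]

def decrypt(catalog: bytes, start: int) -> bytes:
    """Decrypt the catalog body beginning at `start` (combining step assumed)."""
    key = read_key(catalog)
    return bytes(DECRYPT_TABLE[b ^ key] for b in catalog[start:])
```

With the real table and combining step swapped in, the same shape (read one header byte, run every body byte through a table) should apply.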
Now there's still a leap from being able to read the dirs to reconstructing them. I've got to wrap my head around how the hierarchy works: there's some type of parent/child relationship, pointers to other objects, etc.
Having access to the source code is fantastic, but there's still unpacking of structures, concepts, and how things connect to do.
Stay tuned!
@keithgh1
Thank you for your great effort.
hey @merlin555 any chance you could provide the last disk in the backup set, that contains the backup catalog?
A corrupt catalog likely causes subsequent entries to be placed in the wrong directories. The catalog contains DirFib entries for FILES, DIRECTORIES, and occasionally links, processed sequentially without "end directory" markers. For a DIR entry, the correct number of child DirFib entries must follow before the directory path shifts back to the parent. Corrupted entries disrupt this count, and if entry type flags are also corrupted, files may end up in the wrong directories.
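The sequential, count-driven layout described above can be sketched as a stack walk. This is a hypothetical reconstruction, not the actual script: the entry fields (`name`, `type`, `num_children`) are made-up names standing in for the real DirFib structure in the QB source.

```python
# Sketch: walk DirFib-style entries in sequence. A DIR entry pushes a new
# directory with its expected child count; each entry consumes one slot in
# the current directory; a directory pops when its count reaches zero.
# There are no explicit "end directory" markers, so a corrupted count
# shifts every subsequent entry into the wrong directory.
def place_entries(entries):
    stack = []   # list of (dir_name, remaining_children)
    placed = []  # full paths, in catalog order
    for e in entries:
        path = "/".join(name for name, _ in stack)
        placed.append(f"{path}/{e['name']}" if path else e["name"])
        if stack:  # this entry fills one child slot of the current dir
            stack[-1] = (stack[-1][0], stack[-1][1] - 1)
        if e["type"] == "DIR":
            stack.append((e["name"], e["num_children"]))
        while stack and stack[-1][1] == 0:  # pop finished directories
            stack.pop()
    return placed
```

Flipping a single `num_children` or `type` flag in the input illustrates the failure mode: everything after the bad entry lands under the wrong parent.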
There appears to be some catalog validation during restoration, but no manual method exists to force using the backup catalog. If the primary catalog is corrupted yet undetected, it can halt the restore process entirely.
I'm adding an option to my software to select the catalog manually, with a fallback to write all files in the same directory if necessary. As long as the individual files aren’t corrupted, this ensures the file data remains intact, even if compressed files are more vulnerable.
It seems like an oversight not to store paths in individual file headers alongside the catalog. This would safeguard proper file placement even if the catalog is corrupted, though it would add some overhead, especially with deep paths on small files.
@keithgh1
Here is the last disk from the backup. I still have a backup with the catalog intact if you need it.
Ok, good news. I've got basic catalog support implemented, although it's still a prototype and not ready for public consumption.
What you're seeing is a bunch of fonts that should be stored in the fonts directory, but because of corruption in the directory stack, I reset to the root. This is the safest way of keeping things relatively organized...
I have to postpone backup catalog support because I can't seem to properly decrypt it, although I'd think the method would be the same as for the primary.
So I'm adding options for ignoring the catalog (which is the current behavior) or using the primary catalog. I need to clean up the code and polish it a bit more, but I expect to commit the new version soon.
Good to hear. Thanks.
Ok, commit 19ea0d9 adds primary catalog processing support.
@merlin555 if you have a chance, can you please try this out?
I will close this ticket soon, as this commit should address the issue.
Catalog support has been added. If there are bugs with that support, then we should open separate issues specific to those bugs.
@keithgh1
Great! It works. With "--catalog ignore" as you mentioned, I just get all the files from folder C without a folder structure.
Thanks for the feedback @merlin555 . Glad it works. I'll need to do some work increasing the robustness in general against errors, but I have separate tickets for that.
Fwiw, the reason why only the C files are written to the main directory is that only those are stored in the first .adf.
It occurred to me after I posted my message that runs against non-catalog disks without --catalog ignore will still work; the script just won't find catalog data or match the files up, and will dump them into the generic folder.
@keithgh1
Okay.
Without catalog support, there's no way for the script to restore files into their original directories. For large backups, this could be an issue, requiring a bunch of manual steps to recreate the structure.
If the backup contains duplicate filenames across different directories, the script's current behavior will overwrite existing files.
Some attention needs to be paid to the fact that many people will use this script precisely because their catalog is defective. If it is, then good error handling, recovery, etc. need to be in place to deal with that. Because I don't yet know what the structure looks like, there could be limitations in dealing with defective catalogs.
If the catalog is still intact, then users can always use original quarterback software to attempt a restore.