nicoboss / nsz

NSZ - Homebrew compatible NSP/XCI compressor/decompressor

[Bug] Using the --remove-padding on a No-intro dump result in a mismatch #150

Open Immersion95 opened 1 year ago

Immersion95 commented 1 year ago

I have this NSP dump "Azure Striker GUNVOLT STRIKER PACK" that matches the No-intro one.

I wanted to verify if a compression and decompression would result in the same CRC and it doesn't.

Using the --remove-padding option leads to a completely different CRC; the only way to get the correct one is to leave --remove-padding off.

I would like to find a way to convert my valid NSP to the No-Intro standard, and I thought this program would help. Is there something I'm not doing right?

nicoboss commented 1 year ago

If you already have a dump matching the No-Intro standard, don't use the --remove-padding option. Bit-identical recreation is the default behavior for NSP/NSZ.

The --remove-padding option was originally intended to make files compressed with older NSZ versions No-Intro compliant. Since the last release, however, it also removes the FileEntryTable padding, which goes against the original intent of the option because it breaks No-Intro compliance for some games.

I will make sure to change the description of the --remove-padding option or add a separate option to remove the FileEntryTable padding.

Immersion95 commented 1 year ago

Thanks for your answer.

What I'm trying to do is find a way to convert NSPs with the same internal contents into the No-Intro NSPs. That may be outside the scope of this program, though. Maybe an NSP standard should exist, as I see a lot of NSPs that have the same contents but different CRCs.

nicoboss commented 1 year ago

You could dump a few of them as No-Intro and use a tool like the free version of HexCmp2 (https://www.fairdell.com/) to see what is different. If there is just some empty space between the header and the first section, downgrade to NSZ v4.4.0 and try the --remove-padding option there. If that doesn't work either, let me know and maybe I can help you with removing/adding those empty spaces.

You are probably out of luck if your original dumps were not made using nxdumptool with the No-Intro option enabled. No-Intro dumps contain a lot more than what normally gets dumped, so you will likely have to dump all your games again with the No-Intro option enabled. Once you do so, you can safely use NSZ to compress/decompress them while keeping them bit-identical. If you find one where the hash prior to compression and after decompression doesn't match, open another issue.
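For anyone who wants to run that before/after check without a separate hashing tool, here is a minimal Python sketch. It only uses the standard library; the function names `file_checksums` and `same_file` are made up for this example and are not part of NSZ.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1 << 20):
    """Return (crc32_hex, sha256_hex) for a file, reading it in chunks."""
    crc = 0
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # running CRC32 over all chunks
            sha.update(chunk)
    return f"{crc:08x}", sha.hexdigest()


def same_file(path_a, path_b):
    """True if both files have the same CRC32 and the same SHA-256."""
    return file_checksums(path_a) == file_checksums(path_b)
```

Calling `same_file("original.nsp", "decompressed.nsp")` (placeholder file names) on the dump before compression and the file produced after decompression should return `True` if the round trip was bit-identical.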

Immersion95 commented 1 year ago

I just did what you suggested. NSZ 4.4.0 actually produces the same file (same CRC) as 4.5.0 with the --remove-padding option. I compared the original No-Intro file and the processed NSP with HexCmp2, and they look entirely different and are not even the same size: the processed one is 32 KB smaller.

Immersion95 commented 1 year ago

Looking closer at the differences, they turn out not to be that big.

First: see the screenshot (Screenshot 2023-11-08 213116).

Second: the empty space at 00000210-00007FF0 in the No-Intro dump is deleted.

The rest seems to be the same.

Edit: I made those 2 edits in the file processed by NSZ 4.4.0 using HxD, and it produced the same CRC as the original No-Intro file.
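The kind of gap described here can also be located with a few lines of Python instead of a hex editor. This is only a rough sketch on synthetic data: `first_difference` and `zero_run_length` are hypothetical helper names, and real dumps also differ inside the header, so the first mismatch will not necessarily be the start of the padding.

```python
def first_difference(a: bytes, b: bytes):
    """Offset of the first differing byte.

    Returns None if the inputs are identical, or the length of the shorter
    input if one is a strict prefix of the other.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


def zero_run_length(data: bytes, start: int) -> int:
    """Number of consecutive 0x00 bytes beginning at offset start."""
    end = start
    while end < len(data) and data[end] == 0:
        end += 1
    return end - start


# Synthetic example: the same header and payload, with and without a zero gap.
header = b"PFS0hdr!"
payload = b"\x01\x02\x03"
padded = header + b"\x00" * 0x20 + payload
stripped = header + payload

offset = first_difference(padded, stripped)
gap = zero_run_length(padded, offset)
print(hex(offset), hex(gap))  # 0x8 0x20
```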

Immersion95 commented 1 year ago

Just wanted to add that this is likely pointless, as there is no proof that No-Intro uses a standardized NSP creation process.

nicoboss commented 1 year ago

Thanks a lot for posting your findings. That's really interesting: the No-Intro version does have padding in this case. I always thought they removed padding. In the end, I just made NSZ keep the padding as it is by default so that bit-identical recreation is possible in any case.

All but the first difference in the screenshot are caused by sector offsets being different due to different padding between the PFS0 header and the first section. The very first difference in the screenshot means that the StringTable padding is different as well.

Maybe I should replace the --remove-padding option with a --padding VALUE option where one can specify the number of bytes of padding there should be between the PFS0 header and the first section. I should probably also add an option to specify the StringTable padding.

I'm not sure if the No-Intro standard always uses the same padding. If it does, and we figure out the exact padding values, we could add a command line option that uses exactly those. Given that nxdumptool probably does the same thing every time, it would make sense for all dumps to have the same padding values.

If you are interested, I will implement a --padding VALUE command line argument, where the value is in bytes, and a --string-table-padding VALUE command line argument, where the value specifies the modulus to pad to. Based on your screenshot, I assume it should be 0x20 for No-Intro. “2” is 0x32 and “@” is 0x40, which is the next multiple of both 0x10 and 0x20 when starting from 0x32. I assume it's 0x20 because that's the one mentioned on switchbrew.
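The reason the screenshot alone cannot tell 0x10 and 0x20 apart is plain rounding arithmetic. A small sketch, where `align_up` is just an illustrative helper and not an NSZ function:

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


# Starting from 0x32, both candidate alignments land on the same offset.
print(hex(align_up(0x32, 0x10)))  # 0x40
print(hex(align_up(0x32, 0x20)))  # 0x40
```

An offset such as 0x52 would separate the two cases (0x60 for 0x10 alignment, 0x60 for 0x20 alignment is also equal, whereas 0x42 gives 0x50 versus 0x60), so only some header sizes reveal which modulus is in use.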

In case you wonder, here are the PFS0 file format specifications: https://wiki.oatmealdome.me/index.php?title=PFS0_(File_Format) https://switchbrew.org/w/index.php?title=NCA

Immersion95 commented 1 year ago

Thanks for your thorough response!

Here is also a tool from the nxdumptool author which might help https://github.com/DarkMatterCore/dom_xml_dataset_generators/commit/ef8a4ef304fa3de9f4a0cc967d20121559fc035c

nicoboss commented 1 year ago

Thanks! This line seems to confirm that the string table padding is 0x20-aligned. What's interesting is that when the size is already aligned, they pad a full 0x20 bytes to reach the next 0x20 alignment: https://github.com/DarkMatterCore/dom_xml_dataset_generators/blob/ef8a4ef304fa3de9f4a0cc967d20121559fc035c/cdn2nsp.py#L509C35-L509C35

PFS_FULL_HEADER_ALIGNMENT: int = 0x20
padded_header_size = (utilsAlignUp(header_size + 1, PFS_FULL_HEADER_ALIGNMENT) if utilsIsAligned(header_size, PFS_FULL_HEADER_ALIGNMENT) else utilsAlignUp(header_size, PFS_FULL_HEADER_ALIGNMENT))

To double-check, I also looked at the nxdumptool source code, which contains the same logic: https://github.com/DarkMatterCore/nxdumptool/blob/1445faf17f1baaf74486dccd31789888382eb523/source/core/pfs.c#L349C5-L349C41

#define PFS_HEADER_PADDING_ALIGNMENT    0x20
padded_header_size = (IS_ALIGNED(header_size, PFS_HEADER_PADDING_ALIGNMENT) ? ALIGN_UP(header_size + 1, PFS_HEADER_PADDING_ALIGNMENT) : ALIGN_UP(header_size, PFS_HEADER_PADDING_ALIGNMENT));

Knowing that, I can just implement a command line argument that applies this exact string table padding.
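Written out as a standalone function, the rule from the two snippets above could look like the sketch below. The helper names from cdn2nsp (`utilsAlignUp`, `utilsIsAligned`) are replaced by plain integer arithmetic here, so this is a re-expression of the same logic rather than code taken from either project.

```python
PFS_FULL_HEADER_ALIGNMENT = 0x20


def padded_header_size(header_size: int,
                       alignment: int = PFS_FULL_HEADER_ALIGNMENT) -> int:
    """Padded size of the full PFS0 header.

    If header_size is already a multiple of alignment, a full extra
    alignment block is added; otherwise the size is rounded up to the
    next multiple.
    """
    if header_size % alignment == 0:
        return header_size + alignment
    return (header_size + alignment - 1) // alignment * alignment


print(hex(padded_header_size(0x161)))  # 0x180 (rounded up)
print(hex(padded_header_size(0x160)))  # 0x180 (already aligned, full 0x20 added)
```

Both branches are equivalent to `(header_size // alignment + 1) * alignment`, which makes it clear that the header always receives at least one byte of padding.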

I haven't found any information about the amount of padding they put between the PFS0 header and the first section. Can you check the offset of the first section in your No-Intro dumps and see if it's always the same? If it is, I will implement a --fix-padding option that uses their string table padding rules and makes the first section start at whatever value we determine.
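Checking where the first section starts does not require a hex editor. The sketch below assumes the PFS0 layout described on the pages linked earlier: a 0x10-byte header (magic, file count, string table size, reserved) followed by one 0x18-byte entry per file (u64 data offset, u64 size, u32 name offset, u32 reserved), all little-endian. The function names are hypothetical.

```python
import struct


def pfs0_data_start(blob: bytes) -> int:
    """Offset where file data begins: header + entry table + string table."""
    magic, file_count, string_table_size, _reserved = struct.unpack_from(
        "<4sIII", blob, 0)
    if magic != b"PFS0":
        raise ValueError("not a PFS0 container")
    return 0x10 + 0x18 * file_count + string_table_size


def pfs0_first_file_offset(blob: bytes) -> int:
    """Absolute offset of the file stored closest to the header."""
    data_start = pfs0_data_start(blob)
    file_count = struct.unpack_from("<I", blob, 4)[0]
    if file_count == 0:
        return data_start
    # Entry data offsets are relative to the end of the full header.
    relative_offsets = [
        struct.unpack_from("<QQII", blob, 0x10 + 0x18 * i)[0]
        for i in range(file_count)
    ]
    return data_start + min(relative_offsets)
```

Passing the first few kilobytes of an NSP (enough to cover the header) to `pfs0_first_file_offset` gives the value to compare across dumps; a non-zero smallest relative offset would indicate a gap after the header.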

Immersion95 commented 1 year ago

I used the author's program to generate a valid NSP from CDN content, and I couldn't obtain the original No-Intro dump CRC. The author told me that "The No-Intro NSP dat hasn't been updated yet to accurately reflect checksums from NSPs generated by cdn2nsp."

I also compared different dumps from No-Intro and they differ a lot: some have offsets, others don't, so that explains it.

nicoboss commented 1 year ago

Oh no, so the offset of the first section differs a lot. This is really bad, as the amount of padding between the PFS0 header and the first section is totally arbitrary and, to my knowledge, can't be obtained from anywhere. If one decides to standardize the structure of NSP files, this padding needs to be defined, or better, no empty space should be put between the PFS0 header and the first section at all, as there is absolutely no reason to do so.

The 0x20 modulus PFS0 header padding (also known as string table padding) can stay as I think we all agreed on that one.

DarkMatterCore commented 1 year ago

Full header alignment to a 0x20-byte boundary is used by Nintendo, which is why it exists in both nxdumptool and cdn2nsp. No additional padding is used, not even between written file entry data.

Keep in mind this applies to the full PFS header including the populated name table, not just to the name table itself (which could already have a size aligned to 0x20).

Furthermore, all name table strings must be NULL-terminated, which means each new string will essentially start at prev_str_offset + prev_str_len + 1. I'm mentioning this because I have seen NSZ -> NSP conversions that don't follow this rule, thus failing to be processed by tools such as hactoolnet -- dunno if this has been fixed.
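The NULL-termination rule can be sketched in a few lines of Python. `build_name_table` is a hypothetical helper for illustration only, and the file names in the example are placeholders.

```python
def build_name_table(names):
    """Build an unpadded name table and the offset of each name in it.

    Every name is followed by a single NUL byte, so each new string starts
    at prev_str_offset + prev_str_len + 1.
    """
    table = bytearray()
    offsets = []
    for name in names:
        offsets.append(len(table))  # where this name starts
        table += name.encode("utf-8") + b"\x00"
    return bytes(table), offsets


table, offsets = build_name_table(["a.nca", "bb.tik"])
print(offsets)  # [0, 6]
print(table)    # b'a.nca\x00bb.tik\x00'
```

The 0x20 alignment discussed in this thread is applied afterwards to the full header, including this table, not to the table on its own.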

nicoboss commented 1 year ago

Thanks a lot for confirming the 0x20 padding. All strings in my string table are NULL-terminated since I fixed #151.

@DarkMatterCore Do you know what logic applies regarding the padding between the end of the PFS0 header and the start of the first file? I have seen everything from the first file directly following the PFS0 header to a lot of empty space in between. How does nxdumptool decide at what offset to put the first file when the No-Intro option is enabled?

DarkMatterCore commented 1 year ago

The absolute offset for the first file entry always matches the full PFS header size with 0x20 padding (e.g. if the PFS header size is 0x161, then its full padded size will be 0x180, which means the file data will begin at that offset). The relative offset in the file entry table from the PFS header is always zero.

nicoboss commented 1 year ago

Thanks a lot for your answer. Everything is clear to me now.

Just to make sure I'm getting everything right:

padded_header_size = (utilsAlignUp(header_size + 1, PFS_FULL_HEADER_ALIGNMENT) if utilsIsAligned(header_size, PFS_FULL_HEADER_ALIGNMENT) else utilsAlignUp(header_size, PFS_FULL_HEADER_ALIGNMENT))

If the NULL-terminated string table, and so the end of the PFS0 header, is already 0x20-aligned, you add a full 0x20 bytes of padding; otherwise, you just pad the remaining bytes to make it 0x20-aligned.

DarkMatterCore commented 1 year ago

@nicoboss That's right, yes.

nicoboss commented 11 months ago

@Immersion95 A fix for this is now deployed in the latest NSZ 4.6.0 release.

For you and others who really like the No-Intro standard, the latest release is likely huge, as --verify now uses file-level NSP sha256 hash validation. --verify therefore guarantees that the sha256 hashes of the original file and the decompressed file match, meaning the file is recreated bit-identically. Because it now validates the entire file, any issues, including wrong offsets, will be detected automatically.

Immersion95 commented 11 months ago

I tested SMBW with an NSP dump. I used your tool and the one from DarkMatterCore, and the resulting CRCs are still different.