unknownbrackets / maxcso

Fast cso compressor
ISC License

Documentation of the dax and jso formats? #61

Closed JoseAaronLopezGarcia closed 2 years ago

JoseAaronLopezGarcia commented 2 years ago

Is there any available documentation for the .dax and .jso file formats? I'm looking to implement them into ARK CFW but can't find any details other than the compression method.

unknownbrackets commented 2 years ago

For DAX, I mostly read their code to figure out the format. I don't remember finding much in the way of documentation.

For JSO, it's not even supported mainly because it was undocumented AND there wasn't really source to look at. I'm sure I could figure it out, but didn't seem worth it.


DAX basically has this header: https://github.com/unknownbrackets/maxcso/blob/a4b6f868c21741b5925c318033f8df71a7f1598b/src/dax.h#L13-L19

Immediately following that, there's a table of 32-bit values in the same fashion as CSO. Blocks are always a fixed size, 0x2000. The entire 32-bit value is the position of the block, with no special handling of the top bit like in CSO.

Next up, it has another table of 16-bit values giving the compressed sizes of each frame. CSO skips this, with the size implied by the next index value.
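The difference between the two index styles can be sketched in C. This is only an illustration under assumptions: the names `FrameRef`, `cso_frame_ref` and `dax_frame_ref` are made up here, and the `align_shift` parameter reflects my understanding of the CSO header rather than anything stated above (pass 0 for plain byte offsets).

```c
#include <assert.h>
#include <stdint.h>

/* Where one frame lives in the file and how many bytes it occupies there. */
typedef struct {
    uint64_t pos;
    uint32_t size;
    int uncompressed_flag; /* only meaningful for CSO; DAX uses NC areas instead */
} FrameRef;

/* CSO style: the top bit marks an uncompressed frame, and the size is
 * implied by the next index entry. The index needs one extra entry. */
FrameRef cso_frame_ref(const uint32_t *index, uint32_t i, unsigned align_shift)
{
    assert(align_shift < 32);
    FrameRef r;
    uint64_t next = (uint64_t)(index[i + 1] & 0x7FFFFFFFu) << align_shift;
    r.pos = (uint64_t)(index[i] & 0x7FFFFFFFu) << align_shift;
    r.size = (uint32_t)(next - r.pos);
    r.uncompressed_flag = (index[i] & 0x80000000u) != 0;
    return r;
}

/* DAX style: the whole 32-bit value is the position, and the size comes
 * from the separate 16-bit table. */
FrameRef dax_frame_ref(const uint32_t *offsets, const uint16_t *sizes, uint32_t i)
{
    assert(offsets != 0 && sizes != 0);
    FrameRef r;
    r.pos = offsets[i];
    r.size = sizes[i];
    r.uncompressed_flag = 0; /* not encoded in the DAX index at all */
    return r;
}
```

The extra 16-bit table is what makes DAX larger here: CSO gets the same size information for free from adjacent index entries.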

After that, for version >= 1, it has "NC areas". These are stored as pairs of 32-bit ints, with the number of pairs given by the count in the header. The first of each pair is the starting frame, and the second is the count of frames that are "Not Compressed." Essentially, this works as the upper bit did in CSO: frames in the specified ranges are stored uncompressed.
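As a rough sketch of the layout described so far, the table positions and the NC area lookup could look like this in C. Assumptions to be aware of: the header size is passed in rather than hard-coded (the real struct is in the linked `dax.h`), each table is assumed to hold exactly one entry per frame, and all names (`DaxLayout`, `DaxNcArea`, `dax_layout`, `dax_frame_is_uncompressed`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DAX_FRAME_SIZE 0x2000u

/* One "NC area": a run of frames stored without compression. */
typedef struct {
    uint32_t first_frame;
    uint32_t frame_count;
} DaxNcArea;

typedef struct {
    uint32_t frames;       /* number of 0x2000-byte frames */
    uint64_t offsets_pos;  /* 32-bit position per frame */
    uint64_t sizes_pos;    /* 16-bit compressed size per frame */
    uint64_t nc_areas_pos; /* pairs of 32-bit ints, only if has_nc_areas */
    int has_nc_areas;
} DaxLayout;

/* Computes where each table starts, assuming one entry per frame. */
DaxLayout dax_layout(uint32_t header_size, uint32_t uncompressed_size, uint32_t version)
{
    assert(header_size > 0);
    DaxLayout l;
    l.frames = (uint32_t)(((uint64_t)uncompressed_size + DAX_FRAME_SIZE - 1) / DAX_FRAME_SIZE);
    l.offsets_pos = header_size;
    l.sizes_pos = l.offsets_pos + (uint64_t)l.frames * 4;
    l.nc_areas_pos = l.sizes_pos + (uint64_t)l.frames * 2;
    l.has_nc_areas = version >= 1;
    return l;
}

/* Returns 1 if the frame falls inside any NC area, 0 otherwise. */
int dax_frame_is_uncompressed(const DaxNcArea *areas, uint32_t count, uint32_t frame)
{
    for (uint32_t i = 0; i < count; i++) {
        if (frame >= areas[i].first_frame &&
            frame - areas[i].first_frame < areas[i].frame_count)
            return 1;
    }
    return 0;
}
```

A linear scan is used for the lookup on the guess that NC area counts are small; a reader with many areas might prefer a sorted table and binary search.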

Each block in DAX is compressed with the zlib header, which means two bytes of every block are wasted. CSO elides this header.

The general theme here is a lot of predictable values are wasted, increasing the size of the file. Its only benefit is the block size being larger. A CSO with the same block size simply has superior compression.

-[Unknown]

JoseAaronLopezGarcia commented 2 years ago

Where is that source code you mentioned? I'm trying to implement dax in a way that I can reuse the same code for cso/zso, would you think this is difficult and it's best to make a new reader just for dax? Or can the csoreader found in inferno iso driver be modified to be compatible with dax? I have never done anything with this format and I can never find any sample code online except this repo (which has helped a lot to understand the format but I still have some doubts if the route I'm taking is correct or not). Thanks!

JoseAaronLopezGarcia commented 2 years ago

From what I could find online about jso, it's a mix of dax and cso, so perhaps it uses the cso format with the dax block size and compression? This is pure speculation but if so it may be easy to implement (easier than dax at least). Will have to check it out. (Sorry for double post).

unknownbrackets commented 2 years ago

I believe I'd found a copy of "DAX Creator 0.3 (by Dark_AleX)" on some forum which included source code.

The formats and headers are different, so I'm pretty confident that a CSO reader won't just magically read a DAX or JSO file, unless someone implemented this support specifically. I'm sure ProCFW, at least, only supported CSO (in part because the authors of DAX and JSO stopped really supporting them with tools.)

If JSO is a mix, it's important to know what mix. At the very least, you would need a JSO compressed file as well as the original ISO it represents. Then you could potentially reverse engineer the file format. I'm mostly sure it just uses zlib for compression, which might be the sole reason behind a claim that JSO is a mix of DAX and CSO.

-[Unknown]

JoseAaronLopezGarcia commented 2 years ago

I'm already pretty far at implementing DAX, having a few memory overflow issues here and there but pretty far. As for JSO I found the details:

    #define JISO_MAGIC 'JISO' // JISO

    typedef struct _JisoHeader {
        uint32_t magic;             // [0x000] 'JISO'
        uint8_t unk_x001;           // [0x004] 0x03?
        uint8_t unk_x002;           // [0x005] 0x01?
        uint16_t block_size;        // [0x006] Block size, usually 2048.
        // TODO: Are block_headers and method 8-bit or 16-bit?
        uint8_t block_headers;      // [0x008] Block headers. (1 if present; 0 if not.)
        uint8_t unk_x009;           // [0x009]
        uint8_t method;             // [0x00A] Method. (See JisoAlgorithm_e.)
        uint8_t unk_x00b;           // [0x00B]
        uint32_t uncompressed_size; // [0x00C] Uncompressed data size.
        uint8_t md5sum[16];         // [0x010] MD5 hash of the original image.
        uint32_t header_size;       // [0x020] Header size? (0x30)
        uint8_t unknown[12];        // [0x024]
    } JisoHeader;
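For illustration, here is one way the fields a reader needs could be pulled out of that header by offset instead of by casting the struct. It is a sketch under assumptions: multi-byte fields are taken to be little-endian, the magic is assumed to appear in the file as the ASCII bytes "JISO", and `JisoInfo` and `jiso_parse_header` are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define JISO_HEADER_SIZE 0x30

/* Only the fields a reader needs; offsets follow the struct above. */
typedef struct {
    uint16_t block_size;
    uint8_t block_headers;
    uint8_t method;
    uint32_t uncompressed_size;
} JisoInfo;

/* Little-endian loads (an assumption about the on-disk byte order). */
static uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if buf looks like a JISO header, 0 otherwise. */
int jiso_parse_header(const uint8_t *buf, size_t len, JisoInfo *out)
{
    assert(buf != NULL && out != NULL);
    if (len < JISO_HEADER_SIZE || memcmp(buf, "JISO", 4) != 0)
        return 0;
    out->block_size = rd16le(buf + 0x06);
    out->block_headers = buf[0x08];
    out->method = buf[0x0A];
    out->uncompressed_size = rd32le(buf + 0x0C);
    return out->block_size != 0; /* a zero block size cannot be valid */
}
```

Reading by offset avoids depending on struct packing and host endianness, which matters if the same code is shared between PC tools and PSP code.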

It works similarly to CSO and DAX.

Other than that I am implementing DAX in a way that the code that handles it is different and abstracted from CSO/ZSO code. I will most likely not implement JSO since the format was never actually used by anyone, and the newer ZSO format is simply superior in every way.

unknownbrackets commented 2 years ago

Well, ZSO is only lz4, so it's generally weaker compression than CSO. There are some blocks lz4 will win on, but it will lose most blocks.

ZSO was initially created because a few games (GTA was one in particular) had performance problems on PSP CFW, only when using CSO. At the time, it was believed that the DEFLATE decompression overhead was the cause.

When lz4 was implemented, it ended up not fixing the issue at all, and the author of ZSO support in PSP CFW discovered that it was an IOPS issue - using an ISO to read 64 KB from a file would cause one 64 KB read, but using a CSO (or ZSO) it would cause something around thirty-two 4 byte reads and another thirty-two 2 KB reads. The overhead of the reads was the cause. This issue was fixed and resolved the problem for both ZSO and CSO.
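The arithmetic behind that read count can be written down as a tiny C helper. This is just a model of the naive behaviour described above (one index read plus one data read per block), assuming a block-aligned request; `ReadCount` and `naive_compressed_reads` are hypothetical names, not code from any CFW.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t index_reads; /* small reads of index entries */
    uint32_t data_reads;  /* reads of the block payloads */
} ReadCount;

/* Models a reader that fetches one index entry and one block per sector,
 * with no caching. A plain ISO would need a single read for the same request. */
ReadCount naive_compressed_reads(uint32_t request_bytes, uint32_t block_size)
{
    assert(block_size > 0);
    uint32_t blocks = (uint32_t)(((uint64_t)request_bytes + block_size - 1) / block_size);
    ReadCount c;
    c.index_reads = blocks;
    c.data_reads = blocks;
    return c;
}
```

For a 64 KB request with 2 KB blocks this gives 32 index reads and 32 data reads, so 64 I/O operations where an ISO needs one.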

As such, I view ZSO as largely a failed experiment. I added it mainly so people could test to see if it had any real benefits. It helps slightly on a 222 MHz CPU, but the difference is pretty insignificant on any CPU with 1000 MHz or more.

-[Unknown]

JoseAaronLopezGarcia commented 2 years ago

Yeah there was a fix for this by using an index cache (cache block indexes to reduce accessing cso file). A similar technique I've applied for DAX when decompressing ISO sectors. Since most reads are sequential, and a DAX block equals 4 ISO sectors, then it will most likely happen that you already have decompressed the DAX block for that specific ISO sector (since it was done on previous ISO sector read).
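The "last decompressed block" idea for DAX can be sketched like this in C. It is a simplified model, not the Inferno code: the actual read and inflate step is replaced by a counter, and `DaxFrameCache` and `dax_locate_sector` are names made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISO_SECTOR_SIZE   2048u
#define DAX_FRAME_SIZE    0x2000u
#define SECTORS_PER_FRAME (DAX_FRAME_SIZE / ISO_SECTOR_SIZE) /* 4 */

/* Remembers which DAX frame currently sits in the decompression buffer. */
typedef struct {
    int valid;
    uint32_t frame;
    uint32_t decompressions; /* stands in for the real inflate work */
} DaxFrameCache;

/* Makes sure the frame holding the sector is "decompressed" and returns the
 * byte offset of that sector inside the frame buffer. */
uint32_t dax_locate_sector(DaxFrameCache *cache, uint32_t sector)
{
    assert(cache != NULL);
    uint32_t frame = sector / SECTORS_PER_FRAME;
    if (!cache->valid || cache->frame != frame) {
        /* A real reader would read and inflate the frame here. */
        cache->frame = frame;
        cache->valid = 1;
        cache->decompressions++;
    }
    return (sector % SECTORS_PER_FRAME) * ISO_SECTOR_SIZE;
}
```

Reading sectors 0 through 7 in order touches only two frames, so the model counts two decompressions instead of eight.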

I have a question though; when you say that the decompression of cso and dax differs in that cso uses raw deflate but dax uses deflate via an interface, how exactly does this translate to PSP code?

The most obvious solution that I've implemented is to just use zlib with uncompress() or z_stream. But either of them gives me a lot of memory issues (to be expected if the buffers and decompression library have to reside in kernel RAM for Inferno). For CSO it was as simple as using Sony's already built-in deflate decompress function (sceKernelDeflateDecompress), but this function doesn't seem to work with DAX blocks, or is there a way to make it work? Is there any source code available for a DAX reader plugin I can take a look at? I'm pretty sure I got all the algorithms working, but using zlib causes a memory problem.

unknownbrackets commented 2 years ago

This is a decent explanation: https://stackoverflow.com/questions/10166122/zlib-differences-between-the-deflate-and-compress-functions

DAX includes the 2-byte zlib header and the four-byte checksum at the end (in the zlib format this trailer is an Adler-32 rather than a CRC). So you can simply skip the first two bytes (which should always be the same two values for every single block) and treat it as the same data as CSO. That's why I said it's wasteful.
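A minimal sketch of that byte skipping, without depending on zlib itself, might look like the following. The header test uses the zlib format's own rules (deflate method in the low nibble, header checksum divisible by 31, no preset dictionary); it is only a heuristic, since raw deflate data can occasionally start with two bytes that pass it. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Heuristic check for a 2-byte zlib header (CMF, FLG). */
int looks_like_zlib_header(uint8_t cmf, uint8_t flg)
{
    return (cmf & 0x0F) == 8          /* compression method: deflate */
        && (cmf >> 4) <= 7            /* window size up to 32 KB */
        && (flg & 0x20) == 0          /* no preset dictionary */
        && ((((unsigned)cmf) << 8) | flg) % 31 == 0;
}

/* Returns the offset of the raw deflate payload inside the block and stores
 * its length in *payload_len. If the block carries a zlib wrapper, the
 * 2-byte header and 4-byte trailer are excluded; otherwise the whole block
 * is treated as raw deflate data. */
size_t dax_raw_deflate_span(const uint8_t *block, size_t len, size_t *payload_len)
{
    assert(block != NULL && payload_len != NULL);
    if (len >= 6 && looks_like_zlib_header(block[0], block[1])) {
        *payload_len = len - 6;
        return 2;
    }
    *payload_len = len;
    return 0;
}
```

The resulting span can then be handed to whatever raw deflate routine is used for CSO blocks. Falling back to the whole block when no header is found is a design choice for tolerance; a strict reader could reject such blocks instead.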

maxcso reads DAX, but uses inflate() to decompress. You can see it uses the negative value (meaning skip zlib header/trailer) only for CSO, rather than doing any byte skipping: https://github.com/unknownbrackets/maxcso/blob/a4b6f868c21741b5925c318033f8df71a7f1598b/src/input.cpp#L409

-[Unknown]

JoseAaronLopezGarcia commented 2 years ago

Thank you very much. Yes, I figured out you could skip the first two bytes of the DAX block and treat it the same as a CSO block (you don't even need to skip the last 4 checksum bytes, they will be ignored anyway). I have fully working source code and a fully working implementation for Inferno that handles both DAX and JISO, including the block index cache that codestation made (which btw made a huge difference, as you pointed out). I also have PC-side code that might be useful for you to implement JISO in maxcso (which btw is not a bad format in my experience).

I still do have one big issue with DAX though and it seems the fact that it has an 8KB block size causes memory issues (random memory corruption). Do you know if perhaps I'm doing it wrong and I shouldn't be processing full DAX blocks at a time?

JoseAaronLopezGarcia commented 2 years ago

I'm attaching source code for PC that reads DAX and JSO files: main.txt

To compile: gcc -o main main.c -lz -llz4 -llzo2

It takes one argument: path to iso/jso/cso/dax/zso, and it outputs a few files, mainly the icon0.png and the first 8MB starting from ISO magic.

JSO is simple:

- Header, like every other format.
- Block index array, with an extra index to calculate the last block's size, like in CSO.

To detect uncompressed blocks it can use any of the methods that DAX or CSO use (NC areas, top bit). However, JISO prefers a much simpler test: a block is not compressed if its stored size is the same as the uncompressed block size. The header has some interesting info, but for Inferno only the first 12 bytes are actually important, all of them up to uncompressed size. The method field determines the compression algorithm (LZO by default, or zlib), while the block headers field determines whether blocks have an extra 4-byte header (they don't by default, and it can be skipped).
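The block lookup described here could be sketched in C as follows. It assumes the index holds plain 32-bit byte offsets with one extra trailing entry, and that `block_headers` is 0 (the optional 4-byte block header is not handled). `JisoBlockRef` and `jiso_block_ref` are names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t offset;    /* position of the block in the file */
    uint32_t size;      /* bytes stored for this block */
    int is_compressed;  /* 0 when the stored size equals the block size */
} JisoBlockRef;

/* Looks up one block. The index must have (block count + 1) entries so that
 * the last block's size can be computed as well. */
JisoBlockRef jiso_block_ref(const uint32_t *index, uint32_t block, uint32_t block_size)
{
    assert(index != 0 && block_size > 0);
    JisoBlockRef r;
    r.offset = index[block];
    r.size = index[block + 1] - index[block];
    /* JISO's simple rule: same size as an uncompressed block means raw data. */
    r.is_compressed = r.size != block_size;
    return r;
}
```

A short final block (when the image size is not a multiple of the block size) is not treated specially in this sketch and may need its own check.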

Additionally, you can find the inferno reader code for dax/jiso here: https://github.com/PSP-Archive/ARK-4/blob/ee11e9d04dd5981f1183b4add2e4ca3f002fad82/core/inferno/isoread.c#L712

JoseAaronLopezGarcia commented 2 years ago

Hey there, I've implemented experimental support for CSOv2 on inferno following the documentation I found here. The format seems promising as it has a good compromise between speed and compression (GTA LCS in CSOv2 is even smaller than DAX, and if using LZ4 in critical sections, it should be blazingly fast to decompress). You can find latest beta compilations in the release page of ARK-4 if you are interested in trying it out. I am yet to fix some issues though, GTA LCS crashes just when it's about to enter the loading screen.

JoseAaronLopezGarcia commented 2 years ago

This issue has been solved. I'm opening a new one to focus on CSOv2.

CyberGonzo commented 1 year ago

> So you can simply skip the first two bytes (which should always be the same two values for every single block)

I just ran maxcso on an ISO with --format=dax to generate a DAX file for testing a DAX reader implementation, but found that many of the blocks/blobs/frames/chunks didn't have the expected header. I checked possible headers here: https://stackoverflow.com/questions/9050260/what-does-a-zlib-header-look-like For all intents and purposes those blocks appeared 'raw' (no known headers), and inflating them 'raw' (-15) actually confirmed they were. Perhaps a shortcoming in the maxcso code? Not sure if this helps?

unknownbrackets commented 1 year ago

> For all intents and purposes those blocks appeared 'raw' (no known headers) and inflating them 'raw' (-15) actually confirmed they were. Perhaps a shortcoming in the maxcso code? Not sure if this helps?

Hm, that must be a bug. Does the same happen if you use --no-libdeflate? I think I forgot to account for including the header in libdeflate trials, so it probably has no header unless 7-zip or zlib win. Will look.

-[Unknown]

unknownbrackets commented 1 year ago

Should be better in 528c69bf5. Note that libdeflate is now off by default, so you would need to pass --use-libdeflate. See 161f99d8 for that.

-[Unknown]