hozuki / libcgss

libcgss is a helper library for THE iDOLM@STER Cinderella Girls Starlight Stage (CGSS/DereSute/デレステ). It currently supports HCA audio decoding and ACB exploring. It also applies to other games like THE iDOLM@STER Million Live! Theater Days (MLTD/MiriShita/ミリシタ).

Failed decryption and inconsistencies between libcgss and DereTore #7

Closed Bagoum closed 5 years ago

Bagoum commented 5 years ago

I've been using this library and DereTore to extract sound files from Shadowverse (another Cygames project, which also uses the key 00003657 F27E3B22). I was updating my scripts for the new CriWare setup, but I ran into some errors with file decryption. Most files work fine either way, but I found some files that fail with one tool or the other.

I was able to decrypt this with DereTore but libcgss picks up the wrong number of internal wav files and outputs garbage: (there are four wav files and they sound something like a sick elephant) vo_900841090.zip

I was able to decrypt this with libcgss but DereTore outputs an index access error: (there is one wav file with someone saying "Good work") vo_708_000_004.zip

hozuki commented 5 years ago

I'll look into this. But I'm afraid I don't have time until next week at least.

Here are my initial guesses:

  1. Something is wrong with the ACB reader in libcgss; I'm not sure about the HCA part.
  2. An out-of-range (OOR) error usually means the decoder reached an invalid state (e.g. it was given the wrong decryption keys). DereTore has a strict index range check (obviously) but libcgss does not. In other words, in the common case the C/C++ code can read OOR block data without any error as long as the address is still inside the process memory, and you will never know it happened. If that data happens to be stored right next to the block, the output may contain errors that are not audible. In a "normal" decoding process, value lookups should not trigger OOR access, so either the input(s) or some code somewhere caused it. I assume the latter. It is hard to debug, though.
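
The difference between a checked and an unchecked lookup can be sketched as follows. This is only an illustration: `checked_lookup` and `lookup_throws` are hypothetical helpers, not functions from libcgss or DereTore, and the table contents are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of libcgss or DereTore.
// Returns a pointer to the entry, or nullptr when the index is out of range,
// so the caller can treat a bad index as a decoding error instead of
// silently reading whatever memory follows the table.
const int *checked_lookup(const std::vector<int> &table, std::size_t index) {
    return index < table.size() ? &table[index] : nullptr;
}

// std::vector::at performs the same kind of check and throws
// std::out_of_range, which is roughly how a bounds-checked array
// in a managed language reports the problem.
bool lookup_throws(const std::vector<int> &table, std::size_t index) {
    try {
        (void)table.at(index);
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}
```

With a three-entry table, index 2 is returned normally while index 3 yields `nullptr` from `checked_lookup` and an exception from `at`. A plain `table[3]` would instead be undefined behaviour that often appears to work.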
hozuki commented 5 years ago

During the first tests I found that this bug behaves like a ghost. The debug target is acb2wavs. When I build with the Cygwin toolchain and run it inside the IDE, it returns code 127 in "Run" mode and 0 in "Debug" mode. In either situation, both ACBs are decoded correctly. But if I copy the executable somewhere else and run it with the same arguments, it crashes when decoding the 2nd (i.e. index 1) waveform in vo_708_000_004. However, it can still decode vo_900841090. With other toolchains it behaves the same as in the second case: it crashes on vo_708_000_004 and succeeds on vo_900841090. The return code is 0xC0000005 (access violation).

I suspected that the decipher function might be only a partial implementation, so I checked the intermediate states against AtomViewer. Everything matched. I also wondered whether this had hit an undiscovered bug in one of the runtimes. Cygwin 2.x and 3.0 seem to yield the same result.

Finally, debugging with Visual C++ showed the cause to be incorrect addressing of this[1]. Rewriting all the methods as static ones solves the issue (67cb7c4e207963732bb76a2ce542e3767c1b8b66). But I'm not sure why it happens, because this[1] is valid syntax and the result should refer to the next object (of type CHcaChannel). Note that CHcaChannel has no virtual functions, so the pointer should advance exactly as it would over an array of plain structs. Instead, the pointer moved further than it should have.
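
For reference, the behaviour expected from this[1] can be sketched as below. The `Channel` struct and its fields are hypothetical stand-ins, not the real CHcaChannel layout, and the sketch only shows what the language rules promise. It does not reproduce or explain the crash.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a channel object stored in a contiguous array.
struct Channel {
    float block[4];
    int count;

    // Member-function form: this[1] is *(this + 1), i.e. the next array
    // element. Only call this on an element that is followed by another one.
    Channel *next_member() { return &this[1]; }

    // Static form taking the object explicitly, in the spirit of the
    // "rewrite methods into static" workaround.
    static Channel *next_static(Channel *self) { return self + 1; }
};

// Distance in bytes between an element and the one after it.
std::ptrdiff_t stride_in_bytes(Channel *c) {
    return reinterpret_cast<char *>(c->next_member()) -
           reinterpret_cast<char *>(c);
}
```

For an array `Channel chans[2]`, both forms applied to `chans[0]` should give `&chans[1]`, and the stride should equal `sizeof(Channel)`. Any larger stride would match the "pointer moved more than it should" symptom.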

I'll look into DereTore later. It uses bounds-checked arrays plus indices, which is a stricter constraint and may not be 100% compatible with the C/C++ version.

hozuki commented 5 years ago

I found the cause of the exception in DereTore. When decoding vo_708_000_004, the last AddBits operation of CHcaData (or DataBits, in DereTore) puts the object into an invalid state. More precisely, it happens at this line, where the data contains 204 bytes and the code tries to access the 205th byte (i.e. index=204).

Usually these accesses do not go out of range. I guess that's why the original code does not check the index. But in rare cases it does go into the illegal zone. In C/C++ this is a direct memory access, and in most cases the address is still inside the allocated part of the process address space. The problem then stays hidden, because accessing that address does not raise an illegal memory access error, and a decoding error in the last byte of the wave data is usually not audible. In C#, however, array access is strictly checked, so the problem is revealed.
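
A minimal sketch of a bit reader that guards against this kind of overrun is shown below. `BitReader` and its methods are hypothetical names for illustration only. This is not the code from either commit, and returning zero bits past the end is just one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical bit reader; not the CHcaData / DataBits implementation.
class BitReader {
public:
    explicit BitReader(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

    // Reads `count` bits (at most 32), most significant bit first, without
    // advancing. Bits past the end of the buffer read as zero instead of
    // touching memory beyond the last byte.
    std::uint32_t peek(std::size_t count) const {
        assert(count <= 32);
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < count; ++i) {
            const std::size_t bit = pos_ + i;
            const std::size_t byte = bit >> 3;
            std::uint32_t b = 0;
            if (byte < data_.size()) {  // the index check the unchecked code lacks
                b = (data_[byte] >> (7 - (bit & 7))) & 1u;
            }
            value = (value << 1) | b;
        }
        return value;
    }

    // Advances the read position by `count` bits.
    void skip(std::size_t count) { pos_ += count; }

    // Reads `count` bits and advances past them.
    std::uint32_t read(std::size_t count) {
        const std::uint32_t v = peek(count);
        skip(count);
        return v;
    }

    // True once the position is at or past the last bit of the buffer.
    bool exhausted() const { return pos_ >= data_.size() * 8; }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
};
```

With a 204-byte buffer, a read that straddles the end only consumes the bits that exist, and a read starting at index 204 returns zero rather than whatever happens to follow the buffer in memory.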

To solve this problem I committed 04942319a53169a55b26414f92af3d6d53e876b6 in this repo and OpenCGSS/DereTore@e104bc6d6d6d3f3173836dc5693c1def2c863729 in DereTore. They are not published as releases yet, but you can download them from the CI autobuilds.

In DereTore I also changed the decoding process to use pointers in C#. Its memory layout should now be closer to the C/C++ version; the drawback is that access errors are harder to detect.

hozuki commented 5 years ago

Could you also test in your environment and confirm that the issue is solved?

Bagoum commented 5 years ago

I tested both libcgss and DereTore, and both can extract all the files I was working with. I don't think there are any garbage-output errors, but I only checked a few. I had to build them both myself (Windows 10/VS2017); the libcgss release still produced occasional errors (apparently the commits from this morning fixed something). The issue seems to be resolved.