psthrn42 / SCVI_Extract

Try to include further schemas #4

Open FullLifeGames opened 1 year ago

FullLifeGames commented 1 year ago

@psthrn42 Apparently there are already schemas for Legends Arceus which might help us here:

https://github.com/rh-hideout/rhh-docs/tree/main/NX/flatbuffers/LA

I'm going to bed; maybe I'll have a look in the morning.

psthrn42 commented 1 year ago

Yep, there are lots of schemas from previous games. I saw your pull request; how are you planning to work out which schema to use for which file, when the files inside the trpaks don't have filenames?

FullLifeGames commented 1 year ago

So for me, the extraction of the trpfs and trpak files also resulted in filenames. I guess your script works better than expected :D

bWFpbA commented 1 year ago

How many files did that extract? There should be a total of 273873 files as of version 1.0.1.

TRPAK files do have file names, but they are stored within the code. See these three, for example: fnv1a64('avalon/data/personal_array.bin'), fnv1a64('avalon/data/tokusei_array.bin') and fnv1a64('avalon/data/waza_array.bin'). The resulting hashes should appear in the TRPFD hash table, and also in the TRPAK hash table.
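For anyone who wants to reproduce those hashes, here is a minimal sketch of the standard 64-bit FNV-1a hash. It assumes the game hashes the UTF-8 bytes of the path exactly as written (lowercase, forward slashes, no leading slash); that assumption is worth checking by looking the printed values up in the TRPFD hash table.

```python
# Standard FNV-1a 64-bit constants.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a64(path: str) -> int:
    """64-bit FNV-1a over the UTF-8 bytes of `path`."""
    h = FNV64_OFFSET_BASIS
    for byte in path.encode("utf-8"):
        h ^= byte                        # FNV-1a: XOR the byte in first...
        h = (h * FNV64_PRIME) & MASK64   # ...then multiply, truncated to 64 bits
    return h


if __name__ == "__main__":
    for p in ("avalon/data/personal_array.bin",
              "avalon/data/tokusei_array.bin",
              "avalon/data/waza_array.bin"):
        print(f"{fnv1a64(p):016X}  {p}")
```

The XOR-then-multiply order is what distinguishes FNV-1a from plain FNV-1; swapping it silently produces hashes that will never match.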

FullLifeGames commented 1 year ago

My bad, then. I only get around 17000 files. I got overexcited because the language tables are present in, e.g., arc\messagedat\English\common\ and can be extracted.

Examples include monsname.tbl.

bWFpbA commented 1 year ago

monsname.tbl is an AHBT file that should only contain fnv1a64 hashes and names. monsname.dat is what you want, and from the sounds of it, that wasn't extracted? Or are the TBL and DAT files appended together...?

FullLifeGames commented 1 year ago

That makes a lot of sense. When I compared it with the SWSH dump, I was wondering why there was no ".dat" file. Since some of them are extractable (with a slightly modified pkNX), I assume they are appended together or something?

Anyway, seems like this repo needs a lot more work on the extraction side.

bWFpbA commented 1 year ago

Essentially, the format works like this: the code hashes a file path and looks that hash up in the trpfd hash table, then reads the trpfd map table to obtain the index into the trpfd key table and file table. The code then hashes the key from the key table, finds that hash in the trpfs hash table, and takes the associated offset vector value; the size from the file table is then used for reading. Lots of overcomplicated bullshit, when they could just get rid of the fnv1a64 hashes...
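The lookup chain above can be sketched as follows. This is a minimal illustration, not a parser: `Trpfd`, `Trpfs`, their fields and `read_pack_for_path` are hypothetical names for already-parsed tables, and each file table entry is reduced to just its size field. Linear `list.index` searches are used for clarity; a dict (or a binary search, if the tables turn out to be sorted) would be faster.

```python
from __future__ import annotations

from dataclasses import dataclass


def fnv1a64(text: str) -> int:
    """64-bit FNV-1a over the UTF-8 bytes of `text`."""
    h = 0xCBF29CE484222325
    for b in text.encode("utf-8"):
        h = ((h ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


@dataclass
class Trpfd:
    """Hypothetical, already-parsed view of the trpfd tables."""
    file_hashes: list[int]  # trpfd hash table: fnv1a64 of file paths
    map_table: list[int]    # parallel to file_hashes: index into key/file tables
    key_table: list[str]    # trpfs keys
    file_table: list[int]   # size field of each file table entry


@dataclass
class Trpfs:
    """Hypothetical, already-parsed view of the trpfs container."""
    key_hashes: list[int]   # trpfs hash table: fnv1a64 of keys
    offsets: list[int]      # offset vector, parallel to key_hashes
    blob: bytes             # raw trpfs contents


def read_pack_for_path(path: str, trpfd: Trpfd, trpfs: Trpfs) -> bytes:
    """Resolve `path` via trpfd and trpfs; raises ValueError if a hash is missing."""
    i = trpfd.file_hashes.index(fnv1a64(path))   # path hash -> trpfd hash table slot
    pack_index = trpfd.map_table[i]              # map table -> key/file table index
    key = trpfd.key_table[pack_index]
    size = trpfd.file_table[pack_index]
    j = trpfs.key_hashes.index(fnv1a64(key))     # key hash -> trpfs hash table slot
    offset = trpfs.offsets[j]                    # associated offset vector value
    return trpfs.blob[offset:offset + size]      # file table size bounds the read
```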

bWFpbA commented 1 year ago

Ugh, and I even forgot the final steps: searching the trpak hash table for the file path hash, and then decompressing the file data using Oodle...
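Continuing the same sketch for those last two steps: once the pack bytes are in hand, the path hash is looked up again inside the trpak, and the matching entry is decompressed. `Trpak`, `TrpakEntry` and `extract_file` are hypothetical names for an already-parsed trpak, and the decompressor is passed in as a callable because Oodle is not part of the Python standard library; the identity function in the test is only a stand-in.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


def fnv1a64(text: str) -> int:
    """64-bit FNV-1a over the UTF-8 bytes of `text`."""
    h = 0xCBF29CE484222325
    for b in text.encode("utf-8"):
        h = ((h ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


@dataclass
class TrpakEntry:
    """Hypothetical parsed trpak file entry."""
    data: bytes             # payload as stored in the trpak (Oodle-compressed)
    decompressed_size: int


@dataclass
class Trpak:
    """Hypothetical parsed trpak: hash table plus parallel entries."""
    file_hashes: list[int]
    entries: list[TrpakEntry]


# (payload, expected size) -> decompressed bytes; for this game, an Oodle binding.
Decompressor = Callable[[bytes, int], bytes]


def extract_file(path: str, pack: Trpak, decompress: Decompressor) -> bytes:
    i = pack.file_hashes.index(fnv1a64(path))  # search the trpak hash table
    entry = pack.entries[i]
    out = decompress(entry.data, entry.decompressed_size)
    if len(out) != entry.decompressed_size:    # sanity check on the result
        raise ValueError(f"size mismatch for {path}")
    return out
```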

FullLifeGames commented 1 year ago

Since I did not write the extraction for the trpfd and trpfs, I only kinda follow. I can find the file hashes and the keys, but what I'm stuck on is where the filenames for the 273873 files are located. For the 16112 trpak files this is doable, but the rest don't seem to be available anywhere.

bWFpbA commented 1 year ago

Those file names are stored within the code, not anywhere in the data filesystem. If you uncompress the main file found in the exefs, you should be able to find the strings I mentioned: avalon/data/, personal_array.bin, tokusei_array.bin and waza_array.bin.
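One way to turn the uncompressed main into a path list is sketched below: pull printable strings out of the binary, join directory-like strings (such as avalon/data/) with bare filenames (such as personal_array.bin), hash every candidate, and keep only those that hit a known hash from the trpfd hash table. It assumes the strings are NUL-terminated ASCII of at least four characters, and the pairing heuristic in `candidate_paths` is a guess; all function names here are hypothetical.

```python
from __future__ import annotations

import re
from itertools import product


def fnv1a64(text: str) -> int:
    """64-bit FNV-1a over the UTF-8 bytes of `text`."""
    h = 0xCBF29CE484222325
    for b in text.encode("utf-8"):
        h = ((h ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


# Assumption: paths are NUL-terminated runs of 4+ printable ASCII characters.
STRING_RE = re.compile(rb"([\x20-\x7e]{4,})\x00")


def extract_strings(binary: bytes) -> list[str]:
    return [m.group(1).decode("ascii") for m in STRING_RE.finditer(binary)]


def candidate_paths(strings: list[str]) -> set[str]:
    """Whole paths plus every directory prefix joined with every bare filename."""
    prefixes = {s for s in strings if s.endswith("/")}          # e.g. "avalon/data/"
    names = {s for s in strings if "." in s and "/" not in s}   # e.g. "personal_array.bin"
    whole = {s for s in strings if "/" in s and not s.endswith("/")}
    return whole | {p + n for p, n in product(prefixes, names)}


def match_hashes(binary: bytes, known_hashes: set[int]) -> dict[int, str]:
    """Map each known hash (e.g. from the trpfd hash table) to a recovered path."""
    found: dict[int, str] = {}
    for path in sorted(candidate_paths(extract_strings(binary))):
        h = fnv1a64(path)
        if h in known_hashes:
            found[h] = path
    return found
```

Paths assembled at runtime from format strings or numeric indices will not be found this way, so some hashes will likely stay unnamed.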

bWFpbA commented 1 year ago

Here are the flatbuffer schemas I created while looking into this: https://anonfiles.com/y347n7H8yf/trinity_7z Since the trpfd/trpak hashes are only in the code, I'd highly recommend just extracting the data like so: <sanitized trpfs key>/<trpak hash>, at least until someone goes through all the strings in the code and builds a list of correct file paths.
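A minimal sketch of that naming scheme follows. What "sanitized" should mean is not spelled out here, so the rules below are assumptions: characters that are invalid in Windows filenames are replaced with underscores, empty, "." and ".." path components are dropped, and the trpak hash is written as 16 lowercase hex digits. `sanitize_key` and `output_path` are hypothetical helpers.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Characters invalid in Windows filenames, plus ASCII control characters.
INVALID_CHARS = re.compile(r'[<>:"\\|?*\x00-\x1f]')


def sanitize_key(key: str) -> str:
    cleaned = INVALID_CHARS.sub("_", key)
    # Keep "/" as a directory separator, but never allow escaping the output root.
    parts = [p for p in cleaned.split("/") if p not in ("", ".", "..")]
    return "/".join(parts) or "_"


def output_path(trpfs_key: str, trpak_hash: int) -> PurePosixPath:
    """<sanitized trpfs key>/<trpak hash as 16 hex digits>"""
    return PurePosixPath(sanitize_key(trpfs_key)) / f"{trpak_hash:016x}"
```

Fixed-width hex keeps the names sortable and makes it easy to rename files later once a path list for those hashes exists.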

FullLifeGames commented 1 year ago

Sorry that I have to keep asking, but this is all rather new to me. What I know is that you can load the exefs main file into Ghidra and modify it there. I did not know there were ways to decompress it.

I found https://github.com/0CBH0/nsnsotool, but it doesn't seem to go all the way.

The flatbuffer schemas match with the ones we use here as well, so this is good.

bWFpbA commented 1 year ago

For NSO uncompression, I would recommend hactool.

FullLifeGames commented 1 year ago

Oh, there we go. (For others: open it in the HxD hex editor and search for, e.g., "personal_array.bin", and you'll find it.)

Gotcha, now I also see the challenge in trying to extract them.

FullLifeGames commented 1 year ago

With https://github.com/psthrn42/SCVI_Extract/pull/5 the data will now be stored in a similar format to the one you mentioned above. Thanks for all the input!

psthrn42 commented 1 year ago

@bWFpbA @FullLifeGames This was actually already pretty much how trpak_extract worked. Guess I didn't look hard enough at your full_extract PR before merging lol. I'll have a look at your new PR.

psthrn42 commented 1 year ago

On a different note, @bWFpbA, did you know that this game includes a bunch of BFBS files? They basically contain all the information about the original .fbs schemas the devs used, including object and field names, and can be used to reconstruct them almost exactly. I think we got really lucky with that one. Unfortunately I couldn't find any for models or anims (though I'm pretty sure the anims are very similar to previous games; they sort of open in Switch Toolbox), but there is a massive one for the Pokémon AI, for example. Grep for BFBS and you should see them all.
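As a slightly stricter alternative to grepping, the sketch below checks only the standard FlatBuffers file identifier position: a buffer starts with a 4-byte root offset, and the optional 4-byte identifier follows it; the upstream reflection.fbs schema declares the identifier "BFBS". This only finds standalone BFBS files at the start of an extracted file, so a plain grep will still catch ones embedded elsewhere. `has_bfbs_identifier` and `find_bfbs` are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path

BFBS_IDENT = b"BFBS"


def has_bfbs_identifier(data: bytes) -> bool:
    # Bytes 0-3: root table offset; bytes 4-7: optional file identifier.
    return len(data) >= 8 and data[4:8] == BFBS_IDENT


def find_bfbs(root: Path) -> list[Path]:
    """Return every file under `root` whose header carries the BFBS identifier."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            with path.open("rb") as f:
                if has_bfbs_identifier(f.read(8)):
                    hits.append(path)
    return hits
```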

And thanks for the better trpfd schema.

psthrn42 commented 1 year ago

Also, one more question. I haven't had a look at the exefs yet, but are all the paths really hardcoded? In previous Pokémon games we had to guess most of the hashes (I think).

bWFpbA commented 1 year ago

I did see all the BFBS files, though I don't think any were of interest to me; I was mainly digging through the files to get to the personal table, and I don't think that had a BFBS file.

I'm not sure about the previous games, but I believe most of them are in the code? I probably wouldn't have found the personal table otherwise, since they chose to store it in the avalon/ directory rather than the usual pml/ directory.

bWFpbA commented 1 year ago

Do BFBS files have a documented structure? I've never looked at them before. I see that they are also a flatbuffer (pain). Currently trying to figure out the trainer BFBS.

EDIT: Heh, I don't think I've ever had much fun reverse engineering flatbuffers, but the BFBS one is quite fun: a table within a table within a struct within a table within either a vector of tables or just a table is a treat to look at in a hex editor.

EDIT 2: Ugh, and now I learn that the main flatbuffers repository has a reflection.fbs file that is the schema for the BFBS format. Oh well, it was fun while it lasted.

jxx96 commented 1 year ago

Is there a way to extract the Pokémon models?