panzi / u4pak

unpack, pack, list, check and mount Unreal Engine 4 .pak archives

Codec can't decode byte #23

Open Veganlol opened 5 years ago

Veganlol commented 5 years ago

Hi there. I'm having this issue:

'utf8' codec can't decode byte 0xd1 in position 69: invalid continuation byte

I wrote this line:

C:\Windows\system32>"C:\Users\Eddd\Desktop\u4pak.py" unpack "C:\Users\Eddd\Desktop\pakchunk14-WindowsNoEditor.pak"

I already changed the decode method from UTF-8 to Latin-1, as I'd read in another issue thread, but that didn't help. Now I get this instead:

  File "C:\Users\Eddd\Desktop\u4pak.py", line 1720, in <module>
    main(sys.argv[1:])
  File "C:\Users\Eddd\Desktop\u4pak.py", line 1657, in main
    pak = read_index(stream,args.check_integrity)
  File "C:\Users\Eddd\Desktop\u4pak.py", line 799, in read_index
    record = read_record(stream, filename)
  File "C:\Users\Eddd\Desktop\u4pak.py", line 573, in read_record_v4
    st_unpack('<QQQI20s', stream.read(48))
struct.error: unpack requires a string argument of length 48

panzi commented 5 years ago

0xD1 in latin1 is the character "Ñ". Can that be correct for the given game, i.e. does it contain a resource (file) with that letter in its name?
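As a side note on why switching the codec makes the first error go away: every byte value is a valid Latin-1 character, so a Latin-1 decode can never fail, it can only hide bad input. A quick Python 3 check, using made-up sample bytes rather than anything from the actual archive:

```python
# Made-up sample: 0xD1 followed by an ASCII byte, which is not valid UTF-8.
sample = b"\xd1A"

# Latin-1 maps every byte to a character, so decoding always succeeds.
latin1_text = sample.decode("latin-1")

# UTF-8 treats 0xD1 as the lead byte of a two-byte sequence and requires
# a continuation byte (0x80-0xBF) right after it.
try:
    sample.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason
```

Here `latin1_text` contains "Ñ" followed by "A", while the UTF-8 attempt fails with the same "invalid continuation byte" reason as in the report.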

The second error means that the file is somehow cut off. At least under the assumption that it actually is a supported archive format. Given that the file is called pakchunk14-WindowsNoEditor.pak I would guess it's just a part of the whole archive? That is all I can tell from the provided information.
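To illustrate the "cut off" reading: the struct.error only says that stream.read(48) returned fewer than 48 bytes. A minimal sketch of a read helper that reports this directly, written for Python 3; `read_exact` is a hypothetical name and not part of u4pak, and the format string is the one visible in the traceback:

```python
import io
import struct

# Record header format copied from the traceback; it is 48 bytes long.
RECORD_HEADER = struct.Struct("<QQQI20s")

def read_exact(stream, size):
    """Hypothetical helper: read exactly `size` bytes or raise EOFError."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("wanted %d bytes, got %d (truncated or misparsed file?)"
                       % (size, len(data)))
    return data

# A 10-byte stream stands in for an archive that ends too early.
short_stream = io.BytesIO(b"\x00" * 10)
try:
    RECORD_HEADER.unpack(read_exact(short_stream, RECORD_HEADER.size))
    outcome = "ok"
except EOFError as exc:
    outcome = str(exc)
```

A short read like this can come from a truncated file, but also from parsing a different format with wrong offsets, so the message deliberately names both possibilities.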

Veganlol commented 5 years ago

Thanks for the reply, had no time to answer yet!

I don't think any file contains that letter. The game I'm trying to unpack is A Way Out; I should have mentioned it. I have a folder called Paks containing pakchunk0-WindowsNoEditor.pak through pakchunk16-WindowsNoEditor.pak, 17 files in total. I tried to unpack all of them, with no luck: each one fails on a different undecodable byte at a different position. I guess the game's archives are just not supported? Or is there a workaround I could try?

panzi commented 5 years ago

Yeah, given all that it looks like this is a different file format.

YellowApple commented 5 years ago

Howdy!

I'm getting a similar error when attempting to list the contents of Ace Combat 7's .pak files ('utf8' codec can't decode byte 0xff in position 1: invalid start byte). After snipping out the catches around main() to get the stack trace, it looks like (at least in my case) it's happening at the very first attempt to call read_path(stream) (when setting mount_point in read_index, right after the magic/version checks).

tl;dr version of the below: turns out AES-encrypting the index is a thing. I've hacked up a way to actually decrypt said encrypted index (and would be happy to clean that up and turn it into a PR if you're fine with adding a dependency on PyCrypto). The below's more of my investigative process here.


With a bit of "redneck debugging" (i.e. some extra prints), I was able to confirm that the footer (and the index offset/size it specifies) at least appears to look plausible:

Magic:         1517228769
Version:       4
Index Offset:  14243509476
      Size:    2751904
      End:     14246261380
Footer Offset: 14246261381

And yet, the very first path it tries to read (for mount_point) has a size of 3126569118, and the portion of the stream it tries to read appears to be raw binary data (I scrolled through it for a while and there sure ain't anything that looks like a path). Same for the rest of the chunks.
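The numbers themselves make the point. A small sanity check using only the values printed above (the variable names are just for illustration):

```python
# Values copied from the debug output above.
magic = 1517228769
index_offset = 14243509476
index_size = 2751904
index_end = 14246261380
first_path_size = 3126569118

# The magic as hex, which is easier to compare against a constant in code.
magic_hex = hex(magic)

# Offset plus size lands exactly on the reported end of the index.
footer_consistent = (index_offset + index_size == index_end)

# A path cannot be longer than the index that contains it, so this
# "length" cannot be a real length prefix.
path_size_plausible = (first_path_size <= index_size)
```

So the footer is internally consistent, while the first length prefix inside the index is over a thousand times larger than the whole index, which is what you would expect when reading encrypted or otherwise encoded bytes as a plain integer.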

By the looks of it, AES encryption of the index itself (as opposed to the contents) is a thing. I don't know why (the contents are rather clearly unencrypted), and I don't really care as long as I can get the key and figure out the decryption part. Turns out the keys are typically available online (as is the case for AC7), and with some online research and guesswork I figured out that something akin to the below is enough to get it decrypted:

# Deep inside read_index(), after doing all the magic and version validations and
# getting the index offset/size...
from Crypto.Cipher import AES
from io import BytesIO
idx_key = b"32-byte raw binary key"  # get this from a command-line arg or prompt or something
idx_encrypted = stream.read(index_size)
idx_cipher = AES.new(idx_key, AES.MODE_ECB)
idx_stream = BytesIO(idx_cipher.decrypt(idx_encrypted))

mount_point = read_path(idx_stream)  # This works!  :)
entry_count = st_unpack('<I',idx_stream.read(4))[0]  # This also works!

pak = Pak(version, index_offset, index_size, footer_offset, index_sha1, mount_point)

# The loop below works, but only after removing the read of that last integer
# (`unknown`) from `read_record_v4()`: that integer is actually the size of
# the next filename.
for i in xrange(entry_count):
    filename = read_path(idx_stream)  # this blew up before that change
    record   = read_record(idx_stream, filename)
    pak.records.append(record)

At this point, my local copy of u4pak.py is able to decode AC7's .pak files. info and list both work, and unpack seems to work (seems like some files might still be encrypted or otherwise encoded weirdly, but as far as I can tell they did extract correctly). I haven't tried implementing (re)packing yet, but in principle it should work the same way but in reverse.
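For the "get this from a command-line arg" part, here is a rough sketch of how the key handling could look in Python 3. It assumes the key is supplied as a hex string with an optional 0x prefix, which is my assumption about how such keys are usually passed around; `parse_index_key` and `check_index_size` are hypothetical helpers, not existing u4pak functions:

```python
import binascii

AES_BLOCK_SIZE = 16  # AES always works on 16-byte blocks

def parse_index_key(text):
    """Hypothetical helper: hex string (optional 0x prefix) -> 32 raw key bytes."""
    text = text.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    key = binascii.unhexlify(text)
    if len(key) != 32:
        raise ValueError("expected 32 key bytes, got %d" % len(key))
    return key

def check_index_size(index_size):
    """ECB decryption needs input whose length is a multiple of the block size."""
    if index_size % AES_BLOCK_SIZE != 0:
        raise ValueError("index size %d is not a multiple of %d"
                         % (index_size, AES_BLOCK_SIZE))

# Placeholder key for illustration only, not a real game key.
demo_key = parse_index_key("0x" + "00" * 32)
# Index size from the debug output earlier in this comment.
check_index_size(2751904)
```

The index size from my debug output passes the block-size check, which fits the encrypted-index theory. The resulting bytes would go in as `idx_key` in the snippet above.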