stephen-bunn / bethesda-structs

A wrapper for Bethesda's popular plugin/archive file formats
MIT License

fails to extract Cathedral Weathers - Textures.bsa #17

Open leontristain opened 4 years ago

leontristain commented 4 years ago

Hi,

Just wanted to report a problem. I tried to use bethesda-structs to extract Cathedral Weathers - Textures.bsa from the Cathedral Weathers and Seasons mod, and ran into an lz4-related crash.

My python:

(venv) PS D:\git\skyrim-builder> python
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

My virtual environment:

(venv) PS D:\git\skyrim-builder> pip list
Package                Version    Location
---------------------- ---------- -----------------------------
atomicwrites           1.3.0
attrs                  19.1.0
bethesda-structs       0.1.4
cached-property        1.5.1
certifi                2020.4.5.1
chardet                3.0.4
Click                  7.0
colorama               0.4.1
construct              2.9.45     
flake8                 3.6.0
gitdb2                 2.0.5
GitPython              2.1.11
idna                   2.9
Jinja2                 2.10
lxml                   4.3.0
lz4                    3.0.2
MarkupSafe             1.1.1
mccabe                 0.6.1
more-itertools         7.0.0
multidict              4.5.2
pip                    18.1
pluggy                 0.6.0
py                     1.8.0
pycodestyle            2.4.0
pyflakes               2.0.0
pyfomod                1.2.1      d:\git\pyfomod\src
pynxm                  0.1.0
pytest                 3.6.1
pywin32                227
PyYAML                 5.1.1
requests               2.23.0
setuptools             40.6.2
six                    1.12.0
skyrim-builder         0.1        d:\git\skyrim-builder
skyrim-package-manager 0.1        d:\git\skyrim-package-manager
skyrimbuilder          0.1
smmap2                 2.0.5
spm                    0.1
tabulate               0.8.6
tqdm                   4.46.0
urllib3                1.25.9
websocket-client       0.57.0

My sample repro script:

import click
from pathlib import Path
from bethesda_structs.archive import BSAArchive

@click.command()
@click.argument('bsa')
@click.argument('dest')
def main(bsa, dest):
    bsa = Path(bsa)
    dest = Path(dest)
    assert bsa.exists()
    assert dest.exists()

    print(bsa)
    print(dest)

    archive = BSAArchive.parse_file(str(bsa))
    print(archive.container.header)
    print(archive.container.directory_records)
    archive.extract(str(dest))

if __name__ == '__main__':
    main()

The output:

(venv) PS D:\git\skyrim-builder> python .\test.py 'C:\users\{user redacted}\desktop\Cathedral Weathers - Textures.bsa' C:\users\{user redacted}\desktop\dest       
C:\users\{user redacted}\desktop\Cathedral Weathers - Textures.bsa
C:\users\{user redacted}\desktop\dest
Container: 
    magic = b'BSA\x00' (total 4)
    version = 105
    directory_offset = 36
    archive_flags = Container:
        directories_named = True
        files_named = True
        files_compressed = True
        files_prefixed = True
    directory_count = 2
    file_count = 24
    directory_names_length = 30
    file_names_length = 368
    file_flags = Container:
        dds = True
ListContainer:
    Container:
        hash = 7911102381315617907
        file_count = 8
        name_offset = 452
    Container:
        hash = 13447612162718329721
        file_count = 16
        name_offset = 598
Traceback (most recent call last):
  File ".\test.py", line 25, in <module>
    main()
  File "D:\git\skyrim-builder\venv\lib\site-packages\click\core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "D:\git\skyrim-builder\venv\lib\site-packages\click\core.py", line 717, in main
    rv = self.invoke(ctx)
  File "D:\git\skyrim-builder\venv\lib\site-packages\click\core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\git\skyrim-builder\venv\lib\site-packages\click\core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File ".\test.py", line 21, in main
    archive.extract(str(dest))
  File "D:\git\skyrim-builder\venv\lib\site-packages\bethesda_structs\archive\_common.py", line 170, in extract
    archive_files = list(self.iter_files())
  File "D:\git\skyrim-builder\venv\lib\site-packages\bethesda_structs\archive\bsa.py", line 256, in iter_files
    file_record.offset + (file_record.size & self.SIZE_MASK)
  File "D:\git\skyrim-builder\venv\lib\site-packages\construct\core.py", line 304, in parse
    return self.parse_stream(io.BytesIO(data), **contextkw)
  File "D:\git\skyrim-builder\venv\lib\site-packages\construct\core.py", line 316, in parse_stream
    return self._parsereport(stream, context, "(parsing)")
  File "D:\git\skyrim-builder\venv\lib\site-packages\construct\core.py", line 328, in _parsereport
    obj = self._parse(stream, context, path)
  File "D:\git\skyrim-builder\venv\lib\site-packages\construct\core.py", line 1979, in _parse
    subobj = sc._parsereport(stream, context, path)
  File "D:\git\skyrim-builder\venv\lib\site-packages\construct\core.py", line 328, in _parsereport
    obj = self._parse(stream, context, path)
  File "D:\git\skyrim-builder\venv\lib\site-packages\construct\core.py", line 2468, in _parse
    return self.subcon._parsereport(stream, context, path)
  File "D:\git\skyrim-builder\venv\lib\site-packages\construct\core.py", line 328, in _parsereport
    obj = self._parse(stream, context, path)
  File "D:\git\skyrim-builder\venv\lib\site-packages\construct\core.py", line 3593, in _parse
    return sc._parsereport(stream, context, path)
  File "D:\git\skyrim-builder\venv\lib\site-packages\construct\core.py", line 328, in _parsereport
    obj = self._parse(stream, context, path)
  File "D:\git\skyrim-builder\venv\lib\site-packages\construct\core.py", line 715, in _parse
    return self._decode(obj, context, path)
  File "D:\git\skyrim-builder\venv\lib\site-packages\bethesda_structs\archive\bsa.py", line 48, in _decode
    return lz4.frame.decompress(obj)
RuntimeError: LZ4F_getFrameInfo failed with code: ERROR_frameType_unknown

The lz4 internals are beyond me. Looking at other BSA extractors, the Bethesda Archive Extractor page has a line in its version history that says "0.07 - Corrected the size sent to LZ4 for decompression, which affected only a very small number of files." Maybe this is related?
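As far as I can tell, ERROR_frameType_unknown means the bytes handed to lz4.frame.decompress do not start with a magic number that the LZ4 frame format recognizes. Below is a minimal sketch for checking that by hand with only the standard library. The name describe_lz4_payload is made up for illustration; it is not part of bethesda-structs or lz4.

```python
# Magic number that opens a standard LZ4 frame (0x184D2204, stored little-endian).
LZ4_FRAME_MAGIC = bytes.fromhex("04224d18")


def describe_lz4_payload(payload: bytes) -> str:
    """Hypothetical helper: guess what kind of data a payload starts with."""
    head = payload[:4]
    if head == LZ4_FRAME_MAGIC:
        return "lz4-frame"
    # Skippable frames use magic numbers 0x184D2A50 through 0x184D2A5F,
    # so only the first (lowest) byte varies on disk.
    if len(head) == 4 and head[1:] == b"\x2a\x4d\x18" and 0x50 <= head[0] <= 0x5F:
        return "lz4-skippable-frame"
    return "unknown"
```

If a payload sliced out of the archive reports "unknown", that would be consistent with a wrong offset or size being passed to the decompressor, or with data that is not in frame format at all. This check alone cannot tell those cases apart.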

I confirmed that the same BSA file can be extracted successfully with both the Bethesda Archive Extractor (the GUI tool mentioned above) and BSA Browser, which comes with a CLI utility that I can use for the time being. Still, it would be nice to be able to do this in native Python at some point.

Anyway, thanks for your library!

stephen-bunn commented 3 years ago

Sorry for the really late reply. I'm going to mark this repo as archived, since I will no longer be able to maintain it and it honestly could use a lot of rework to better support plugin loading.

My only suggestion would be to try swapping out lz4.frame.decompress for lz4.block.decompress. The LZ4 compression handling for BSA archives was never tested against a large number of archives, and I'm sure there are a few for which decompression will fail. Again, sorry for the trouble.
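A rough sketch of that swap, written as a fallback so archives that already work keep using the frame decoder. This is untested against the archive in this issue. It assumes the caller already knows the uncompressed size of the file (lz4.block.decompress needs it when the data carries no size header), and the name decompress_with_fallback and its two optional callable parameters are hypothetical, added only so the logic can be exercised without lz4 installed.

```python
from typing import Callable, Optional


def decompress_with_fallback(
    payload: bytes,
    original_size: int,
    frame_decompress: Optional[Callable] = None,
    block_decompress: Optional[Callable] = None,
) -> bytes:
    """Hypothetical helper: try LZ4 frame decoding, then fall back to block decoding."""
    # Import lazily so the real lz4 functions are only needed when no
    # replacement callables are supplied.
    if frame_decompress is None:
        import lz4.frame
        frame_decompress = lz4.frame.decompress
    if block_decompress is None:
        import lz4.block
        block_decompress = lz4.block.decompress

    try:
        return frame_decompress(payload)
    except RuntimeError:
        # The traceback in this issue shows the frame decoder raising
        # RuntimeError; retry the same bytes as a raw LZ4 block.
        return block_decompress(payload, uncompressed_size=original_size)
```

Inside bsa.py this would stand in for the lz4.frame.decompress(obj) call in _decode, with original_size taken from wherever the archive stores it. If the block decoder also fails, the payload itself is probably being sliced incorrectly.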