twoolie / NBT

Python Parser/Writer for the NBT file format, and its container, the RegionFile.
MIT License

[Bug Report]: UnicodeDecodeError on valid data #187

Open lexi-the-cute opened 8 months ago

lexi-the-cute commented 8 months ago

The dumped NBT data is: corrupted.nbt.zip

This data was dumped from the region file world/region/r.3.3.mca in The Uncensored Library from https://uncensoredlibrary.com

irath's NBT Editor reads the data just fine, as shown in this screenshot: Screenshot_20240108_062140

If you want to check the original world file, the chunk coordinates are (24, 1) under my scanner's offset system; otherwise they are (120, -4, 97) according to the NBT data inside the chunk.

The log is:

  File "/home/alexis/Documents/Projects/anvil-parser/anvil/region.py", line 98, in chunk_data
    nbt_data = nbt.NBTFile(buffer=BytesIO(decompressed_data))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 641, in __init__
    self.parse_file()
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 670, in parse_file
    self._parse_buffer(self.file)
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 506, in _parse_buffer
    tag._parse_buffer(buffer)
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 414, in _parse_buffer
    self.tags.append(TAGLIST[self.tagID](buffer=buffer))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 489, in __init__
    self._parse_buffer(buffer)
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 506, in _parse_buffer
    tag._parse_buffer(buffer)
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 506, in _parse_buffer
    tag._parse_buffer(buffer)
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 506, in _parse_buffer
    tag._parse_buffer(buffer)
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 414, in _parse_buffer
    self.tags.append(TAGLIST[self.tagID](buffer=buffer))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 355, in __init__
    self._parse_buffer(buffer)
  File "/home/alexis/Desktop/world-scanner/.venv/lib/python3.11/site-packages/nbt/nbt.py", line 363, in _parse_buffer
    self.value = read.decode("utf-8")
                 ^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 146: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alexis/Desktop/world-scanner/main.py", line 163, in <module>
    for block_x, block_y, block_z, block, block_entity in get_block_entities(region_file_path="world/region/r.3.3.mca", hidden_blocks=args.hidden_blocks):
  File "/home/alexis/Desktop/world-scanner/main.py", line 109, in get_block_entities
    chunk = anvil.Chunk.from_region(region, chunk_x, chunk_z)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexis/Documents/Projects/anvil-parser/anvil/chunk.py", line 531, in from_region
    nbt_data = region.chunk_data(chunk_x, chunk_z)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexis/Documents/Projects/anvil-parser/anvil/region.py", line 103, in chunk_data
    raise CorruptedData('Failed to read decompressed NBT data with UnicodeDecodeError')
anvil.errors.CorruptedData: Failed to read decompressed NBT data with UnicodeDecodeError

OpenBagTwo commented 6 months ago

Hi @lexi-the-cute, I've encountered this problem too. It'll happen for any region in which an entity has been named using non-ASCII symbols (like emoji).

See: https://github.com/twoolie/NBT/issues/144
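
For anyone landing here: the 0xed byte in the traceback is consistent with Java's "modified UTF-8", which NBT strings use. In that encoding, characters outside the Basic Multilingual Plane (most emoji) are written as two 3-byte surrogate halves, each starting with 0xED, and strict UTF-8 decoding rejects those. Below is a minimal stdlib-only sketch of a tolerant decoder. `decode_mutf8` is a hypothetical helper name, not part of the NBT library, and this is not necessarily how the library does or should fix it.

```python
def decode_mutf8(raw: bytes) -> str:
    """Decode Java-style modified UTF-8 bytes into a Python str.

    Hypothetical helper for illustration; not part of the NBT library.
    """
    # Modified UTF-8 encodes U+0000 as the two bytes C0 80.
    # 0xC0 never appears anywhere else in valid modified UTF-8,
    # so a plain byte replacement is safe here.
    raw = raw.replace(b"\xc0\x80", b"\x00")

    # 'surrogatepass' accepts the 3-byte encoded surrogate halves
    # that the strict UTF-8 decoder rejects, yielding lone surrogates.
    text = raw.decode("utf-8", errors="surrogatepass")

    # Round-trip through UTF-16 so that each high/low surrogate pair
    # is joined into the single code point it represents.
    return text.encode("utf-16", errors="surrogatepass").decode(
        "utf-16", errors="surrogatepass"
    )


# U+1F600 as written by Java: two surrogate halves, 3 bytes each.
emoji_bytes = b"\xed\xa0\xbd\xed\xb8\x80"

try:
    emoji_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    # Same failure mode as the traceback above.
    strict_decode_failed = True

decoded = decode_mutf8(emoji_bytes)
```

Ordinary UTF-8 input (ASCII, or accented letters inside the BMP) passes through this helper unchanged, so it could stand in for the plain `.decode("utf-8")` call shown at nbt.py line 363 in the traceback, assuming the bytes really are modified UTF-8.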