miurahr / py7zr

7zip in python3 with ZStandard, PPMd, LZMA2, LZMA1, Delta, BCJ, BZip2, and Deflate compressions, and AES encryption.
https://pypi.org/project/py7zr/
GNU Lesser General Public License v2.1
463 stars 74 forks

Extracting a corrupt 7z file and then trying to delete it causes a PermissionError on Windows #597

Open ok-oldking opened 4 months ago

ok-oldking commented 4 months ago
import os

import py7zr

try:
    with py7zr.SevenZipFile('corrupt.7z', mode='r') as z:
        z.extractall('test')
except Exception as e:
    os.remove('corrupt.7z')  # raises PermissionError on Windows

When I extract a particular corrupted 7z file and then try to delete it, I get PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'corrupt.7z'. This error occurs only with this corrupted file. When I test with other random invalid 7z files, the error does not occur. Because the file is 80 MB, I can't upload it to GitHub, here is the Google Drive link

Traceback (most recent call last):
  File "D:\projects\ok-wuthering-waves\test.py", line 7, in <module>
    z.extractall('test')
  File "D:\projects\ok-wuthering-waves\venv\Lib\site-packages\py7zr\py7zr.py", line 999, in extractall
    self._extract(path=path, return_dict=False, callback=callback)
  File "D:\projects\ok-wuthering-waves\venv\Lib\site-packages\py7zr\py7zr.py", line 629, in _extract
    self.worker.extract(
  File "D:\projects\ok-wuthering-waves\venv\Lib\site-packages\py7zr\py7zr.py", line 1313, in extract
    raise exc_info[1].with_traceback(exc_info[2])
  File "D:\projects\ok-wuthering-waves\venv\Lib\site-packages\py7zr\py7zr.py", line 1338, in extract_single
    self._extract_single(fp, files, path, src_end, q, skip_notarget)
  File "D:\projects\ok-wuthering-waves\venv\Lib\site-packages\py7zr\py7zr.py", line 1407, in _extract_single
    crc32 = self.decompress(fp, f.folder, obfp, f.uncompressed, f.compressed, src_end, q)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\ok-wuthering-waves\venv\Lib\site-packages\py7zr\py7zr.py", line 1466, in decompress
    tmp = decompressor.decompress(fp, min(out_remaining, max_block_size))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\ok-wuthering-waves\venv\Lib\site-packages\py7zr\compressor.py", line 721, in decompress
    tmp = self._decompress(data, max_length)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\ok-wuthering-waves\venv\Lib\site-packages\py7zr\compressor.py", line 677, in _decompress
    data = decompressor.decompress(data, max_length)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_lzma.LZMAError: Corrupt input data

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\projects\ok-wuthering-waves\test.py", line 9, in <module>
    os.remove('corrupt.7z')
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'corrupt.7z'

miurahr commented 1 month ago

Here is an AI-generated explanation of the situation.

Why doesn’t Windows unlock the file immediately?

Even though your program calls close() on a file, there are a few reasons why Windows (and sometimes other operating systems) might not release the lock right away:


1. Delayed or Buffered Writes

Example in Python:

with open("example.txt", "w") as f:
    f.write("Hello, World!")
    f.flush()  # Hands Python's buffered data to the OS now; close() would also do this.
# The file will still be properly closed and unlocked.

2. Antivirus or Backup Software Scanning


3. Multiple File Handles (By Accident)

Example of an issue:

f1 = open("example.txt", "w")
f2 = open("example.txt", "r")  # Oops! Now two handles are open.
f1.close()  # Only one is closed; the other is still open.
# The file is still locked because f2 is open.
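
The two-handle case above can be checked directly. This is a minimal sketch, not py7zr code: `remove_while_open` is a hypothetical helper, and the temporary file stands in for the archive. On Windows the delete is expected to fail with PermissionError while the second handle is open; on most POSIX systems it is expected to succeed.

```python
import os
import tempfile


def remove_while_open(path):
    """Try to delete `path` while a second handle is still open.

    Returns True if the delete succeeded, False if the OS refused it.
    Hypothetical helper for illustration only.
    """
    f1 = open(path, "w")
    f2 = open(path, "r")  # second handle that is easy to forget
    f1.close()            # only one of the two handles is closed
    try:
        os.remove(path)
        return True       # typical on POSIX systems
    except PermissionError:
        return False      # typical on Windows (WinError 32)
    finally:
        f2.close()        # release the remaining handle


fd, path = tempfile.mkstemp()
os.close(fd)

deleted_while_open = remove_while_open(path)
if not deleted_while_open:
    os.remove(path)  # works once every handle has been closed
file_gone = not os.path.exists(path)
```

Either way, the file can be removed once every handle in the process has been closed, which is why a leaked handle inside the library would explain the error in this issue.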

4. Pending OS Operations or File System Latency


Summary and Best Practices

By keeping these points in mind, you can avoid file-locking issues and ensure your programs behave as expected on Windows!
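
For the cases where another process (antivirus, backup software) holds the file briefly, a common workaround is to retry the delete a few times. This is a sketch under that assumption; `remove_with_retry`, its parameters, and the delays are hypothetical. It will not help when the handle is leaked inside the same process, because that handle never goes away on its own.

```python
import os
import tempfile
import time


def remove_with_retry(path, attempts=5, delay=0.2):
    """Delete `path`, retrying when the OS reports it is still in use.

    Returns True if the file is gone afterwards, False if every
    attempt failed with PermissionError. Hypothetical helper.
    """
    for attempt in range(attempts):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            return True  # already deleted, nothing left to do
        except PermissionError:
            if attempt == attempts - 1:
                return False
            time.sleep(delay * (attempt + 1))  # wait a little longer each time
    return False


# Usage with a throwaway file
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
first = remove_with_retry(tmp_path)
second = remove_with_retry(tmp_path)  # file is already gone
```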

miurahr commented 1 month ago

This may happen because SevenZipFile raises the exception when opening a given corrupted file.

The SevenZipFile class defines:

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

    def _fpclose(self) -> None:
        assert self._fileRefCnt > 0
        self._fileRefCnt -= 1
        if not self._fileRefCnt and not self._filePassed:
            self.fp.close()

    def close(self):
        """Flush all the data into archive and close it.
        When close py7zr start reading target and writing actual archive file.
        """
        if "w" in self.mode:
            self._write_flush()
        if "a" in self.mode:
            self._write_flush()
        if "r" in self.mode:
            if self.reporterd is not None:
                self.q.put_nowait(None)
                self.reporterd.join(1)
                if self.reporterd.is_alive():
                    raise InternalError("Progress report thread terminate error.")
                self.reporterd = None
        self._fpclose()
        self._var_release()
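
In the close() shown above, self._fpclose() is reached only if every earlier step returns normally; if, for example, the InternalError is raised, the file is not closed. Whether that is what happens in this issue is not confirmed. The sketch below shows the general try/finally pattern that guarantees the handle is closed. `ArchiveLike` and `_finish` are hypothetical names, not py7zr code.

```python
import io


class ArchiveLike:
    """Hypothetical stand-in for an archive object; not py7zr code."""

    def __init__(self, fp, fail_on_close=False):
        self.fp = fp
        self.fail_on_close = fail_on_close

    def _finish(self):
        # Stands in for any cleanup step that may raise before the file is closed
        if self.fail_on_close:
            raise RuntimeError("cleanup step failed")

    def close(self):
        try:
            self._finish()
        finally:
            self.fp.close()  # runs even if _finish() raised

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()


fp = io.BytesIO(b"data")
raised = False
try:
    with ArchiveLike(fp, fail_on_close=True):
        pass
except RuntimeError:
    raised = True
```

The cleanup error still propagates to the caller, but the underlying file object is closed by the time it does.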

miurahr commented 1 month ago

The SevenZipFile class supports multithreaded extraction, so if another worker is still running, the file is not closed immediately. That may be the reason.

miurahr commented 1 month ago
        assert self._fileRefCnt > 0
        self._fileRefCnt -= 1
        if not self._fileRefCnt and

This part may be what leaves the file unclosed.
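
To illustrate how a reference count can leave a file open, here is a minimal model. It is an assumption-laden sketch, not the library's implementation: `RefCountedFile`, `acquire`, and `release` are hypothetical names, and the `_filePassed` condition from the real code is left out.

```python
import io


class RefCountedFile:
    """Hypothetical model of a reference-counted file handle."""

    def __init__(self, fp):
        self.fp = fp
        self._ref_count = 1  # the owner holds the first reference

    def acquire(self):
        self._ref_count += 1

    def release(self):
        assert self._ref_count > 0
        self._ref_count -= 1
        if not self._ref_count:
            self.fp.close()  # closed only when the last reference goes away


# Balanced: every acquire is matched by a release, so the file is closed.
balanced = RefCountedFile(io.BytesIO())
balanced.acquire()
balanced.release()
balanced.release()

# Unbalanced: a reference is taken, then a step raises and its release
# never runs. The owner's single release leaves the count at 1.
leaked = RefCountedFile(io.BytesIO())
leaked.acquire()
leaked.release()
```

In the unbalanced case the underlying file object stays open, which on Windows would be enough to make a later os.remove() fail.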