python / cpython


zipfile zipfile.BadZipFile: Bad CRC-32 for file '11_02_2019.pdf' #80754

Open 53404e7b-dd83-450a-a824-573abaf794af opened 5 years ago

53404e7b-dd83-450a-a824-573abaf794af commented 5 years ago
BPO 36573
Nosy @Yhg1s, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug', 'library']
title = "zipfile zipfile.BadZipFile: Bad CRC-32 for file '11_02_2019.pdf'"
updated_at =
user = 'https://bugs.python.org/JozefCernak'
```

bugs.python.org fields:
```python
activity =
actor = 'serhiy.storchaka'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Jozef Cernak'
dependencies = []
files = []
hgrepos = []
issue_num = 36573
keywords = []
message_count = 7.0
messages = ['339722', '339727', '339728', '339729', '339730', '339734', '339736']
nosy_count = 4.0
nosy_names = ['twouters', 'alanmcintyre', 'serhiy.storchaka', 'Jozef Cernak']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue36573'
versions = ['Python 3.5']
```

53404e7b-dd83-450a-a824-573abaf794af commented 5 years ago

Hi, the short program below works well for a password of 4 characters. When I change the password length (parameter MAXD), I get this error:

Traceback (most recent call last):
  File "p33.py", line 54, in <module>
    zf.extractall( pwd=password.encode('cp850','replace'))
  File "/usr/lib/python3.5/zipfile.py", line 1347, in extractall
    self.extract(zipinfo, path, pwd)
  File "/usr/lib/python3.5/zipfile.py", line 1335, in extract
    return self._extract_member(member, path, pwd)
  File "/usr/lib/python3.5/zipfile.py", line 1399, in _extract_member
    shutil.copyfileobj(source, target)
  File "/usr/lib/python3.5/shutil.py", line 73, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib/python3.5/zipfile.py", line 844, in read
    data = self._read1(n)
  File "/usr/lib/python3.5/zipfile.py", line 934, in _read1
    self._update_crc(data)
  File "/usr/lib/python3.5/zipfile.py", line 862, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file '11_02_2019.pdf'

program:

import string, zipfile, zlib

from zipfile import ZipFile

zf= ZipFile('11_02_2019.pdf.zip')

MAXD=6

upper_case=string.ascii_uppercase
uc=list(upper_case)

n=len(uc)
print (n)

pos=[]
for k in range(0,MAXD):
    pos.append(0)

print (pos) 

for let in range(0,n):
    print (let, uc[let]) 

let=0
koniec=0;
k3=0
p=0

while koniec != MAXD :

    k=0

    password=''
    for k2 in range(0,MAXD):

        password=password+uc[pos[k2]]

    print   (password)

    try:

        with zipfile.ZipFile('11_02_2019.pdf.zip') as zf:
            zf.extractall( pwd=password.encode('cp850','replace'))
            print ("Password found:" + password)
            exit(0)

    except RuntimeError:
        pass

    except zlib.error:
        pass

    #print "ppppppppppppppppppppppppp",p,  paswd
    pos[0]=pos[0]+1

    for k2  in range(0,MAXD-1):
        if pos[k2]>=n:
            pos[k2]=0
            pos[k2+1]=pos[k2+1]+1

    koniec=0

    for k2 in range(0,MAXD):
        if pos[k2] >= n-1:
            koniec=koniec+1
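
For reference, the odometer-style counter in the program can be sketched more compactly with `itertools.product`. This is only an illustration, not the reporter's code: the helper name `candidates` is hypothetical, and it assumes the intended order is the one visible in the output later in the thread (first character changes fastest, e.g. ZACD is followed by ABCD). The original loop also appears to stop before trying the all-Z candidate; this sketch yields every combination.

```python
import itertools
import string


def candidates(length, alphabet=string.ascii_uppercase):
    """Yield every string of `length` letters, first position varying fastest."""
    # itertools.product varies the LAST element fastest, so each tuple is
    # reversed to reproduce the order AAAA, BAAA, CAAA, ..., ZAAA, ABAA, ...
    for combo in itertools.product(alphabet, repeat=length):
        yield "".join(reversed(combo))


if __name__ == "__main__":
    # Print the first few candidates for a 4-letter password.
    for password in itertools.islice(candidates(4), 5):
        print(password)
```

The generator keeps no state between runs, so the enumeration itself cannot be the source of history-dependent behaviour.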

I observed similar behaviour in an older version of Python (2.7) and the corresponding library.

The zip archive is protected by the simple password 'ABCD'; the file is not big, less than 1 MB.

Best regards Jozef

serhiy-storchaka commented 5 years ago

Do you get an error when you try to extract the file using the valid password?

53404e7b-dd83-450a-a824-573abaf794af commented 5 years ago

Dear Serhiy, in the case of correct password, the program works well:

OACD
PACD
QACD
RACD
SACD
TACD
UACD
VACD
WACD
XACD
YACD
ZACD
ABCD
Password found:ABCD

for five characters:
RRJBA
Traceback (most recent call last):
  File "p33.py", line 54, in <module>
    zf.extractall( pwd=password.encode('cp850','replace'))
  File "/usr/lib/python3.5/zipfile.py", line 1347, in extractall
    self.extract(zipinfo, path, pwd)
  File "/usr/lib/python3.5/zipfile.py", line 1335, in extract
    return self._extract_member(member, path, pwd)
  File "/usr/lib/python3.5/zipfile.py", line 1399, in _extract_member
    shutil.copyfileobj(source, target)
  File "/usr/lib/python3.5/shutil.py", line 73, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib/python3.5/zipfile.py", line 844, in read
    data = self._read1(n)
  File "/usr/lib/python3.5/zipfile.py", line 934, in _read1
    self._update_crc(data)
  File "/usr/lib/python3.5/zipfile.py", line 862, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file '11_02_2019.pdf'

specially for RRJBA
AAAAA
Traceback (most recent call last):
  File "p33.py", line 54, in <module>
    zf.extractall( pwd=password.encode('cp850','replace'))
  File "/usr/lib/python3.5/zipfile.py", line 1347, in extractall
    self.extract(zipinfo, path, pwd)
  File "/usr/lib/python3.5/zipfile.py", line 1335, in extract
    return self._extract_member(member, path, pwd)
  File "/usr/lib/python3.5/zipfile.py", line 1399, in _extract_member
    shutil.copyfileobj(source, target)
  File "/usr/lib/python3.5/shutil.py", line 73, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib/python3.5/zipfile.py", line 844, in read
    data = self._read1(n)
  File "/usr/lib/python3.5/zipfile.py", line 934, in _read1
    self._update_crc(data)
  File "/usr/lib/python3.5/zipfile.py", line 862, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file '11_02_2019.pdf'
for six characters:
KMQAAA
LMQAAA
MMQAAA
NMQAAA
OMQAAA
PMQAAA
QMQAAA
RMQAAA
SMQAAA
TMQAAA
UMQAAA
VMQAAA
WMQAAA
XMQAAA
YMQAAA
ZMQAAA
ANQAAA
Traceback (most recent call last):
  File "p33.py", line 54, in <module>
    zf.extractall( pwd=password.encode('cp850','replace'))
  File "/usr/lib/python3.5/zipfile.py", line 1347, in extractall
    self.extract(zipinfo, path, pwd)
  File "/usr/lib/python3.5/zipfile.py", line 1335, in extract
    return self._extract_member(member, path, pwd)
  File "/usr/lib/python3.5/zipfile.py", line 1399, in _extract_member
    shutil.copyfileobj(source, target)
  File "/usr/lib/python3.5/shutil.py", line 73, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib/python3.5/zipfile.py", line 844, in read
    data = self._read1(n)
  File "/usr/lib/python3.5/zipfile.py", line 934, in _read1
    self._update_crc(data)
  File "/usr/lib/python3.5/zipfile.py", line 862, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file '11_02_2019.pdf'

It seems that after a certain number of attempts, the call zf.extractall( pwd=password.encode('cp850','replace')) behaves differently than it did in the previous attempts.

Best regards

Jozef


serhiy-storchaka commented 5 years ago

If you try to extract the file using an invalid password, this is expected behavior.

53404e7b-dd83-450a-a824-573abaf794af commented 5 years ago

OK, however the behaviour is detected only after several attempts, i.e. it is not regular but depends on the previous history: how, or how many times, the function was called. I think such behaviour indicates that the function stores previous data, i.e. history. Best regards, Jozef


53404e7b-dd83-450a-a824-573abaf794af commented 5 years ago

Hi, I changed the password of the zipped file to the new string "RRJBB", which is the combination right after RRJBA, to see what would happen. At the input combination KWFEA I got this message:

KWFEA
Traceback (most recent call last):
  File "p33.py", line 54, in <module>
    zf.extractall( pwd=password.encode('cp850','replace'))
  File "/usr/lib/python3.5/zipfile.py", line 1347, in extractall
    self.extract(zipinfo, path, pwd)
  File "/usr/lib/python3.5/zipfile.py", line 1335, in extract
    return self._extract_member(member, path, pwd)
  File "/usr/lib/python3.5/zipfile.py", line 1399, in _extract_member
    shutil.copyfileobj(source, target)
  File "/usr/lib/python3.5/shutil.py", line 73, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib/python3.5/zipfile.py", line 844, in read
    data = self._read1(n)
  File "/usr/lib/python3.5/zipfile.py", line 934, in _read1
    self._update_crc(data)
  File "/usr/lib/python3.5/zipfile.py", line 862, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file '11_02_2019B.pdf'

Jozef


serhiy-storchaka commented 5 years ago

This is how the weak encryption in ZIP files works. In 255 cases out of 256, a wrong password can be detected early (this just makes the encryption weaker). But in 1 case out of 256 this check passes, and you will get either a mismatched-CRC error or, if compression is used, a compressor-specific error. There is even a very small chance (1 in 2**32 or so) that you will silently get incorrectly decrypted data.
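
To make the 1-in-256 figure concrete, here is a rough, self-contained sketch of the traditional ZIP stream cipher and its one-byte password check. It is an illustration, not the `zipfile` implementation: the class `ZipCryptoKeys`, the function `count_check_byte_passes`, the check-byte value and the header filler are all assumptions chosen for the demo (real archive writers fill the first 11 header bytes with random data). It encrypts a 12-byte header with one password, then counts how many wrong three-letter passwords still decrypt the last header byte to the expected check value.

```python
import itertools
import string


def _make_crc_table():
    table = []
    for value in range(256):
        for _ in range(8):
            value = (value >> 1) ^ 0xEDB88320 if value & 1 else value >> 1
        table.append(value)
    return table


_CRC_TABLE = _make_crc_table()


class ZipCryptoKeys:
    """Key state of the traditional (weak) ZIP stream cipher. Illustrative only."""

    def __init__(self, password: bytes):
        self.key0, self.key1, self.key2 = 0x12345678, 0x23456789, 0x34567890
        for byte in password:
            self._update(byte)

    def _update(self, plain_byte):
        # The keys are always advanced with the PLAINTEXT byte.
        self.key0 = (self.key0 >> 8) ^ _CRC_TABLE[(self.key0 ^ plain_byte) & 0xFF]
        self.key1 = (self.key1 + (self.key0 & 0xFF)) & 0xFFFFFFFF
        self.key1 = (self.key1 * 134775813 + 1) & 0xFFFFFFFF
        self.key2 = (self.key2 >> 8) ^ _CRC_TABLE[(self.key2 ^ (self.key1 >> 24)) & 0xFF]

    def _stream_byte(self):
        k = self.key2 | 2
        return ((k * (k ^ 1)) >> 8) & 0xFF

    def encrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for plain in data:
            out.append(plain ^ self._stream_byte())
            self._update(plain)
        return bytes(out)

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for cipher in data:
            plain = cipher ^ self._stream_byte()
            self._update(plain)
            out.append(plain)
        return bytes(out)


def count_check_byte_passes(right_password: bytes, length: int = 3):
    """Count wrong passwords of `length` letters that pass the 1-byte check."""
    check_byte = 0xA7  # assumed value; stands in for the high byte of the CRC-32
    header_plain = bytes(range(11)) + bytes([check_byte])  # fixed filler for the demo
    header_enc = ZipCryptoKeys(right_password).encrypt(header_plain)
    passed = total = 0
    for combo in itertools.product(string.ascii_uppercase, repeat=length):
        guess = "".join(combo).encode("ascii")
        total += 1
        if ZipCryptoKeys(guess).decrypt(header_enc)[11] == check_byte:
            passed += 1
    return passed, total


if __name__ == "__main__":
    passed, total = count_check_byte_passes(b"ABCD")
    print(f"{passed} of {total} wrong passwords passed (about 1/256 is {total / 256:.0f})")
```

Each result depends only on the encrypted header and the guessed password, so the same guess always behaves the same way. Which guesses get past the early check only looks irregular; no history is stored between calls.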

It is better to not use the weak encryption in ZIP files. If you need to encrypt data safely, use third-party encryption libraries.
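
For code that has to test a password against such an archive anyway, a minimal sketch of treating all three failure modes the same way might look as follows. The helper name `password_opens_archive` is hypothetical. It assumes members use stored or deflate compression; other compressors may raise their own exception types. It reads each member to the end, because the CRC-32 check only happens once all of a member's data has been read.

```python
import io
import zipfile
import zlib


def password_opens_archive(archive, password: bytes) -> bool:
    """Return True only if every member decrypts, decompresses and passes its CRC.

    `archive` may be a path or a seekable binary file object.
    """
    try:
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                with zf.open(info, pwd=password) as member:
                    # Drain the member in chunks; the CRC is verified at the end.
                    while member.read(1 << 16):
                        pass
        return True
    except RuntimeError:
        # Early check failed: "Bad password for file ..."
        return False
    except zipfile.BadZipFile:
        # Early check passed by chance, but the CRC-32 did not match.
        return False
    except zlib.error:
        # Early check passed by chance, but the data was not valid deflate.
        return False


if __name__ == "__main__":
    # Demo on an unencrypted in-memory archive (the password is then ignored).
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("note.txt", "hello world")
    print(password_opens_archive(buffer, b"ABCD"))
```

Unlike `extractall`, this writes nothing to disk for wrong guesses. A `True` result still carries the small residual chance of a false match described above.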