python / cpython

The Python programming language
https://www.python.org

tarfile doesn't handle sysfs well #54969

Open 11636bb1-03a9-4a35-84a7-07106d64f0f9 opened 13 years ago

11636bb1-03a9-4a35-84a7-07106d64f0f9 commented 13 years ago
BPO 10760
Nosy @gustaebel

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = 'https://github.com/gustaebel'
closed_at = None
created_at =
labels = ['type-bug', 'library']
title = "tarfile doesn't handle sysfs well"
updated_at =
user = 'https://bugs.python.org/YoniTsafir'
```

bugs.python.org fields:

```python
activity =
actor = 'christian.heimes'
assignee = 'lars.gustaebel'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Yoni.Tsafir'
dependencies = []
files = []
hgrepos = []
issue_num = 10760
keywords = []
message_count = 2.0
messages = ['124514', '171090']
nosy_count = 3.0
nosy_names = ['lars.gustaebel', 'Yoni.Tsafir', 'guyrozendorn']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'test needed'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue10760'
versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']
```

11636bb1-03a9-4a35-84a7-07106d64f0f9 commented 13 years ago

When I try to add a special file from sysfs, e.g. /sys/class/scsi_host/host0/cmd_per_lun (which reports a size of 4096 bytes, although reading it returns only a few bytes), I get the following exception:

```
Traceback (most recent call last):
  File "/opt/xpyv/lib/python26.zip/tarfile.py", line 1975, in add
    self.addfile(tarinfo, f)
  File "/opt/xpyv/lib/python26.zip/tarfile.py", line 2004, in addfile
    copyfileobj(fileobj, self.fileobj, tarinfo.size)
  File "/opt/xpyv/lib/python26.zip/tarfile.py", line 287, in copyfileobj
    raise IOError("end of file reached")
IOError: end of file reached
```
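The traceback shows where this fails: `addfile()` passes `tarinfo.size` (taken from the stat result) to `copyfileobj()`, which raises as soon as the file yields fewer bytes. Until tarfile handles this itself, one possible user-side workaround is to read the file first and use the number of bytes actually read as the member size. This is a sketch under assumptions: the helper name `add_with_actual_size` is hypothetical, the file is assumed small enough to hold in memory (typical for sysfs attributes), and it relies only on the public `TarFile.gettarinfo()` and `TarFile.addfile()` methods.

```python
import io
import tarfile


def add_with_actual_size(tar, path, arcname=None):
    """Hypothetical helper (not part of tarfile): add `path` to the open
    TarFile `tar`, sizing the member by the bytes actually read rather than
    by the st_size that stat() reported."""
    tarinfo = tar.gettarinfo(path, arcname)
    if not tarinfo.isreg():
        # Directories, symlinks, devices, etc. carry no file data,
        # so the normal add() path is fine for them.
        tar.add(path, arcname, recursive=False)
        return
    with open(path, "rb") as f:
        data = f.read()  # read until EOF, whatever st_size claimed
    tarinfo.size = len(data)  # may be smaller or larger than st_size
    tar.addfile(tarinfo, io.BytesIO(data))
```

For example, inside `with tarfile.open("/tmp/blat.tgz", "w:gz") as tar:` one would call `add_with_actual_size(tar, "/sys/class/scsi_host/host0/cmd_per_lun")`. Unlike GNU tar's approach, this archives exactly the content that was read, so no zero padding is involved.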

Notice what happens if I try to add the file with regular tar:

```
root@buzaglo # tar cvzf /tmp/blat.tgz /sys/class/scsi_host/host0/cmd_per_lun
tar: Removing leading `/' from member names
/sys/class/scsi_host/host0/cmd_per_lun
tar: /sys/class/scsi_host/host0/cmd_per_lun: File shrank by 4094 bytes; padding with zeros
tar: Error exit delayed from previous errors
```

So tar handles the problem by padding the rest of the member, up to the reported size, with zeros.

I think tarfile should behave the same way instead of raising an IOError.
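For comparison, the GNU tar behaviour described above can be sketched in Python as a variant of the copy loop: copy exactly `length` bytes and, if the source reaches EOF early, fill the remainder with NUL bytes and report how much was padded. `copy_padded` is a hypothetical name; this only illustrates the proposed behaviour and is not tarfile's actual code.

```python
def copy_padded(src, dst, length, bufsize=16 * 1024):
    """Hypothetical variant of tarfile's internal copyfileobj(): write exactly
    `length` bytes to dst, padding with NUL bytes if src hits EOF early.
    Returns the number of padding bytes written (0 if the file did not shrink)."""
    remaining = length
    while remaining > 0:
        buf = src.read(min(bufsize, remaining))
        if not buf:  # empty read: real end of file, the file "shrank"
            break
        dst.write(buf)
        remaining -= len(buf)
    if remaining:
        dst.write(b"\0" * remaining)  # keep the member at its declared size
    return remaining
```

With a source that yields 2 bytes against a reported size of 4096, this writes 4094 bytes of padding, the same figure GNU tar reports above. A real fix would presumably also warn, as GNU tar does, because the archived member then no longer matches the file's true content.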

af24a157-ac4f-4826-8ad5-0cb8bd4b1f95 commented 12 years ago

Here's a test case that recreates this issue. I chose to use mocks instead of sample files from sysfs so it would be simpler to run; it can easily be changed to use a file from sysfs.

The following code runs on Python 2.7 and requires the mock library:

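```python
from unittest import TestCase, main
from tempfile import mkstemp
from mock import patch
from os import close, remove, write, stat
from posix import stat_result
from tarfile import TarFile


def fake_st_size_side_effect(*args, **kwargs):
    # Replacement for os.lstat: return the real stat result, but report an
    # st_size 10 bytes larger than the file really is (like a sysfs file).
    src, = args
    stats = stat(src)
    return stat_result((stats.st_mode, stats.st_ino, stats.st_dev, stats.st_nlink,
                        stats.st_uid, stats.st_gid, stats.st_size + 10,
                        stats.st_atime, stats.st_mtime, stats.st_ctime))


class Issue10760TestCase(TestCase):
    def setUp(self):
        # Source file with 4 bytes of real content.
        fd, self.src = mkstemp()
        write(fd, b'\x00' * 4)
        close(fd)
        # Destination archive.
        fd, self.dst = mkstemp()
        close(fd)

    def tearDown(self):
        remove(self.src)
        remove(self.dst)

    def test(self):
        # tarfile stats the file via os.lstat and sees 14 bytes, but can only
        # read 4; while the bug exists, add() raises IOError as shown above.
        with patch("os.lstat") as lstat:
            lstat.side_effect = fake_st_size_side_effect
            tar_file = TarFile.open(self.dst, 'w:gz')
            try:
                tar_file.add(self.src)
            finally:
                tar_file.close()


if __name__ == '__main__':
    main()
```
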
iam-TJ commented 1 year ago

This issue, which particularly affects reading from sysfs, is still present. In every failing case, the read returns only the first 4096 bytes of files larger than that.

The cause is that tar's read() calls assume a 'short read' (a return value smaller than the requested buffer size) means the file shrank. In fact, sysfs internally uses a buffer of PAGE_SIZE, usually 4096 bytes, so a single read() may return less than was requested, and the calling process should not assume it has reached the end of the file.
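The same principle can be illustrated in Python (this is not the GNU tar patch linked below, which is in C). A read() call may legitimately return fewer bytes than requested, and only an empty result signals end of file, so a robust reader keeps calling read() until it has what it asked for or gets nothing back. `read_up_to` and `PageSizedReader` are hypothetical names; the latter merely simulates a sysfs attribute that hands out at most 4096 bytes per call.

```python
import io


class PageSizedReader:
    """Hypothetical stand-in for a sysfs attribute: each read() returns
    at most `page` bytes, even when more data remains."""

    def __init__(self, data, page=4096):
        self._buf = io.BytesIO(data)
        self._page = page

    def read(self, n=-1):
        if n is None or n < 0:
            n = self._page  # even "read everything" yields one page at most
        return self._buf.read(min(n, self._page))


def read_up_to(f, size):
    """Collect up to `size` bytes from f, treating only an empty read as EOF.
    A short but non-empty read just means "call read() again"."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = f.read(remaining)
        if not chunk:  # b"" is the only reliable end-of-file signal
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A single `PageSizedReader(b"x" * 10000).read(10000)` returns only 4096 bytes, while `read_up_to()` on a fresh reader returns all 10000. Python's buffered binary files (`open(path, "rb")`) should already loop like this inside `read(n)` for blocking files, so the pattern mainly matters for `os.read()` or unbuffered `FileIO`.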

I've removed my original report since this isn't the place to go into that much detail; instead, here's a link to the upstream tar bug report, which has an attached patch that fixes the issue:

https://savannah.gnu.org/bugs/index.php?64426

kurtqq commented 8 months ago

sadly this bug is still present :(