python / cpython

The Python programming language
https://www.python.org

TarFile.gettarinfo modifies self.inodes #56108

Open 3daf8f3e-7eb3-4fea-924b-d2acb41967b8 opened 13 years ago

3daf8f3e-7eb3-4fea-924b-d2acb41967b8 commented 13 years ago
BPO 11899
Nosy @gustaebel

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = 'https://github.com/gustaebel'
closed_at = None
created_at =
labels = ['type-bug', 'library']
title = 'TarFile.gettarinfo modifies self.inodes'
updated_at =
user = 'https://bugs.python.org/mgold-qnx'
```

bugs.python.org fields:

```python
activity =
actor = 'mgold'
assignee = 'lars.gustaebel'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'mgold-qnx'
dependencies = []
files = []
hgrepos = []
issue_num = 11899
keywords = []
message_count = 4.0
messages = ['134220', '134224', '134264', '134299']
nosy_count = 3.0
nosy_names = ['lars.gustaebel', 'mgold-qnx', 'mgold']
pr_nums = []
priority = 'low'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue11899'
versions = ['Python 3.3']
```

3daf8f3e-7eb3-4fea-924b-d2acb41967b8 commented 13 years ago

When I call tar.gettarinfo (where tar is a TarFile instance), the inode information is inserted into tar.inodes. If I later call tar.gettarinfo on a linked file, the returned TarInfo will have type LNKTYPE.

I think it's incorrect to store this information in gettarinfo. It should be done in addfile.

A comment in gettarinfo states "Is it a hardlink to an already archived file?". But tar.inodes is modified in gettarinfo, and there's no reason to expect that the file will actually be archived, or will be archived with the same properties. Bad links could result if the returned tarinfo object were modified before calling addfile.
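A minimal sketch of the side effect described above, assuming a filesystem that supports hard links. The function name `demo_gettarinfo_side_effect` and the file names are made up for illustration. Nothing is ever passed to addfile, yet the second gettarinfo call already sees the first file as "archived":

```python
import os
import tarfile
import tempfile


def demo_gettarinfo_side_effect():
    """Call gettarinfo twice on hard-linked files without archiving anything."""
    with tempfile.TemporaryDirectory() as tmp:
        f1 = os.path.join(tmp, "f1")
        f2 = os.path.join(tmp, "f2")
        with open(f1, "wb") as f:
            f.write(b"payload")
        os.link(f1, f2)  # f2 is a hard link to f1 (same inode)

        with tarfile.open(os.path.join(tmp, "test.tar"), mode="w") as tar:
            inodes_before = len(tar.inodes)
            info1 = tar.gettarinfo(f1, arcname="f1")
            # gettarinfo has already recorded the inode of f1 here,
            # although addfile was never called.
            inodes_after = len(tar.inodes)
            info2 = tar.gettarinfo(f2, arcname="f2")
            return (inodes_before, inodes_after,
                    info1.type, info2.type, info2.linkname)


if __name__ == "__main__":
    print(demo_gettarinfo_side_effect())
```

With the behaviour reported in this issue, the second TarInfo comes back with type LNKTYPE and linkname "f1", even though "f1" was never written to the archive.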

I suggest changing the code as follows:

3daf8f3e-7eb3-4fea-924b-d2acb41967b8 commented 13 years ago

Actually, TarFile should also have a separate method to take a TarInfo instance and modify its type to LNKTYPE if applicable. gettarinfo can call that.

This way the user can use a TarInfo object created before any files are added, and can easily get this linking behaviour if desired.
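A rough sketch of what such a split could look like, written as free functions outside of tarfile. Both `link_if_archived` and `add_tracked` are hypothetical names, not part of the tarfile API, and the sketch assumes regular files only. The lookup helper only reads the inode mapping; the mapping is updated only once a member has really been written with addfile:

```python
import os
import stat
import tarfile


def link_if_archived(tarinfo, path, archived):
    """Hypothetical helper: turn tarinfo into a hard link member if the
    inode of path was already archived under another name.

    archived maps (st_ino, st_dev) to the member name that was written.
    It is only read here, never modified.
    """
    st = os.lstat(path)
    target = archived.get((st.st_ino, st.st_dev))
    if stat.S_ISREG(st.st_mode) and target is not None and target != tarinfo.name:
        tarinfo.type = tarfile.LNKTYPE
        tarinfo.linkname = target
        tarinfo.size = 0  # hard link members carry no data
    return tarinfo


def add_tracked(tar, path, arcname, archived):
    """Hypothetical helper: add one regular file and record its inode
    only after it has actually been written to the archive."""
    st = os.lstat(path)
    # Build the TarInfo by hand so that TarFile.gettarinfo is not involved.
    tarinfo = tarfile.TarInfo(arcname)
    tarinfo.size = st.st_size
    tarinfo.mtime = int(st.st_mtime)
    tarinfo.mode = stat.S_IMODE(st.st_mode)

    link_if_archived(tarinfo, path, archived)
    if tarinfo.islnk():
        tar.addfile(tarinfo)
    else:
        with open(path, "rb") as f:
            tar.addfile(tarinfo, f)
        # Record the name that really ended up in the archive.
        archived[(st.st_ino, st.st_dev)] = tarinfo.name
    return tarinfo
```

Because the mapping stores the name that was passed to addfile, a later hard link points at the archived name even if it differs from the name on disk.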

(Note: In my initial message, I had LNKNAME where I meant LNKTYPE.)

460c80ff-bdb7-4416-a811-ee58995bd9ed commented 13 years ago

Good point. Do you happen to have a working implementation already?

a271c05b-2a9e-4994-8406-4e183888d8ab commented 13 years ago

No, I don't have a working implementation. (I basically reimplemented TarFile.inodes to work around this; I was using TarFile.dereference, so I already had to do the hard-linking manually.)

cuihaoleo commented 2 months ago

I have a use case where I want to implement a tarinfo filter that may change tarinfo.name, and I ran into this issue.

Example (f1 and f2 link to the same inode):

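```python
import tarfile

with tarfile.open("test.tar", mode='w') as tar:
    # gettarinfo records the inode of f1 under the name 'f1'
    info1 = tar.gettarinfo('f1')
    info1.name = 'foo'

    with open('f1', 'rb') as f:
        tar.addfile(info1, f)

    # f2 shares the inode, so it is stored as a hard link to 'f1'
    tar.add('f2')
```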

This ends up with a tar archive like this:

```
$ tar tvf test.tar
-rw-r--r-- cuihao/cuihao     7 2024-09-21 07:47 foo
hrw-r--r-- cuihao/cuihao     0 2024-09-21 07:47 f2 link to f1
```

I'll probably make my filter maintain a mapping between old and new names, so that it can fix tarinfo.linkname itself.
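A minimal sketch of such a filter, assuming it is passed to TarFile.add via the filter argument. `make_renaming_filter` and its `rename` callback are hypothetical names chosen for illustration:

```python
import tarfile


def make_renaming_filter(rename):
    """Build a filter for TarFile.add(..., filter=...) that renames members
    and keeps hard link targets in sync.

    rename is a caller-supplied function mapping an old member name to a
    new one (hypothetical, for illustration only).
    """
    renamed = {}  # old member name -> new member name

    def _filter(tarinfo):
        old_name = tarinfo.name
        new_name = rename(old_name)
        renamed[old_name] = new_name
        if tarinfo.islnk():
            # gettarinfo stored the original name of the link target;
            # translate it to the name that was actually archived.
            tarinfo.linkname = renamed.get(tarinfo.linkname, tarinfo.linkname)
        tarinfo.name = new_name
        return tarinfo

    return _filter
```

Used as `tar.add('f1', filter=flt)` followed by `tar.add('f2', filter=flt)` with the same filter object, the link member for f2 should then point at the renamed entry rather than at f1. The sketch only handles hard links whose target went through the same filter instance.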