python / cpython

The Python programming language
https://www.python.org
Other
63.12k stars 30.22k forks source link

tempfile.TemporaryFile: name of file descriptor cannot be reused in consecutive initialization #88179

Open d4bcb7b1-d967-45e4-b312-45affcde5f15 opened 3 years ago

d4bcb7b1-d967-45e4-b312-45affcde5f15 commented 3 years ago
BPO 44013
Nosy @tjol, @zhongxiang117
Files
  • xtempfile.py: codes reproduce
  • new-xtempfile.py: bug reproduce: new testing and explanations
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.8', 'type-bug', 'library', '3.9'] title = 'tempfile.TemporaryFile: name of file descriptor cannot be reused in consecutive initialization' updated_at = user = 'https://github.com/zhongxiang117' ``` bugs.python.org fields: ```python activity = actor = 'zhongxiang117' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'zhongxiang117' dependencies = [] files = ['50004', '50093'] hgrepos = [] issue_num = 44013 keywords = [] message_count = 5.0 messages = ['392746', '395089', '395154', '395156', '395291'] nosy_count = 2.0 nosy_names = ['tjollans', 'zhongxiang117'] pr_nums = [] priority = 'normal' resolution = 'wont fix' stage = None status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue44013' versions = ['Python 3.8', 'Python 3.9'] ```

    d4bcb7b1-d967-45e4-b312-45affcde5f15 commented 3 years ago

    The variable of instance cannot be reused in two consecutive codes chunk combinations. Please check the difference in attached file, xtempfile.py, function: test_3 & test_4.

    However, surprisingly, the problem can be fixed in test_5 and test_6.

    Which may be helpful for debug: why does the value of "id(fd.name)" keep constant all the time inside those combinations?

    444a5333-691d-47d1-9316-49e7a80f62df commented 3 years ago

    I think I know what's going on here.

    The way you're using tempfile.TemporaryFile() is, shall we say, unusual. TemporaryFile() already gives you an open file, there's really no reason why you'd call open() again. In practice you'd usually want to write something like

    with tempfile.TemporaryFile() as f:
        # do stuff

    Back to the issue. Your code boils down to:

    fd = tempfile.TemporaryFile()
    with open(fd.fileno()) as f:
        pass
    fd = tempfile.TemporaryFile()
    fd.seek(0)

    You had fd.name rather than fd.fileno(). On *nix, these are the same for a TemporaryFile, which has no name. (On Windows, open(fd.name) fails)

    What happens? open(fd.fileno()) creates a new file object for the same low-level file descriptor. At the end of the with block, this file is closed. This is fine, but the object returned by TemporaryFile doesn't know this happened.

    You then call tempfile.TemporaryFile() again, which opens a new file. The OS uses the next available file descriptor, which happens to be the one you just closed. THEN, the old TemporaryFile object gets deleted. It doesn't know you already closed its file, so it calls close(). On the FD that has just been reused.

    This has nothing to do with reusing the same name, it's just about what order things happen in. This achieves the same effect:

    tmp1 = tempfile.TemporaryFile()
    os.close(tmp1.fileno())
    tmp2 = tempfile.TemporaryFile()
    del tmp1
    tmp2.seek(0)

    Deleting the first file object before creating the second one (like in your test_5) solves this problem. I'm not sure why your test_6 works.

    As for why id(fd.name) was the same for you? It's because fd.name is just fd.fileno() in this case, which is an integer.

    TL;DR:

    d4bcb7b1-d967-45e4-b312-45affcde5f15 commented 3 years ago

    Dear Mr. Jollans, thanks for your comments and information.

    I know my usage of tempfile.TemporaryFile() is unusual, technically I “open”ed it twice. The reason I coded them in this way, as a simple illustration, is I want to test/debug some codes of my early work which directly takes “real-file-names” as the input.

    Good to know “fd.fileno()” is more advance than “fd.name”, thank you for your information.

    For your writing, if my understanding is correct, you mean that os.close() and tempfile.TemporaryFile() are using the same low-level file descriptor sequences, which means they only keep track of their own file descriptor, and assign new file descriptor to the “just” next one.

    For example, assume file descriptor integer numbers: 1,2,3,4… as the low-level sequences.

    Firstly, call tempfile.TemporaryFile() will return 1 as the file descriptor.

    Secondly, execute “with open..” will use 2 as the new one and finally it will be closed.

    Third, call tempfile.TemporaryFile() again, problem in here is, it will sequentially take 2 (rather than 3) as the new file descriptor, because it only keeps its own orders.

    Since the file descriptor 2 is already used and closed inside second step in with operation, so the OSError will happen, causing the file descriptor 2 cannot be reused.

    Is my understanding correct?

    However, I do not agree with that. To boil down my example codes,

    fd = tempfile.TemporaryFile()
    with open(fd.fileno()) as f:
        pass             # I know fd is closed after with is done
    
    fd = tempfile.TemporaryFile() # since tempfile.TemporaryFile() is 
                                  # called again, it should return a 
                                  # valid file descriptor, and the above 
                                  # operations in “with” chunk should
                                  # not influence its result
    
    fd.seek(0)   # this should work, however, in reality, it does not

    Thanks for your example codes, but they are different with my examples. I even found a new problem with it, please check my additional testing file: “new-xtempfile.py” (which is attached).

    I think those testings can be somehow explained your writing, however, it should be a bug.

    Finally, I would like to make some revisions,

    TL;DR:

    444a5333-691d-47d1-9316-49e7a80f62df commented 3 years ago

    Hello Xiang Zhong,

    You're almost correct, but not quite.

    File descriptors/file numbers are just integers assigned by the operating system, and Python has little to no control over them. The OS keeps a numbered list of open files for each process, and Python can make system calls like "read 10 bytes from file 5" or "write these 20 bytes to file 1".

    Also good to know: 0 is stdin, 1 is stdout, 2 is stderr, 3+ are other files.

    Now, what happens:

    # Python asks the OS to open a new file. The OS puts the new file on
    # the list and gives it the lowest number that is not in use: 3.
    fd = tempfile.TemporaryFile()
    # fd is an file object for file no 3
    with open(fd.fileno()) as f:
        # f is another object for file no 3
        pass             # I know fd is closed after with is done
    # the with statement closes file no 3
    # fd does not know that file no 3 is closed. (and has no way of knowing!)
    
    # Python asks the OS to open a new file. The OS puts the new file on
    # the list and gives it the lowest number that is not in use: 3.
    _tmp = tempfile.TemporaryFile()
    # A new temporary file object is created for file no 3
    fd = _tmp
    # The old fd is finalized. It still thinks it has control of file
    # no 3, so it closes that. The new temporary file object is given
    # the name fd.

    Aside: f.fileno() is not more "advanced" than f.name. It's just that, in the case of tempfile.TemporaryFile, the file is immediately deleted and ends up having no name, so f.name falls back on returning the file number. I just preferred it here to be explicit about what's going on. Normally f.name would be a string (and it is for a TemporaryFile on Windows!); here, it's an int.

    d4bcb7b1-d967-45e4-b312-45affcde5f15 commented 3 years ago

    Dear Mr. Jollans,

    Thanks for your additions, I think I know well about the SHELL programming language, now, every thing about the low-level file descriptors is linked together. I guess now I completely know your writing, thanks again for your detail explanations!

    The potential “bug” of your consideration is the post operation caused by OS.close() in the delay/conflict of tempfile.Temporary(), as a result, fd does not know whether its linked file descriptor is valid or not due to “with” method.

    However, if you have a look on “test_4” in attached “xtempfile.py”, you will see that in the first fd generated by tempfile.Temporary(), it is assumed to be handled by the first “with”, no more other operations are executed. Even after first “with” it becomes invalid, there are no further executions performed.

    On the second fd returned by tempfile.TemporaryFile(), it should be a valid file descriptor no matter how the first one is dealt with. From the coding perspective, these two variables are just “happen to” be the same chars.

    Besides, from my testing, changing any one of them to the different variable name, then the problem is gone. Furthermore, if you have a look on “test_3”, “test_5” and “test_6” (xtempfile.py), especially on “test_5”, they all prove that it should be a bug.

    To go deeper, I attached another test file “new-xtempfile.py”, on “test11”, you will see that it should have some problems on "\_del__" method if its linking file descriptor is closed by os.close().

    They two may or may not be the same problem.

    Thanks again for your illustration of “fd.fileno()” and “fd.name”.