python / cpython

The Python programming language
https://www.python.org
Other
61.99k stars 29.81k forks source link

argparse.FileType opens a file and never closes it #58032

Open eccacd9a-6179-451a-b71a-fc2e4713fe32 opened 12 years ago

eccacd9a-6179-451a-b71a-fc2e4713fe32 commented 12 years ago
BPO 13824
Nosy @merwok, @mitar, @matrixise, @MojoVampire, @remilapeyre, @Septatrix
Files
  • patch_3.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-bug', 'library'] title = 'argparse.FileType opens a file and never closes it' updated_at = user = 'https://bugs.python.org/DavidLayton' ``` bugs.python.org fields: ```python activity = actor = 'Nils Kattenbeck' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'David.Layton' dependencies = [] files = ['31852'] hgrepos = [] issue_num = 13824 keywords = ['patch'] message_count = 15.0 messages = ['151615', '152518', '152519', '152530', '166068', '198336', '198376', '215298', '341670', '341761', '341764', '341852', '342153', '342503', '398184'] nosy_count = 13.0 nosy_names = ['bethard', 'eric.araujo', 'mitar', 'Paolo.Elvati', 'manveru', 'Stefan.Pfeiffer', 'paul.j3', 'David.Layton', 'matrixise', 'josh.r', 'remi.lapeyre', 'sebix', 'Nils Kattenbeck'] pr_nums = [] priority = 'normal' resolution = None stage = None status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue13824' versions = ['Python 2.7'] ```

    Linked PRs

    eccacd9a-6179-451a-b71a-fc2e4713fe32 commented 12 years ago

    argparse.FileType.__call__ opens the specified file and returns it. This is well documented as an anit-idiom in http://docs.python.org/howto/doanddont.html#exceptions.

    "...a serious problem — due to implementation details in CPython, the file would not be closed when an exception is raised until the exception handler finishes; and, worse, in other implementations (e.g., Jython) it might not be closed at all regardless of whether or not an exception is raised."

    Disregarding the above, handling a file which may or may not have been opened depending the users input requires a bit of boilerplate code compared to the usual with-open idiom.

    Additionally, there is no way to prevent FileType from clobbering an existing file when used with write mode.

    Given these issues and others, it seems to me that the usefulness of FileType is outweighed by propensity to encourage bad coding. Perhaps, it would be best if FileType (or some replacement) simply checked that the file exists (when such a check is appropriate), it can be opened in the specified mode, and, curry the call to open (i.e. return lambda: open(string, self._mode, self._bufsize))

    merwok commented 12 years ago

    Additionally, there is no way to prevent FileType from clobbering an existing file when used with write mode.

    I think this deserves its own feature request: now that Python 3.3 has an “exclusive create” mode, argparse.FileType could gain support for it. Would you open that request?

    Given these issues and others,

    We have one issue and one missing feature; what are the other issues?

    it seems to me that the usefulness of FileType is outweighed by propensity to encourage bad coding.

    I think the problem is not as bad as you paint it. A great number of unclosed file handles may cause problem to some OSes, but that’s it. On the plus side, the fact that argparse accepts for its type argument any callable that can check and convert a string input is simple, clean and works. FileType is needed.

    In packaging/distutils2 for example we have similar functions that return an open file object and never close it: the responsibility is at a higher level. Other packaging code calling these functions does so in a with statement. It is not evil by nature.

    The problem here is that FileType may return stdin or stdout, so we can’t just always close the file object (or maybe we can, say using an atexit handler?).

    Perhaps, it would be best if FileType (or some replacement) simply checked that the file exists

    But then you’d run into race conditions. The only sure was to say if a file can be opened is to open it.

    merwok commented 12 years ago

    s/sure was/sure way/

    eccacd9a-6179-451a-b71a-fc2e4713fe32 commented 12 years ago

    Eric,

    But then you’d run into race conditions. The only sure was to say if a file can be opened is to open it.

    I think you misunderstand me. I am NOT suggesting that you open and close the file. I am saying that you should not open it in the first place.

    If I cannot open the file at the point in my program where I actually want to open it, then fine, I can decide, in my own code, what to do.

    I think the problem is not as bad as you paint it. A great number of unclosed file handles may cause problem to some OSes, but that’s it. On the plus side, the fact that argparse accepts for its type argument any callable that can check and convert a string input is simple, clean and works. FileType is needed.

    Causing a problem on some OSes and not others is worse than causing a problem on all OSes as it increases the likelihood of buggy code passing tests and moving to production.

    I think argparse is wonderful; I just think that by having FileType not open the file, the number of it's use cases is increased. As it stands now, I would prefer the just pass the argument as a string argument and handling the opening myself unless:

    1. I wanted my program to open the file at the very beginning of the program (where one traditionally handles arg parsing)
    2. I wanted to exit on the first, and only, attempt to open the file
    3. I really did not care if the file closed properly--which, granted, is often the case with tiny scripts

    The moment any of these is not true due to a change in requirements, I will have to refactor my code to use a filename arg. Where as if I start out with a bog-standard filename and open it myself, I can easily add the behaviour I want. I just don't see FileType as a big convenience. However, I do see that changing this would break backwards compatibility and would not want to see that happen. Perhaps a new FileNameType that does some basic, perhaps, optional checks would have wider use-cases.

    I hope this helps.

    David Layton

    8955c213-fd54-471c-9758-9cc5f49074db commented 12 years ago

    So I generally agree that FileType is not what you want for anything but quick scripts that can afford to either leave a file open or to close stdin and/or stdout. However, quick scripts are an important use case for argparse, so I don't think we should get rid of FileType.

    What should definitely happen:

    What could potentially happen if someone really wanted it:

    7a064fe6-c535-4d80-a11f-a04ed39056c5 commented 10 years ago

    In this patch I implement a FileContext class. It differs from FileType in 2 key areas:

    The resulting argument is meant to be used as:

        with args.input() as f:
            f.read()
            etc

    The file is not opened until value is called, and it will be closed after the block. stdin/out can also be used in this context, but without closing.

    The signature for this class is the same as for FileType, with one added parameter, 'style'. (alternative name suggestions welcomed).

    class argparse.FileContext(mode='r', bufsize=-1, encoding=None, errors=None, style='delayed')

    The default behavior, "style='delayed'", is as described above.

    "style='evaluate'" immediately calls the partial, returning an opened file. This is essentially the same as FileType, except for the stdin/out context wrapping.

    "style='osaccess'", adds os.acccess testing to determine whether the 'delayed' file can be read or written. It attempts to catch the same sort of OS errors that FileType would, but without actually opening or creating the file.

    Most of the added tests in test_argparse.py copy the FileType tests. I had to make some modifications to the testing framework to handle the added levels of indirection.

    I have not written the documentation changes yet.

    A sample use case is:

        import argparse, sys
        p = argparse.ArgumentParser()
        p.add_argument('-d','--delayed', type=argparse.FileContext('r'))
        p.add_argument('-e','--evaluated', type=argparse.FileContext('r', style='evaluate'))
        p.add_argument('-t','--test', dest='delayed', type=argparse.FileContext('r', style='osaccess'))
        p.add_argument('-o','--output', type=argparse.FileContext('w', style='osaccess'), default='-')
        p.add_argument('--unused', type=argparse.FileContext('w', style='osaccess'),help='unused write file')
        args = p.parse_args()
    
        with args.output() as o:
            if args.delayed:
                with args.delayed() as f:
                    print(f.read(), file=o)
            if args.evaluated:
                with args.evaluated as f:
                    print(f.read(), file=o)
        # f and o will be closed if regular files
        # but not if stdin/out
        # the file referenced by args.unused will not be created
    7a064fe6-c535-4d80-a11f-a04ed39056c5 commented 10 years ago

    An alternative way to delay the file opening, is to return an instance that has a filename attribute, and has an open method. This can be compactly added to the FileContext that I defined in the previous patch. The FileContext provides the _ostest function (testing using os.access), and wrapper for stdin/out.

        class FileOpener(argparse.FileContext):
            # delayed FileType; alt to use of partial()
            # sample use:
            # with args.input.open() as f: f.read()
            def __call__(self, string):
                string = self._ostest(string)
                self.filename = string
                return self
            def open(self):
                return self.__delay_call__(self.filename)()
            file =  property(open, None, None, 'open file property')
    7a064fe6-c535-4d80-a11f-a04ed39056c5 commented 10 years ago

    From http://bugs.python.org/issue14156 "argparse.FileType for '-' doesn't work for a mode of 'rb'"

    I learned that 'stdin/out' can be embedded in an 'open' by using the 'fileno()' and 'closefd=False' (so it isn't closed at the end of open).

    With this, the dummy file context that I implemented in the previous patch, could be replaced with:

    partial(open, sys.stdin.fileno(), mode=self._mode,..., closefd=False)

    However, as Steven Bethard wrote in the earlier issue, setting up tests when stdin/out will be redirected is not a trivial problem. So I don't yet have a testable patch.

    --------------- And just for the record, the 'osaccess' testing that I wrote earlier, probably should also test that proposed output file string is not already a directory.

    717908ce-bb9c-4e05-b89c-d98143e51333 commented 5 years ago

    Why not make FileType instance also a context manager, so you could do something like:

    with arguments.input as f:
       assert arguments.input is f

    For backwards compatibility, FileType would open eagerly as now, but it would be closed at exit from context manager.

    We should just make sure we do not close stdout/stderr.

    99ffcaa5-b43b-4e8e-a35e-9c890007b9cd commented 5 years ago

    Mitar: argparse.FileType's __call__ (which is what "parses" the argument) returns whatever open (aka io.open) returns. Basically, post-parsing, there is nothing about the result of FileType that distinguishes it from a manually opened file (or when '-' is passed, from stdin/stdout). So it's already possible to do what you describe, simply by doing:

    parser = argparse.ArgumentParser()
    parser.add_argument('input', type=argparse.FileType())
    arguments = parser.parse_args()
    
    with arguments.input as f:
       assert arguments.input is f

    That already "open[s] eagerly as now, but it would be closed at exit from context manager."

    The original report did not like that you couldn't prevent clobbering of an existing file in write mode (no longer true, since you can pass a mode of 'x' just like you can with open), and did not like that the result of parsing was implicitly opened immediately without being controllable *immediately* via a context manager.

    The only real solutions to that problem are either:

    1. Making FileType (or a replacement) that is lazier in some way.
    2. Making the argument namespace itself support context management

    The proposal to check validity and return a curried call to open is problematic given the TOCTOU issues, but a curried version that performs little or no checks up front, but performs argparse style clean exits if the deferred open fails would be reasonable.

    The alternative would be to integrate more heavily with the argument namespace, such that you could write code like:

    with parser.parse_args() as arguments:

    and in that scenario, the files would be opened immediately and stored on the namespace, but either only FileType, or any type result supporting context management, would be bulk __enter-ed and bulk __exit-ed upon exiting the with block.

    As is, a user could do most of this with contextlib.ExitStack, but it's more to reimplement (and tricky to get right, and would still rely on argparse to clean up files 1 through n-1 if opening file n fails for whatever reason before parsing completes).

    23982c60-ed6c-47d1-96c2-69d417bd81b3 commented 5 years ago

    For information, bpo-33927 is related. I removed one of the FileType from json.tool argument parser to work around this behavior (https://github.com/python/cpython/pull/7865/files#diff-e94945dd18482591faf1e294b029a6afR44).

    If paul.j3 does not have his patch anymore I volunteer to implement option 1 to have a better alternative in the stdlib.

    717908ce-bb9c-4e05-b89c-d98143e51333 commented 5 years ago

    So it's already possible to do what you describe, simply by doing:

    I think there is an edge case here if a stdin/stdout is opened? It would get closed at exist from the context, no?

    So I suggest that a slight extension of what open otherwise returns is returned which make sure context manager applies only to non-stdin/stdout handles.

    7a064fe6-c535-4d80-a11f-a04ed39056c5 commented 5 years ago

    While I continue to follow argparse issues, I'm not doing development (with my own repository, etc).

    I haven't revisited the issue since 2013, and don't know that much more about Python file handling. Occasionally 'FileType' issues come up on StackOverflow, and my most common suggestion is accept a string, and open the file yourself.

    In writing a solution, keep backward compatibility foremost in mind. We don't want to mess things up for existing code. And keep FileType simple, consistent with Steven's original intent - as a tool for simple input/output scripts.

    As for the idea of

    2 - Making the argument namespace itself support context management

    someone could experiment with a new Namespace class with the proper enter/exit methods. The use of a custom Namespace class is documented

    https://docs.python.org/3/library/argparse.html#the-namespace-object

    So the idea can be implemented and thoroughly tested without altering the default Namespace class and its use. (With one caution, subparsers won't use this custom class.)

    matrixise commented 5 years ago

    Hi,

    Are you interested to write test cases? They could be useful for the fix.

    Thank you

    656b7fe4-067d-46a1-886d-dde2ab2b48e0 commented 3 years ago

    A good alternative would be to use pathlib.Path. The only downside would be that one has to manually handle - but besides that it solves most problems.

    Still the fact that file descriptors are kept open should be added as a warning to the docs

    merwok commented 2 years ago

    A good alternative would be to use pathlib.Path.

    How so? FileType lets you specify that the argument should be readable or writable, so this would be a loss of functionality.

    serhiy-storchaka commented 8 months ago

    The current design has too many flaws. Open files are leaked if argparse itself fails or the user code fails or quits before it has chance to close all files. Output files are always rewritten in case of error. Input files can remain open for too long time and consume the limited resource of file descriptors. And solving it in the user code is difficult even if we ignore errors that can occur during parsing (see for example the pickletools CLI #85567).

    All solutions (making file opening lazy (and maybe explicit) or making the parser or the namespace object a context manager) require changing the user code. Minimizing changes in the user code increases chance of writing user code that does not do all thing properly.

    On other hand, explicit opening and closing files is not much more complex that proper use of FileType or its replacement. The only non-trivial thing that FileType does in comparison with the plain open() is returning the standard stream (depending on mode) if the argument is "-". We can provide an utility function for this, although the code is only non-trivial because it is general, for any concrete mode string it is simpler.

    I propose to deprecate argparse.FileType. Its existence in the stdlib implies that it is a preferable way to handle file arguments, but it is not so. You have less issues if you do not use it.

    gvanrossum commented 7 months ago

    +1. Well argued. I’m not even sure we need a helper for ‘-‘ ; that API would be a bit convoluted, since closing stdin is fraught.