python / cpython

The Python programming language
https://www.python.org
Other
63.16k stars 30.24k forks source link

Add methods to `pathlib.Path`: `write_text`, `read_text`, `write_bytes`, `read_bytes` #64417

Closed 06547701-06a2-4e4a-b672-16a33475101e closed 10 years ago

06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago
BPO 20218
Nosy @birkenfeld, @pitrou, @cool-RR
Files
  • pathlib.readwrite3.patch: minimal pathlib read/write functions with tests
  • pathlib.readwrite4.patch
  • pathlib.readwrite3_with_exclusive2.patch: minimal pathlib read/write functions with tests (exclusive option added, typo fixed)
  • patch.patch
  • pathlib_readwrite_v5.patch
  • pathlib_readwrite_v6.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['type-feature', 'library'] title = 'Add methods to `pathlib.Path`: `write_text`, `read_text`, `write_bytes`, `read_bytes`' updated_at = user = 'https://github.com/cool-RR' ``` bugs.python.org fields: ```python activity = actor = 'pitrou' assignee = 'none' closed = True closed_date = closer = 'python-dev' components = ['Library (Lib)'] creation = creator = 'cool-RR' dependencies = [] files = ['33670', '33729', '33786', '33840', '36761', '36762'] hgrepos = [] issue_num = 20218 keywords = ['patch'] message_count = 50.0 messages = ['207874', '207952', '208604', '208605', '208625', '208635', '208636', '208639', '208640', '208744', '208785', '208878', '208879', '208889', '208924', '208930', '208931', '208940', '208947', '208949', '208951', '208991', '208992', '208999', '209002', '209017', '209029', '209053', '209326', '209584', '209594', '209606', '209832', '210194', '210341', '211925', '214141', '214142', '214144', '214151', '214228', '214289', '227898', '227900', '227913', '227914', '227926', '227947', '228091', '228092'] nosy_count = 6.0 nosy_names = ['georg.brandl', 'pitrou', 'matthiastroffaes', 'cool-RR', 'python-dev', 'cjwelborn'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue20218' versions = ['Python 3.5'] ```

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    I'd really like to have methods pathlib.Path.write and pathlib.Path.read. Untested implementation:

    def read(self, binary=False):
        with self.open('br' is binary else 'r') as file:
            return file.read()
    
    def write(self, data. binary=False):
        with self.open('bw' is binary else 'w') as file:
            file.write(data)

    This will be super useful to me. Many files actions are one liners like that, and avoiding putting the with clause in user code would be wonderful.

    Antoine suggests that binary shouldn't be an argument, that there should be separate methods for reading/writing text and binary contents, and that the text one would require passing in encoding and other parameters. I'll be happy to add these to the implementation and create a patch, once people can define which parameters should be used.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    (Replace is with if in my code sample, typo.)

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    Isn't this something you could do yourself?

    import pathlib
    def pathread(self, binary=False):
        with self.open('br'if binary else 'r') as fread:
            return fread.read()
    
    def pathwrite(self, data, mode='w'):
        with self.open(mode) as fwrite:
            fwrite.write(data)
    
    pathlib.Path.read = pathread
    pathlib.Path.write = pathwrite
    p = pathlib.Path('/mydir/example.txt')
    p.read()
    # 'Content from path.\n'
    
    p.write('I am appending.\n', mode='a')
    p.read()
    # 'Content from path.\nI am appending.\n'

    ...and what about: "There should be one-- and preferably only one --obvious way to do it."

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    I was referring to writing helper functions like pathread(mypath), though the monkeypatch version (p.read()) does work (it did when I tested it).

    serhiy-storchaka commented 10 years ago

    Not every two line function are worth to be added to the stdlib.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Christopher and Serhiy, I would appreciate if you could kindly explain why your arguments, while applying to my suggestions, do not apply to the following functions:

    I am quite sick of all those lofty principles being quoted when I want to add something to Python, yet I see Python features that break them all the time.

    pitrou commented 10 years ago

    FWIW, I agree that shortcuts to easily create or read entire files are useful. Other path classes (such as Twisted's) often have them.

    serhiy-storchaka commented 10 years ago

    Christopher and Serhiy, I would appreciate if you could kindly explain why your arguments, while applying to my suggestions, do not apply to the following functions:

    1. Path.stat() wraps only one function, while Path.read() wraps two functions. It's signature should be a sum of open() and read() signatures: Path.read(encoding, errors, newline, size) (I have omitted some open's parameters). This is a little cumbersome.

    2. os.stat() requires import os, while open() is builtin.

    3. If add Path.read(), what about Path.readlines() and Path.readline()?

    4. I believe Path.stat() will be used much more often than Path.read().

    5. Path.stat() corresponds to low-level os.stat(), but for low-level os.read() the high-level corresponding is FileIO.read(). And this corresponding is much more universal and useful.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Serhiy:

    Your arguments 1 and 2 are pretty weak. (So what if an import is required? It's still 2 lines. I thought that "Not every two line function are worth to be added to the stdlib.")

    Regarding stat being used much more often than read: I disagree. I've done whole-file reads much more often than I've done stat. Different people might have different habits, but I definitely think that reading an entire file is a common operation with files. And we can all agree that other trivial Path method like lstat are much less common than both stat and whole-file read, yet they've earned their place in the stdlib, while we are still arguing whether Path.read and path.write should have a place too.

    Regarding Path.readline and Path.readlines: I don't have an opinion on whether they should be implemented or not, since I rarely use them. But I definitely think that they have no bearing on the decision of whether to include Path.read and Path.write or not.

    Your argument 5 also looks weak to me. It looks to me like you're looking for arguments.

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    Ram Rachum: If I actually had a say in the matter, I wouldn't be totally against it. (so far I only lurk the lists/tracker looking for things to work on.) I was just offering you a solution that you can use _today_.

    I do think that if something like this was added, then you would have to go ahead and add the other read/write functions because the next question would be 'how come pathlib has no readlines()?'.

    Also, it would definitely need more than 'mode' or 'binary' for args. The encoding option needs to be taken into account. People need to have a choice in what encoding to use even with a little read()/write().

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Patch attached. Is this good?

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    Ram Rachum (and to whom it may concern):

    This patch was missing the size argument for readlines(), and it did not match the overall style of pathlib.py. (using ''' instead of """, and other docstring style). It also clobbered the builtin 'file'.

    I've attached another patch that tries very hard to match that 'style', and wrote tests that try to match what the original 'io' functions test for.

    This is my first patch for python-dev, I am attaching it in the hope it will help resolve the issue, or generate more discussion about it.

    I ran both the individual test_pathlib tests, and the entire test suite. Also, the patchcheck sanity test.

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    Sorry Antoine, I should've done my homework. I didn't realize 'to whom it may concern' is actually you. :)

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Hi Christopher,

    I like your patch. One thing I modified is returning to use file as the variable instead of f, since file is no longer a builtin in Python 3, and descriptive variable names are important. Attached as pathlib.readwrite2.patch.

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    Ram Rachum: Have you look at the source for pathlib? f and fd are common names for file descriptors, especially when using the open() function.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    You're right. I deleted my 2 patches, so pathlib.readwrite.patch is now the best patch for this.

    pitrou commented 10 years ago

    Thanks for the patches, but please let's not hurry this up. We first need to decide on an API (which should be minimal).

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Antoine:

    Which parts of the API merit discussion? The method names? Whether to include readlines/writelines? The arguments?

    pitrou commented 10 years ago

    Which parts of the API merit discussion? The method names? Whether to include readlines/writelines? The arguments?

    I think the readlines/writelines methods, as well as the size argument, should be dropped. Furthermore, there also should be distinct text/binary variants.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    I see. I don't have an opinion about these 3 issues (readlines/writelines, size and binary separation.) So I'm cool with making these changes.

    If we do separate out the binary versions, what do you have in mind for the method names and signatures?

    pitrou commented 10 years ago

    If we do separate out the binary versions, what do you have in mind for the method names and signatures?

    Perhaps we should look at what other Path APIs do here, and how they name it.

    Realistically, we need at least read_bytes() and read_text() methods. The write() method may stay unique if it's polymorphic.

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    Django: https://docs.djangoproject.com/en/dev/ref/files/file/#django.core.files.File

    It looks like Django has a File object that wraps around Python's built-in file object. It offers a 'mode' attribute, and a read(numbytes=None) / write([content]) function. It also offers \_iter__ functionality. You must open the file, and then wrap it with File() like:

    myfile = File(open('myfile.txt', 'r'))

    Twisted: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html

    Looking at twisted.python.filepath.FilePath, it looks like there is no read/write.

    These are two very popular frameworks/libraries, but I'll see if I can't find other sources for inspiration.

    I think read_bytes/read_text would offer a good alternative method for reading files, instead of trying to create a full-on replacement like the previous patches attempt to do.

    pitrou commented 10 years ago

    Twisted: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html

    Looking at twisted.python.filepath.FilePath, it looks like there is no read/write.

    You overlooked the getContent and setContent methods.

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    Oops, I did. Thanks for that.

    So setContent overwrites the file like open('myfile', 'w').write() would, except it has an option to give the temporary file a different extension (in case of a crash while writing). That's understandable for Twisted. getContent returns a file-like object, which is more like Path's open().

    One thing I am not seeing is a readlines/writelines in these two libaries. I still think they would be useful, even without the size argument for readlines.

    So this is what I am seeing now: read_text(encoding=None) readlines_text(encoding=None) ..(or read_textlines?) read_bytes() readlines_bytes() write(data, append=False) ..(mode is decided based on data type) writelines(lines, append=False)

    ..determining the mode for writelines looks at the first item's type?

    The regular writelines fails with 'must be str, not [insert wrong type]' when opened with 'w', and '[insert wrong type] does not support the buffer interface' when opened with 'wb'.

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    I should clarify that last sentence, I was trying to say that the type is determined by the first item. If the user tries to mix bytes/text they will receive the proper error from io's writelines when the mismatched type is written. If the first item is neither str nor bytes, then raise a ValueError.

    pitrou commented 10 years ago

    One thing I am not seeing is a readlines/writelines in these two libaries. I still think they would be useful, even without the size argument for readlines.

    readlines() and writelines() aren't used a lot in my experience.

    So this is what I am seeing now: read_text(encoding=None) readlines_text(encoding=None) ..(or read_textlines?) read_bytes() readlines_bytes() write(data, append=False) ..(mode is decided based on data type) writelines(lines, append=False)

    ..determining the mode for writelines looks at the first item's type?

    I would suggest differently:

    Strictly speaking, write() could be polymorphic, but I think it's nice to have distinct methods 1) out of symmetry with read*() 2) to avoid silently accepting the wrong type.

    As a reference, https://github.com/mikeorr/Unipath (which is quite popular) has read_file() and write_file() methods. https://github.com/jaraco/path.py has bytes(), text(), write_bytes() and write_text().

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    Antoine said:

    I would suggest differently:

    • read_text(encoding, errors, newline)
    • read_bytes()
    • write_text(data, encoding, errors, newline)
    • write_bytes(data)

    Strictly speaking, write() could be polymorphic, but I think it's nice to have distinct methods 1) out of symmetry with read*() 2) to avoid silently accepting the wrong type.

    I am starting to see where you are going with this, and I agree with your latest points.

    I dropped readlines/writelines. I guess pathlib doesn't have to do *exactly* everything you can do through normal io. They are easy to implement anyway with .split() and .join().

    I realize this would not make it into Python for a while (3.5 possibly, if at all), but I went ahead and made a patch anyway because I have time to do so at the moment.

    I updated the tests to reflect the latest changes, and made sure all of them still pass. Any criticism/wisdom would be appreciated, as this is my first time dealing with the Python patch process.

    The api is what you have, except I put an 'append' option: read_bytes() read_text(encoding=None, errors=None, newline=None) write_bytes(data, append=False) write_text(data, encoding=None, errors=None, newline=None, append=False)

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    I like the patch. Except I'd like to have support for the 'x' flag in the write_text and write_bytes methods. I suggest an argument exclusive, which defaults to False. When exclusive=True, the mode will be 'x' or 'xb'.

    The first lines after each method definition should be:

        if append and exclusive:
            raise Exception("Can't use both `append` and `exclusive` modes together; `append` implies that the file exists, while `exclusive` implies it does not.")

    If you don't like long exception texts, you can shorten it to just the first sentence. Also, you may want to choose a different exception class than Exception.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    New patch attached. Not tested.

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    Using 'review' with pathlib.readwrite4.patch, it looks like it only modifies one file (test_pathlib.py), which can't be right. Maybe you edited the patch directly instead of generating a new one? (also, the line-counts haven't changed and I think they were suppose to.)

    The python-dev guide may help you to move forward with this, and any future ideas. I definitely have a lot to learn myself, but I do know that the less leg-work core-devs have to do, the easier it is to make a contribution.

    Python dev-guide: http://docs.python.org/devguide/

    Here is the 'exclusive' feature with some tests, it raises TypeError when 'append' and 'exclusive' are used at the same time (mismatched args/kwargs usually raise TypeError in python from what I can tell).

    It's still missing the Doc changes, and probably more stuff that I don't know about yet.

    As usual, all tests pass including the python test-suite, and make patchcheck removed some whitespace for me. Thanks, -Chris (cjwelborn)

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    You're right Chris, I edited the patch naively and didn't know it wouldn't work. Your patch looks great except you probably want to change "except" to "accept" :)

    I hope I'll have time to work on the documentation addition soon. I'm assuming we want nothing more than method documentation on Path, right? I mean, just a short entry for each method, nothing more right?

    7a6470d7-5dc3-4d87-8f72-3ed447491835 commented 10 years ago

    hah, i did. I was working with 'except'ions and accidentally wrote 'except' instead of 'accept'. rookie mistake, its fixed now. As far as the docs I really can't say. Antoine would have the answers.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Patch with documentation attached.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Hi everyone,

    I'm waiting for someone to review my patch. I believe it includes everything that's needed to merge.

    pitrou commented 10 years ago

    Hi Ram, sorry, I'll take a look at your patch soon. (as you know, we're in feature freeze right now so your patch must wait for 3.4 to be released, anyway)

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Any progress on this?

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Thanks for the code review Antoine.

    It seems like the only non-trivial comment is regarding the append and exclusive arguments:

    "I don't think "append" and "exclusive" deserve to be arguments here. write_bytes()/write_text() is a convenience method for the common use case."

    Are you suggesting that these features be removed? I think it'll be really sad. I don't think that just because someone wants to use modes 'a' or 'x' it means that they should now write a with clause instead of using our one-liner. Just because something is made for convenience, doesn't mean it should be crippled.

    pitrou commented 10 years ago

    "I don't think "append" and "exclusive" deserve to be arguments here. write_bytes()/write_text() is a convenience method for the common use case."

    Are you suggesting that these features be removed?

    Yes. For "append", the reason is simple. The use case for appending is really repeated appending (e.g. writing to a log file), so providing a convenience for one-shot appending doesn't make sense. For "exclusive", it's not used often enough to warrant such a shortcut.

    I don't want these APIs to become kitchen sink APIs.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Okay, different approach: How about having a mode argument with a default? (Defaults of 'rb', 'r', 'wb', 'w' respectively.)

    This way users who wish to use append, exclusive, or any other future mode will be able to do that, but we won't be adding too many keywords to the API. We could add sanity checks to the mode if you wish.

    pitrou commented 10 years ago

    Okay, different approach: How about having a mode argument with a default? (Defaults of 'rb', 'r', 'wb', 'w' respectively.)

    This is redundant with the fact that there are several distinct methods: {read,write}_{bytes,text}.

    Really, the aim is to ease common operations, not to replace open() entirely.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    I understand Antoine.

    At this point, while I could easily implement the changes you ask for in your review, I'm concerned that we are spending our time adding a feature to Python that nobody really loves.

    What I'd really love is a pair of methods read and write. (Less typing than the method names you asked for, and less methods added to the API.) And I'd love for these methods to be able to use append and exclusive. I understand that a feature like that is not acceptable to you, which I accept, since this is an open-source projects that needs to satisfy many people and not just myself.

    So the current way you want to implement this is not something I'm excited about and not something I'll be happy to use. The question is, are *you* excited about this feature the way you want to implement it? (With four methods and no append or exclusive.) Is this something you'd love to use? Do you think other people feel the same?

    If so, I'll be happy to finish editing this patch so this feature could go into Python. Otherwise, if we find no one who actually loves this feature, I think that the most beneficial action we could do for the Python community is to drop this issue right here, rather than add a command to the API that we'll have to support for decades despite the fact nobody really likes it.

    pitrou commented 10 years ago

    Then I'd rather wait for other people to chime in and state whether they are interested in this feature.

    0e13c2e4-53f0-40f1-9a0e-56ccd3bb4ea5 commented 10 years ago

    Chiming in here: Sphinx's testing framework does include a feature that allows easily read/write files into/from text/bytes directly from path-like objects. There is thus a demand out there. If this feature were to make it into stdlib, it would be loved at least by Sphinx testers and sphinx extension module testers.

    Current implementation in Sphinx:

    https://bitbucket.org/birkenfeld/sphinx/src/f87ae5c0272e7384dc976414d091aa8b175827cd/tests/path.py?at=default#cl-129

    Discussion to move to pathlib on Sphinx tracker:

    https://bitbucket.org/birkenfeld/sphinx/issue/1241/move-test-utilities-into-a-sphinx#comment-12645576

    Some code examples of how this is typically used in Sphinx:

    https://bitbucket.org/tk0miya/sphinx-testing/src/ee89298fa8f848b7c3ca93656df5330db85b4291/README.rst?at=default

    https://github.com/mcmtroffaes/sphinxcontrib-bibtex/commit/0a0bf07a34e6f7e3c66ddacc0da02b4a2caba794

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Matthias: Do you prefer having both write_bytes and write_text instead of just write with a binary option? Do you prefer append and exclusive modes to be allowed or not?

    birkenfeld commented 10 years ago

    Note that these methods were already part of Jason's path.py when I imported it for use by Sphinx.

    I think these convenience methods are useful indeed, so if it is fine with your philosophy for pathlib, I'd be happy to see them there. The write_text/write_bytes style would be preferred for me, and I agree that too many options, e.g. append and exclusive, are not as helpful.

    0e13c2e4-53f0-40f1-9a0e-56ccd3bb4ea5 commented 10 years ago

    Thanks for the quick response. I agree with Georg on all points, i.e. longer function names and no extra options.

    birkenfeld commented 10 years ago

    Next try.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    What do you think about changing the signature to something like this:

    def read_text(self, *, encoding=None, errors=None):

    This is to avoid these arguments being used positionally, which eases future backwards compatibility changes.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

    New changeset a4da150fbfd4 by Georg Brandl in branch 'default': Closes bpo-20218: Added convenience methods read_text/write_text and read_bytes/ https://hg.python.org/cpython/rev/a4da150fbfd4

    pitrou commented 10 years ago

    Thanks, Georg!