python / cpython

The Python programming language
https://www.python.org
Other
62.75k stars 30.07k forks source link

Add attributes to os.stat results #35189

Closed a8466748-c610-4c39-acea-986fbed22989 closed 22 years ago

a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago
BPO 462296
Nosy @mwhudson, @gvanrossum, @loewis, @freddrake
Files
  • stat_return.diff: Patch for new os.stat behavior.
  • patch2.diff: EXAMPLE ONLY patch for extended behavior on posix for st_{blksize
  • rdev blocks}
  • patch3.diff: EXAMPLE ONLY #2: better patch for extended behavior on posix for st_{blksize
  • rdev blocks}.
  • stat_4.diff: Trail C-only version: posixmodule.c only.
  • stat_5.diff: C-only version (v2): posixmodule.c only.
  • stat_patch_6.diff: C-only version (v3): posixmodule.c only.
  • patch_7.zip: C-only patch (v4): zip file.
  • stat_patch_8.zip: C-only patch (v5): zip file
  • libos.tex.diff: Documentation patch for libos.tex
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/freddrake' closed_at = created_at = labels = ['library'] title = 'Add attributes to os.stat results' updated_at = user = 'https://bugs.python.org/nickm' ``` bugs.python.org fields: ```python activity = actor = 'loewis' assignee = 'fdrake' closed = True closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'nickm' dependencies = [] files = ['3622', '3623', '3624', '3625', '3626', '3627', '3628', '3629', '3630'] hgrepos = [] issue_num = 462296 keywords = ['patch'] message_count = 35.0 messages = ['37622', '37623', '37624', '37625', '37626', '37627', '37628', '37629', '37630', '37631', '37632', '37633', '37634', '37635', '37636', '37637', '37638', '37639', '37640', '37641', '37642', '37643', '37644', '37645', '37646', '37647', '37648', '37649', '37650', '37651', '37652', '37653', '37654', '37655', '37656'] nosy_count = 5.0 nosy_names = ['mwh', 'gvanrossum', 'loewis', 'fdrake', 'nickm'] pr_nums = [] priority = 'normal' resolution = 'accepted' stage = None status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue462296' versions = [] ```

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    See bug bpo-111481, and PEP-0042. Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples.

    With this patch, the os module will modify the aformentioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode.

    The patch also modifies test_os.py to test the new behavior.

    In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!)

    Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple.

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It don't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 23 years ago

    Logged In: YES user_id=21627

    I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python.

    Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though.

    I have also my doubts about the naming of the fields. The st_ prefix originates from the time where struct fields were living in the global namespace (i.e. across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should inherit this into Python...

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure python. I'm attaching this second version for reference.

    I'm not sure it makes much sense to do this with pure C; it would certainly take a lot more code, with little benefit I can descern. But you're more experienced than I; what am I missing?

    I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like.

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    On further consideration, the approach taken in the second (example only) patch is indeed too fragile. The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible.

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    Here's the revised (example only) patch that takes the more portable approach I mention below.

    gvanrossum commented 23 years ago

    Logged In: YES user_id=6380

    Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well.

    Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 23 years ago

    Logged In: YES user_id=21627

    The problem with your second and third patch is that it includes an incompatibility for users of posix.stat (and friends), since it changes the siye of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is

    mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename)

    To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past.

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?)

    I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules? Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use?

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 23 years ago

    Logged In: YES user_id=21627

    Using posix.stat is common, see

    http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu

    for examples. None of these would break with your change, though, since they don't rely on the lenght of the tuple.

    If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the Posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate. If that is done, I'd propose an interface like

    tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple))

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    I'm becoming more and more convinced that doing it in C is the right thing, but I have issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules.

    Is there some other appropriate place to put it in C?

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 23 years ago

    Logged In: YES user_id=21627

    There aren't actually so many copies of the module, since posixmodule implements "posix","nt", and "os2". I found alternative implementations in riscosmodule and macmodule.

    Still, putting the support type into a shared C file is appropriate. I can think of two candidate places: tupleobject.c and fileobject.c. It may be actually worthwhile attempting to share the stat() implementations as well, but that could be an add-on.

    gvanrossum commented 23 years ago

    Logged In: YES user_id=6380

    Or you could put it in modsupport.c, which is already a grab-bag of handy stuff.

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too.

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.)

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 23 years ago

    Logged In: YES user_id=21627

    The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation:

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing.

    I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types.

    If somebody with experience with these modules could let give me guidance as to the Right Thing, I'll be happy to give it a shot... but my shot isn't likely to be half as good as somebody who knew the modules better. (For example, I don't have the facilities to compile macmodule or riscmodule at all, much less test them.)

    I'd also be glad to make any changes that would help maintainers of those modules.

    gvanrossum commented 23 years ago

    Logged In: YES user_id=6380

    One comment on the patch: beautiful use of the new type stuff, but there's something funky with the constructors going on. It seems that the built-in __new (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows:

    >>> import os
    >>> a = os.stat_result.__new__(os.stat_result, ())
    >>> a.st_ctime
    Segmentation fault (core dumped)
    $
    gvanrossum commented 23 years ago

    Logged In: YES user_id=6380

    Another comment: we should move this to its own file so that other os.stat() implementations (esp. MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code.

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    I think this might be the one... or at least, the next-to-last-one. This version of the patch:

    (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new and __init bugs in previous versions.

    Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with \< or > 13 arguments. When this happens, attempts to get nonexistant attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs.

    Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff. Please let me know if you'd prefer a different format.

    gvanrossum commented 23 years ago

    Logged In: YES user_id=6380

    Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably.

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'.

    a8466748-c610-4c39-acea-986fbed22989 commented 23 years ago

    Logged In: YES user_id=499

    The fifth all-C (!) version, with changes as suggested by Guido's comments via email.

    Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C.

    It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python. (I claim this isn't a drawback, since it's way easier to reimplement this in python than it is to make it accessible from python.)

    So, how's *this* one?

    mwhudson commented 23 years ago

    Logged In: YES user_id=6656

    If this goes in, I'd like to see it used for termios.tc {get,set}attr too.

    I could probably implement this (but not *right* now...).

    gvanrossum commented 23 years ago

    Logged In: YES user_id=6380

    Patience, please. I'm behind reviewing this, probably won't have time today either.

    gvanrossum commented 22 years ago

    Logged In: YES user_id=6380

    I'm looking at this now.

    gvanrossum commented 22 years ago

    Logged In: YES user_id=6380

    Thanks, Nick! Good job.

    Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-)

    a8466748-c610-4c39-acea-986fbed22989 commented 22 years ago

    Logged In: YES user_id=499

    Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch.

    freddrake commented 22 years ago

    Logged In: YES user_id=3066

    This has been checked in, edited, and checked in again.

    mwhudson commented 22 years ago

    Logged In: YES user_id=6656

    I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of field of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in one's that don't exists locally with -1 or something?

    It's hard to pickle os.stat_results portably the way things are at the moment...

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 22 years ago

    Logged In: YES user_id=21627

    Adding all fields is both difficult and undesirable. It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though the is none.

    What problem does that cause for pickling, and why would a complete list of all attributes solve this problem?

    mwhudson commented 22 years ago

    Logged In: YES user_id=6656

    I'm not worried about cross version problems.

    The problem with pickling is that stat_results (as of today) get pickled as "os.stat_result" and a tuple of arguments. The number of arguments os.stat_result takes varies by platform (it seems to be 10 on this NT box, but it's 13 on the starship, f'ex). So if a stat_result gets pickled on the starship and shoved down a socket to an NT machine, it can't be unpickled. I don't know if this sort of thing ever happens, but I could see it being surprising & annoying if I ran into it.

    If os.stat_result took 13 arguments everywhere, this problem obviously wouldn't arise.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 22 years ago

    Logged In: YES user_id=21627

    To support pickling, I think structseq objects should implement a __reduce__ method, returning the type and a dictionary. The type's tp_new should accept dictionaries, and reconstruct the instance from the dictionary.

    Alternatively, copy_reg could grow support for stat_result, which seems desirable anyway, since os.stat returns a 'nt.stat_result' instance on Windows.

    Furthermore, fixing the number of arguments does not help at all in pickling; __reduce__ will return an argument tuple which includes the original object; in turn, pickle will recurse until the stack overflows.

    mwhudson commented 22 years ago

    Logged In: YES user_id=6656

    Martin, I may not have been 100% clear in my last note, but please run

    cvs up Objects/structseq.c

    structseq objects *do* now implement a __reduce__ method, but it returns a tuple. Using a dictionary would be more complicated, and not solve the issue completely: what happens when you go from a platform with less fields to one with more? What value does the not-prepared-for field have?

    Hmm, the point about nt.stat_result is a good one.

    Getting support into copy_reg.py leads to interesting bootstrapping problems when using uninstalled builds, unfortunately (site.py imports distutils imports re imports copy_reg; try to import, say, time, and you can't, because the whole reason to import distutils was to set up the path to find dynamically linked libraries...).

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 22 years ago

    Logged In: YES user_id=21627

    I'd not put the copyreg support into copy_reg, but into os.py. Pickling would save a reference to os._load_stat_result (or some such). When pickle tries to restore the value, it would first restore os.load_stat_result. For that, it would import os, which would register the copy_reg support.

    As for constructing structseq objects from dictionaries: it would be a ValueError if fields within [:n_sequence_fields] are not filled out; leaving out other fields is fine.