Closed a8466748-c610-4c39-acea-986fbed22989 closed 22 years ago
See bug bpo-111481, and PEP-0042. Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples.
With this patch, the os module will modify the aformentioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode.
The patch also modifies test_os.py to test the new behavior.
In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!)
Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple.
Logged In: YES user_id=499
BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It don't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes.
Logged In: YES user_id=21627
I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python.
Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though.
I have also my doubts about the naming of the fields. The st_ prefix originates from the time where struct fields were living in the global namespace (i.e. across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should inherit this into Python...
Logged In: YES user_id=499
Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure python. I'm attaching this second version for reference.
I'm not sure it makes much sense to do this with pure C; it would certainly take a lot more code, with little benefit I can descern. But you're more experienced than I; what am I missing?
I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like.
Logged In: YES user_id=499
On further consideration, the approach taken in the second (example only) patch is indeed too fragile. The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible.
Logged In: YES user_id=499
Here's the revised (example only) patch that takes the more portable approach I mention below.
Logged In: YES user_id=6380
Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well.
Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable.
Logged In: YES user_id=21627
The problem with your second and third patch is that it includes an incompatibility for users of posix.stat (and friends), since it changes the siye of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is
mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename)
To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past.
Logged In: YES user_id=499
Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?)
I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules? Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use?
Logged In: YES user_id=21627
Using posix.stat is common, see
http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu
for examples. None of these would break with your change, though, since they don't rely on the lenght of the tuple.
If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the Posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate. If that is done, I'd propose an interface like
tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple))
Logged In: YES user_id=499
I'm becoming more and more convinced that doing it in C is the right thing, but I have issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules.
Is there some other appropriate place to put it in C?
Logged In: YES user_id=21627
There aren't actually so many copies of the module, since posixmodule implements "posix","nt", and "os2". I found alternative implementations in riscosmodule and macmodule.
Still, putting the support type into a shared C file is appropriate. I can think of two candidate places: tupleobject.c and fileobject.c. It may be actually worthwhile attempting to share the stat() implementations as well, but that could be an add-on.
Logged In: YES user_id=6380
Or you could put it in modsupport.c, which is already a grab-bag of handy stuff.
Logged In: YES user_id=499
Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too.
Logged In: YES user_id=499
And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.)
Logged In: YES user_id=21627
The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation:
Logged In: YES user_id=499
I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing.
I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types.
If somebody with experience with these modules could let give me guidance as to the Right Thing, I'll be happy to give it a shot... but my shot isn't likely to be half as good as somebody who knew the modules better. (For example, I don't have the facilities to compile macmodule or riscmodule at all, much less test them.)
I'd also be glad to make any changes that would help maintainers of those modules.
Logged In: YES user_id=6380
One comment on the patch: beautiful use of the new type stuff, but there's something funky with the constructors going on. It seems that the built-in __new (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows:
>>> import os
>>> a = os.stat_result.__new__(os.stat_result, ())
>>> a.st_ctime
Segmentation fault (core dumped)
$
Logged In: YES user_id=6380
Another comment: we should move this to its own file so that other os.stat() implementations (esp. MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code.
Logged In: YES user_id=499
I think this might be the one... or at least, the next-to-last-one. This version of the patch:
(1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new and __init bugs in previous versions.
Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with \< or > 13 arguments. When this happens, attempts to get nonexistant attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs.
Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff. Please let me know if you'd prefer a different format.
Logged In: YES user_id=6380
Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably.
Logged In: YES user_id=499
I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'.
Logged In: YES user_id=499
The fifth all-C (!) version, with changes as suggested by Guido's comments via email.
Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C.
It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python. (I claim this isn't a drawback, since it's way easier to reimplement this in python than it is to make it accessible from python.)
So, how's *this* one?
Logged In: YES user_id=6656
If this goes in, I'd like to see it used for termios.tc {get,set}attr too.
I could probably implement this (but not *right* now...).
Logged In: YES user_id=6380
Patience, please. I'm behind reviewing this, probably won't have time today either.
Logged In: YES user_id=6380
I'm looking at this now.
Logged In: YES user_id=6380
Thanks, Nick! Good job.
Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-)
Logged In: YES user_id=499
Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch.
Logged In: YES user_id=3066
This has been checked in, edited, and checked in again.
Logged In: YES user_id=6656
I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of field of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in one's that don't exists locally with -1 or something?
It's hard to pickle os.stat_results portably the way things are at the moment...
Logged In: YES user_id=21627
Adding all fields is both difficult and undesirable. It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though the is none.
What problem does that cause for pickling, and why would a complete list of all attributes solve this problem?
Logged In: YES user_id=6656
I'm not worried about cross version problems.
The problem with pickling is that stat_results (as of today) get pickled as "os.stat_result" and a tuple of arguments. The number of arguments os.stat_result takes varies by platform (it seems to be 10 on this NT box, but it's 13 on the starship, f'ex). So if a stat_result gets pickled on the starship and shoved down a socket to an NT machine, it can't be unpickled. I don't know if this sort of thing ever happens, but I could see it being surprising & annoying if I ran into it.
If os.stat_result took 13 arguments everywhere, this problem obviously wouldn't arise.
Logged In: YES user_id=21627
To support pickling, I think structseq objects should implement a __reduce__ method, returning the type and a dictionary. The type's tp_new should accept dictionaries, and reconstruct the instance from the dictionary.
Alternatively, copy_reg could grow support for stat_result, which seems desirable anyway, since os.stat returns a 'nt.stat_result' instance on Windows.
Furthermore, fixing the number of arguments does not help at all in pickling; __reduce__ will return an argument tuple which includes the original object; in turn, pickle will recurse until the stack overflows.
Logged In: YES user_id=6656
Martin, I may not have been 100% clear in my last note, but please run
cvs up Objects/structseq.c
structseq objects *do* now implement a __reduce__ method, but it returns a tuple. Using a dictionary would be more complicated, and not solve the issue completely: what happens when you go from a platform with less fields to one with more? What value does the not-prepared-for field have?
Hmm, the point about nt.stat_result is a good one.
Getting support into copy_reg.py leads to interesting bootstrapping problems when using uninstalled builds, unfortunately (site.py imports distutils imports re imports copy_reg; try to import, say, time, and you can't, because the whole reason to import distutils was to set up the path to find dynamically linked libraries...).
Logged In: YES user_id=21627
I'd not put the copyreg support into copy_reg, but into os.py. Pickling would save a reference to os._load_stat_result (or some such). When pickle tries to restore the value, it would first restore os.load_stat_result. For that, it would import os, which would register the copy_reg support.
As for constructing structseq objects from dictionaries: it would be a ValueError if fields within [:n_sequence_fields] are not filled out; leaving out other fields is fine.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/freddrake' closed_at =
created_at =
labels = ['library']
title = 'Add attributes to os.stat results'
updated_at =
user = 'https://bugs.python.org/nickm'
```
bugs.python.org fields:
```python
activity =
actor = 'loewis'
assignee = 'fdrake'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'nickm'
dependencies = []
files = ['3622', '3623', '3624', '3625', '3626', '3627', '3628', '3629', '3630']
hgrepos = []
issue_num = 462296
keywords = ['patch']
message_count = 35.0
messages = ['37622', '37623', '37624', '37625', '37626', '37627', '37628', '37629', '37630', '37631', '37632', '37633', '37634', '37635', '37636', '37637', '37638', '37639', '37640', '37641', '37642', '37643', '37644', '37645', '37646', '37647', '37648', '37649', '37650', '37651', '37652', '37653', '37654', '37655', '37656']
nosy_count = 5.0
nosy_names = ['mwh', 'gvanrossum', 'loewis', 'fdrake', 'nickm']
pr_nums = []
priority = 'normal'
resolution = 'accepted'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue462296'
versions = []
```