python / cpython

The Python programming language
https://www.python.org

Docstrings for namedtuple #60873

Closed serhiy-storchaka closed 9 years ago

serhiy-storchaka commented 11 years ago
BPO 16669
Nosy @gvanrossum, @rhettinger, @terryjreedy, @giampaolo, @nedbat, @ericsnowcurrently, @serhiy-storchaka, @phmc
Files
  • namedtuple_docstrings_field_docs.patch: Use doc and field_docs arguments
  • namedtuple_docstrings_tuples_seq.patch: Use doc argument and field_names as a sequence of tuples
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = 'https://github.com/rhettinger'
    closed_at =
    created_at =
    labels = ['type-feature', 'library']
    title = 'Docstrings for namedtuple'
    updated_at =
    user = 'https://github.com/serhiy-storchaka'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date =
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation =
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['28294', '28295']
    hgrepos = []
    issue_num = 16669
    keywords = ['patch']
    message_count = 21.0
    messages = ['177381', '177393', '177418', '177434', '177470', '177560', '177577', '177592', '205249', '205269', '205271', '205277', '205317', '205340', '205341', '205582', '205583', '205978', '242096', '242106', '242121']
    nosy_count = 10.0
    nosy_names = ['gvanrossum', 'rhettinger', 'terry.reedy', 'peter.otten', 'giampaolo.rodola', 'nedbat', 'eric.snow', 'serhiy.storchaka', 'pconnell', 'Ankur.Ankan']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16669'
    versions = ['Python 3.5']
    ```

    serhiy-storchaka commented 11 years ago

    Here are two patches which implement two different interfaces for the same feature.

    In the first patch you can use the *doc* and *field_docs* arguments to specify the namedtuple class docstring and the field docstrings. For example:

        Point = namedtuple('Point', 'x y',
                           doc='Point: 2-dimensional coordinate',
                           field_docs=['abscissa', 'ordinate'])

    In the second patch you can use the *doc* argument to specify the namedtuple class docstring, and the *field_names* argument can be a sequence of pairs: field name and field docstring. For example:

        Point = namedtuple('Point', [('x', 'abscissa'), ('y', 'ordinate')],
                           doc='Point: 2-dimensional coordinate')

    Which approach is better?

    Feel free to correct the documentation. I know that it needs correction.

    rhettinger commented 11 years ago

    I don't think it is worth complicating the API for this. There have been zero requests for this functionality. Even the doc field of property() is rarely used.

    ericsnowcurrently commented 11 years ago

    What is wrong with the following?

    class Point(namedtuple('Point', 'x y')):
        """A 2-dimensional coordinate

        x - the abscissa
        y - the ordinate

        """

    This seems clearer to me. namedtuple is in some ways a quick-and-dirty type, essentially a truer implementation of the intended purpose of tuple. The temptation is to keep adding functionality, but we should resist until the need becomes imperative. I don't see it here. While I don't have a gauge of how often people use (or would use) docstrings with namedtuple, I expect that it's relatively low given the intended simplicity of namedtuple.

    serhiy-storchaka commented 11 years ago

    Yes, we can use the inheritance trick/idiom to specify a class docstring. But there is no way to specify attribute docstrings.

    I encountered this when rewriting some C-implemented code in Python. PyStructSequence allows you to specify docstrings for a class and its attributes, but namedtuple does not.

    giampaolo commented 11 years ago

    > I don't think it is worth complicating the API for this. There have been zero requests for this functionality. Even the doc field of property() is rarely used.

    +1

    terryjreedy commented 11 years ago

    I think this should be rejected and closed since the 'enhancement' looks worse to me than what we can do now.

    1. Most data attributes cannot have individual docstrings, so I expect the class docstring to list and possibly explain the data attributes.

    2. In the process of responding to bpo-16670, I finally read the namedtuple doc. I notice that it already generates default one-line .__doc__ attributes for both the class and properties. For Point, the class docstring is 'Point(x, y)', which will often be good enough.

    3. If the person creating the class does not think this sufficient, the replacement is likely to be multiple lines. This is awkward for a constructor argument. There is a reason we put docstrings *after* the header, not *in* the header.

    4. The class docstring is easily replaced by assignment. So I would write Eric's example as

    Point = namedtuple('Point', 'x y')
    Point.__doc__ = '''\
    A 2-dimensional coordinate

    x - the abscissa
    y - the ordinate'''

    This does not create a second new class and is not a 'trick'.

    5. The property docstrings have the form 'Alias for field number 0'. I do not consider replacing them an issue. If a true data attribute is replaced by a property, the act of replacement should be transparent. That is the point of properties. So there is no expectation that the attribute should suddenly grow a docstring; I presume that is why property docstrings are not used much. The default for named tuples gives information that is peculiarly relevant to named tuples and that should be generated automatically. As I said before, I think the prose explanation of field names belongs in the class doc.
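
For readers following along, here is a minimal sketch (not part of the original comment) of the behaviour described in the points above, assuming Python 3: the docstrings namedtuple generates by default, and replacing the class docstring by plain assignment. The variable names `default_class_doc` and `default_field_doc` are illustrative only.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')

# Docstrings that namedtuple generates automatically.
default_class_doc = Point.__doc__    # 'Point(x, y)'
default_field_doc = Point.x.__doc__  # 'Alias for field number 0'

# In Python 3 the class docstring is writable, so it can be replaced
# without creating a subclass.
Point.__doc__ = '''\
A 2-dimensional coordinate

x - the abscissa
y - the ordinate'''
```

The assignment changes only the docstring; instances and field access behave exactly as before.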

    ericsnowcurrently commented 11 years ago

    +1, Terry

    serhiy-storchaka commented 11 years ago

    > 1. Most data attributes cannot have individual docstrings, so I expect the class docstring to list and possibly explain the data attributes.

    But almost all PyStructSequence fields have individual docstrings.

    > This does not create a second new class and is not a 'trick'.

    Thanks for the tip.

    > I presume that is why property docstrings are not used much.

    Indeed, only 84 of 336 Python-implemented properties have docstrings. However, this is an even larger percentage than for methods (about 8K of 43K). And 100 of 115 PyStructSequence fields have docstrings.

    I think Python should have more docstrings, not less.

    gvanrossum commented 10 years ago

    I don't know if it's worth reopening this, but I had a need for generating docs including attribute docstrings for a namedtuple class using Sphinx, and I noticed a few things...

    (1) Regarding there not being demand: There's a StackOverflow question for this with 17 "ups" on the question and 22 on the best answer: http://stackoverflow.com/questions/1606436/adding-docstrings-to-namedtuples-in-python

    (2) The default autodocs produced by sphinx look dreadful (e.g. https://www.dropbox.com/s/nakxsslhb588tu1/Screenshot%202013-12-04%2013.29.13.png) -- note the duplication of the class name, the line break before the signature, and the listing of attributes in alphabetical order with useless boilerplate. Here's what I would *like* to produce: (though there's probably too much whitespace :-): https://www.dropbox.com/s/j11uismbeo6rrzx/Screenshot%202013-12-04%2013.31.44.png

    (3) In Python 2.7 you can't assign to the __doc__ class attribute.

    I would really appreciate some way to set the docstring for the class as a whole as well as for each property, so they come out correct in Sphinx (and help()), preferably without having to manually assign doc strings or write the class by hand without using namedtuple at all. (The latter will become very verbose, each property has to look like this:

        @property
        def handle(self):
            """The datastore handle (a string)."""
            return self[1]
    )
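
To make the verbosity concrete, here is a rough sketch (not from the original comment) of what a hand-written replacement for a two-field namedtuple might look like. The class name `DatastoreInfo` and the `handle` property come from this thread; the two-field layout, the `id` docstring, and the sample values are assumptions made purely for illustration.

```python
class DatastoreInfo(tuple):
    """Hypothetical hand-written stand-in for a two-field namedtuple."""

    __slots__ = ()

    def __new__(cls, id, handle):
        # Store the fields positionally, as namedtuple would.
        return super().__new__(cls, (id, handle))

    @property
    def id(self):
        """The datastore ID (a string)."""  # illustrative docstring
        return self[0]

    @property
    def handle(self):
        """The datastore handle (a string)."""
        return self[1]


info = DatastoreInfo('default', 'abc123')  # sample values, made up
```

Each field costs four lines plus a blank line, and the constructor has to be written by hand as well, which is the boilerplate being objected to.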

    terryjreedy commented 10 years ago

    Serhiy: I am not familiar with C PyStructSequence and how an instance of one appears in Python code. I agree that more methods should have docstrings.

    Guido:

    1. I posted on SO the simple Py 3 solution that replaces the previously posted wrapper solutions needed for Py 2.

    2. Much of what you do not like is standard Sphinx/help behavior that would be unchanged by Serhiy's patch. The first line for a class is always "class <classname>(<base_classes>)". The first line is followed by the docstring, so the class name is repeated if and only if it is repeated in the docstring (as for list, see below). The __new__/__init__ signature is given here if and only if it is in the docstring. Otherwise, one has to look down for the method. The method signatures are never on the first line. Examples:

    >>> help(list)
    Help on class list in module builtins:
    class list(object)
     |  list() -> new empty list
     |  list(iterable) -> new list initialized from iterable's items
    ...
    >>> class C:
    ...     "doc string"
    ...     def __init__(self, a, b): pass
    ...
    >>> help(C)
    Help on class C in module __main__:
    class C(builtins.object)
     |  doc string
     |
     |  Methods defined here:
     |  
     |  __init__(self, a, b)
    ...

    3. ?? Python 3 has many improvements and we will add more.

    ---

    I am still of the opinion that property usage should be a mostly transparent implementation detail. Point classes could have 4 instance attributes: x, y, r, and theta, with a particular implementation using 0 to 4 properties. All attributes should be documented regardless of the number of properties, which currently means listing them in the class docstring. A library could have more than one implementation.

    As for named tuples, I believe (without trying) that the name to index mapping could be done with __getattr__ and a separate dict. If so, there would be no property docstrings and hence no field docstrings to worry about ;-).

    ---

    There have been requests for data attribute docstrings (without the bother and inefficiency of replacing a simple attribute with a property). Since such a docstring would have to be attached to the fixed attribute name, rather than the variable attribute value, I believe a string subclass would suffice, to be used as needed. The main problem is a decent syntax to add a docstring to a simple (assignment) statement.

    If the general problem were solved, I would choose Serhiy's option B for namedtuple.

    gvanrossum commented 10 years ago

    On Wed, Dec 4, 2013 at 5:40 PM, Terry J. Reedy <report@bugs.python.org> wrote:

    > 1. I posted on SO the simple Py 3 solution that replaces the previously posted wrapper solutions needed for Py 2.

    Thanks, that will give people some pointers for Python 3. We need folks to upvote it. :-)

    > 2. Much of what you do not like is standard Sphinx/help behavior that would be unchanged by Serhiy's patch. The first line for a class is always "class <classname>(<base_classes>)".

    Maybe for help(), but the Sphinx docs look better for most classes. Compare my screen capture with the first class on this page: https://www.dropbox.com/static/developers/dropbox-python-sdk-1.6-docs/index.html The screen capture looks roughly like this (note this is two lines and the word DatastoreInfo is repeated -- that wasn't line folding):

    class dropbox.datastore.DatastoreInfo
    DatastoreInfo(id, handle, rev, title, mtime)

    whereas for non-namedtuple classes it looks like this:

    class dropbox.client.DropboxClient(oauth2_access_token, locale=None, rest_client=None)¶

    I understand that part of this is due to the latter class having an __init__ with a reasonable docstring, but the fact remains that namedtuple's default docstring produces poorly-looking documentation.

    > The first line is followed by the docstring, so the class name is repeated if and only if it is repeated in the docstring (as for list, see below). The __new__/__init__ signature is given here if and only if it is in the docstring. Otherwise, one has to look down for the method. The method signatures are never on the first line. Examples:
    >
    > >>> help(list)
    > Help on class list in module builtins:
    > class list(object)
    >  |  list() -> new empty list
    >  |  list(iterable) -> new list initialized from iterable's items
    > ...
    > >>> class C:
    > ...     "doc string"
    > ...     def __init__(self, a, b): pass
    > ...
    > >>> help(C)
    > Help on class C in module __main__:
    > class C(builtins.object)
    >  |  doc string
    >  |
    >  |  Methods defined here:
    >  |
    >  |  __init__(self, a, b)
    > ...

    Yeah, help() is different than Sphinx. (As a general remark I find the help() output way too verbose with its endless listing of all the built-in behaviors.)

    > 3. ?? Python 3 has many improvements and we will add more.
    >
    > ---
    >
    > I am still of the opinion that property usage should be a mostly transparent implementation detail.

    What does that mean?

    > Point classes could have 4 instance attributes: x, y, r, and theta, with a particular implementation using 0 to 4 properties. All attributes should be documented regardless of the number of properties, which currently means listing them in the class docstring. A library could have more than one implementation.

    For various reasons (like consistency with other classes) I *really* want the property docstrings on the individual properties, not in the class docstring. Here's a screenshot of what I want:

    https://www.dropbox.com/s/70zfapz8pcz9476/Screenshot%202013-12-04%2019.57.36.png

    I obtained this by abandoning the namedtuple and hand-coding properties -- the resulting class uses 4 lines (+ 1 blank) of boilerplate per property instead of just one line of docstring per property.

    > As for named tuples, I believe (without trying) that the name to index mapping could be done with __getattr__ and a separate dict. If so, there would be no property docstrings and hence no field docstrings to worry about ;-).

    I'm not sure what you are proposing here -- a patch to namedtuple or a work-around? I think namedtuple is too valuable to abandon. It not only saves a lot of code, it captures the regularity of the code. (If I have a class with 5 similar-looking methods it's easy to overlook a subtle difference in one of them.)

    ---

    > There have been requests for data attribute docstrings (without the bother and inefficiency of replacing a simple attribute with a property). Since such a docstring would have to be attached to the fixed attribute name, rather than the variable attribute value, I believe a string subclass would suffice, to be used as needed. The main problem is a decent syntax to add a docstring to a simple (assignment) statement.

    Sphinx actually has a syntax for this already. In fact, it has three: it allows a comment before or on the class variable starting with "#:", or a docstring immediately following. Check out this documentation for the autodoc extension: http://sphinx-doc.org/ext/autodoc.html#directive-autoattribute
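
As an illustration only (not part of the original comment), the three Sphinx autodoc spellings mentioned here might look like the sketch below. The class and its attribute descriptions are made up; the point is that these comments and trailing strings are read by Sphinx from the source and are not attached to the attributes at runtime.

```python
class Point:
    #: Horizontal coordinate (comment on the line before the attribute).
    x = 0.0

    y = 0.0  #: Vertical coordinate (comment on the same line).

    z = 0.0
    """Depth coordinate (string literal right after the assignment)."""


# None of the three forms creates a runtime docstring: the class itself
# still has no __doc__, and the float values cannot carry one.
class_doc = Point.__doc__
```

This is why the approach helps generated documentation but does nothing for help() or for namedtuple fields created by the factory function.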

    > If the general problem were solved, I would choose Serhiy's option B for namedtuple.

    If you're referring to this:

        Point = namedtuple('Point', [('x', 'abscissa'), ('y', 'ordinate')],
                           doc='Point: 2-dimensional coordinate')

    I'd love it!

    terryjreedy commented 10 years ago

    > I find the help() output way too verbose with its endless listing of all the built-in behaviors.

    Then you might agree to a patch, on a separate issue. Let's set help aside for the moment.

    I am familiar with running Sphinx on .rst files, but not on docstrings. It looks like the docstrings use .rst markup. (Is this allowed in the stdlib?) (The output looks good enough for a first draft of a tkinter class/method reference, which I would like to work on.)

    > I understand that part of this [signature after class name] is due to the latter class having an __init__ with a reasonable docstring

    If dropbox.client is written in Python, as I presume, then I strongly suspect that the signature part of class dropbox.client.DropboxClient(oauth2_access_token, locale=None, rest_client=None) comes from an inspect module method that examines the function attributes other than .__doc__. If so, the DropboxClient.__init__ docstring is irrelevant to the above. You could test by commenting it out and rerunning the doc build.

    The inspect methods do not work on C-coded functions (unless Argument Clinic has fixed this for 3.4), which is why signatures are put in the docstrings for C-coded objects. For C-coded classes, it is put in the class docstring rather than the class.__init__ docstring.

    > but the fact remains that namedtuple's default docstring produces poorly-looking documentation.

    'x.__init__(...) initializes x; see help(type(x)) for signature'

    This is standard boilerplate for C-coded .__init__.__doc__. Raymond just copied it.

    >>> int.__init__.__doc__
    'x.__init__(...) initializes x; see help(type(x)) for signature'
    >>> list.__init__.__doc__
    'x.__init__(...) initializes x; see help(type(x)) for signature'

    I will try to explain 'property transparency/equivalence' in another post, when I am fresher, and after reading the autodoc reference, so you can understand enough to agree or not. My reference to a possible alternate implementation of named tuple was part of the failed explanation of 'property transparency'. I am not proposing a change now.

    gvanrossum commented 10 years ago

    On Wed, Dec 4, 2013 at 10:25 PM, Terry J. Reedy <report@bugs.python.org> wrote:

    > I am familiar with running Sphinx on .rst files, but not on docstrings. It looks like the docstrings use .rst markup. (Is this allowed in the stdlib?)

    I'm not sure if it is allowed, but it is certainly used plenty in some modules (perhaps those that started life as 3rd party packages).

    > (The output looks good enough for a first draft of a tkinter class/method reference, which I would like to work on.)

    I won't stop you -- having *any* kind of docs for Tkinter sounds good to me!

    > > I understand that part of this [signature after class name] is due to the latter class having an __init__ with a reasonable docstring

    > If dropbox.client is written in Python, as I presume,

    It is.

    > then I strongly suspect that the signature part of class dropbox.client.DropboxClient(oauth2_access_token, locale=None, rest_client=None) comes from an inspect module method that examines the function attributes other than .__doc__.

    Indeed.

    > If so, DropboxClient.__init__ docstring is irrelevant to the above. You could test by commenting it out and rerunning the doc build.

    Yes.

    > The inspect methods do not work on C-coded functions (unless Argument Clinic has fixed this for 3.4), which is why signatures are put in the docstrings for C-coded objects. For C-coded classes, it is put in the class docstring rather than the class.__init__ docstring.

    Perhaps it doesn't understand __new__? namedtuple actually generates Python code for a class definition using a template and then uses exec() on the filled-in template; the template defines only __new__ though.

    > > but the fact remains that namedtuple's default docstring produces poorly-looking documentation.

    > 'x.__init__(...) initializes x; see help(type(x)) for signature'
    >
    > This is standard boilerplate for C-coded .__init__.__doc__. Raymond just copied it.

    He didn't (it's not in the template). It is the dummy __init__ that tuple inherits from object (the docstring is in the __init__ wrapper in typeobject.c).

    > >>> int.__init__.__doc__
    > 'x.__init__(...) initializes x; see help(type(x)) for signature'
    > >>> list.__init__.__doc__
    > 'x.__init__(...) initializes x; see help(type(x)) for signature'

    terryjreedy commented 10 years ago

    I think we can now agree that docstrings other than the class docstring (used as a fallback) are not relevant to signature detection. And Raymond gave namedtuple classes the docstring needed as a fallback.

    We are off-issue here, but idlelib.CallTips.getargspec() is also unaware that it may need to look at .__new__. An object with a C-coded .__init__ and Python-coded .__new__ is new to new-style classes. The new inspect.signature function handles such objects properly. Starting with a namedtuple Point (without the default docstring):

    >>> from inspect import signature
    >>> str(signature(Point.__new__))
    '(_cls, x, y)'
    >>> str(signature(Point))
    '(x, y)'

    The second is what autodoc should use. I just opened bpo-19903 to update Idle to use signature.
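
As a rough sketch of that idea (not code from autodoc or Idle), a documentation tool could build its one-line class header from `inspect.signature` instead of from the docstring. The helper name `class_header` is hypothetical.

```python
from collections import namedtuple
from inspect import signature

Point = namedtuple('Point', 'x y')


def class_header(cls):
    """Hypothetical helper: build a one-line 'class Name(args)' header."""
    # signature(cls) describes how the class is called, so the implicit
    # first argument of __new__ is already left out.
    return f'class {cls.__name__}{signature(cls)}'


header = class_header(Point)
```

With this approach the class name appears once, and the header no longer depends on what the class docstring happens to contain.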

    gvanrossum commented 10 years ago

    It was never about signature detection for me -- what gave you that idea? I simply want to have the option to put individual docstrings on the properties generated by namedtuple.

    nedbat commented 10 years ago

    I'll add my voice to those asking for a way to put docstrings on namedtuples. As it is, namedtuples get automatic docstrings that seem to me to be almost worse than none. Sphinx produces this:

    class Key
    
        Key(scope, user_id, block_scope_id, field_name)
    
        __getnewargs__()
    
            Return self as a plain tuple. Used by copy and pickle.
    
        __repr__()
    
            Return a nicely formatted representation string
    
        block_scope_id None
    
            Alias for field number 2
    
        field_name None
    
            Alias for field number 3
    
        scope None
    
            Alias for field number 0
    
        user_id None
    
            Alias for field number 1

    Why are __getnewargs__ and __repr__ included at all? They aren't useful for API documentation. The individual property docstrings offer no new information over the summary at the top. I'd like namedtuple not to be so verbose where it has no useful information to offer. The one-line summary is all the information namedtuple has, so that is all it should include in the docstring:

    class Key
    
        Key(scope, user_id, block_scope_id, field_name)

    serhiy-storchaka commented 10 years ago

    Unhide this discussion.

    rhettinger commented 10 years ago

    A few quick thoughts:

    rhettinger commented 9 years ago

    FWIW, here's a proposed new classmethod that makes it possible to easily customize the field docstrings but without cluttering the API of the factory function:

        @classmethod
        def _set_docstrings(cls, **docstrings):
            '''Customize the field docstrings
               >>> Point = namedtuple('Point', ['x', 'y'])
               >>> Point._set_docstrings(x = 'abscissa', y = 'ordinate')
    
            '''
            for fieldname, docstring in docstrings.items():
                if fieldname not in cls._fields:
                    raise ValueError('Fieldname %r does not exist' % fieldname)
                # Wrap the existing getter function, not the property object itself.
                new_property = _property(getattr(cls, fieldname).fget, doc=docstring)
                setattr(cls, fieldname, new_property)

    Note, nothing is needed for the main docstring since it is already writeable:

         Point.__doc__ = '2-D Coordinate'

    f7385517-0a24-49a2-83fd-f4eca87773fa commented 9 years ago

    Here's a variant that builds on your code, but makes for a nicer API. Single-line docstrings can be passed along with the attribute name, and with namedtuple.with_docstrings(... all info required to build the class ...) from a user perspective the factory looks like a class method:

    from functools import partial
    from collections import namedtuple
    
    def _with_docstrings(cls, typename, field_names_with_doc,
                         *, verbose=False, rename=False, doc=None):
        field_names = []
        field_docs = []
        if isinstance(field_names_with_doc, str):
            field_names_with_doc = [
                line for line in field_names_with_doc.splitlines() if line.strip()]
        for item in field_names_with_doc:
            if isinstance(item, str):
                item = item.split(None, 1)
            if len(item) == 1:
                [fieldname] = item
                fielddoc = None
            else:
                fieldname, fielddoc = item
            field_names.append(fieldname)
            field_docs.append(fielddoc)
    
        nt = cls(typename, field_names, verbose=verbose, rename=rename)
    
        for fieldname, fielddoc in zip(field_names, field_docs):
            if fielddoc is not None:
                # Wrap the existing getter function, not the property object itself.
                new_property = property(getattr(nt, fieldname).fget, doc=fielddoc)
                setattr(nt, fieldname, new_property)
    
        if doc is not None:
            nt.__doc__ = doc
        return nt
    
    namedtuple.with_docstrings = partial(_with_docstrings, namedtuple)
    
    if __name__ == "__main__":
        Point = namedtuple.with_docstrings("Point", "x abscissa\ny ordinate")
        Address = namedtuple.with_docstrings(
            "Address",
            """
            name Surname
            first_name First name
            city
            email Email address
            """)
        Whatever = namedtuple.with_docstrings(
            "Whatever",
            [("foo", "doc for\n foo"),
             ("bar", "doc for bar"),
             "baz"],
            doc="""The Whatever class.

    Example for a namedtuple with multiline docstrings for its attributes.""")

    rhettinger commented 9 years ago

    The need for this may be eliminated by bpo-24064. Then we can change the docstrings just like any other object, with no special rules or methods.
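
For reference, a short sketch of what this looks like in practice, assuming Python 3.5 or later, where field docstrings are writable. The helper `set_field_docs` is hypothetical and only mirrors the validation from the classmethod proposed earlier in this thread; plain attribute assignment is all that is strictly needed.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

# Direct assignment, no special API required (Python 3.5+).
Point.__doc__ = 'Point: 2-dimensional coordinate'
Point.x.__doc__ = 'abscissa'


def set_field_docs(cls, **docs):
    """Hypothetical helper: assign a docstring to each named field."""
    for name, doc in docs.items():
        if name not in cls._fields:
            raise ValueError(f'Field name {name!r} does not exist')
        getattr(cls, name).__doc__ = doc


set_field_docs(Point, y='ordinate')
```

Field access is unaffected, and both help() and Sphinx pick up the new docstrings.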