readthedocs / sphinx-autoapi

A new approach to API documentation in Sphinx.
https://sphinx-autoapi.readthedocs.io/
MIT License
430 stars 125 forks source link

Sphinx AutoAPI >=3.2 generates duplicate definitions of class attributes #476

Open zaneselvans opened 1 month ago

zaneselvans commented 1 month ago

With the update to the conda-forge feedstock in #465 I've just upgraded from v3.0.0 to v3.2.1 of sphinx-autoapi, and am now getting errors like these when I run sphinx-build, which I never got before:

sphinx.errors.SphinxWarning: /Users/zane/code/catalyst/pudl/docs/autoapi/pudl/analysis/record_linkage/link_cross_year/index.rst:96:duplicate object description of pudl.analysis.record_linkage.link_cross_year.DistanceMatrix.distance_matrix, other instance in autoapi/pudl/analysis/record_linkage/link_cross_year/index, use :no-index: for one of them

or

sphinx.errors.SphinxWarning: /Users/zane/code/catalyst/pudl/docs/autoapi/pudl/analysis/timeseries_cleaning/index.rst:297:duplicate object description of pudl.analysis.timeseries_cleaning.Timeseries.xi, other instance in autoapi/pudl/analysis/timeseries_cleaning/index, use :no-index: for one of them

By installing previous versions of sphinx-autoapi in my environment using pip I see that the problem first appears in v3.2.0. In v3.1.2 the docs build fine.

This sounds vaguely like the error that was coming up in #452?

The RST generated by v3.2.1, including the class and its attributes:

.. py:class:: Timeseries(x: numpy.ndarray | pandas.DataFrame)

   Multivariate timeseries for anomalies detection and imputation.

   .. attribute:: xi

      Reference to the original values (can be null).
      Many methods assume that these represent chronological, regular timeseries.

   .. attribute:: x

      Copy of :attr:`xi` with any flagged values replaced with null.

   .. attribute:: flags

      Flag label for each value, or null if not flagged.

   .. attribute:: flagged

      Running list of flags that have been checked so far.

   .. attribute:: index

      Row index.

   .. attribute:: columns

      Column names.

   .. py:attribute:: xi
      :type:  numpy.ndarray

   .. py:attribute:: index
      :type:  pandas.Index

   .. py:attribute:: columns
      :type:  pandas.Index

   .. py:attribute:: x
      :type:  numpy.ndarray

   .. py:attribute:: flags
      :type:  numpy.ndarray

   .. py:attribute:: flagged
      :type:  list[str]
      :value: []

vs. by v3.1.2:

.. py:class:: Timeseries(x: numpy.ndarray | pandas.DataFrame)

   Multivariate timeseries for anomalies detection and imputation.

   .. attribute:: xi

      Reference to the original values (can be null).
      Many methods assume that these represent chronological, regular timeseries.

   .. attribute:: x

      Copy of :attr:`xi` with any flagged values replaced with null.

   .. attribute:: flags

      Flag label for each value, or null if not flagged.

   .. attribute:: flagged

      Running list of flags that have been checked so far.

   .. attribute:: index

      Row index.

   .. attribute:: columns

      Column names.

So it seems like AutoAPI is generating two versions of the attributes -- one with the py: prefix (which includes type information) and one without (which includes the attribute docstrings).

AWhetter commented 1 month ago

It should be possible to change AutoAPI to inspect the class docstring and omit the separate documenting of attributes if they're already covered in the class docstring.

zaneselvans commented 1 month ago

Hmm, I guess the attributes are kind of separately defined in the docstring and the body of the class. I wonder what changed in 3.2 that triggers this now. And I wonder what would happen if I added the types to the docstring. Maybe as a stopgap I'll attach docstrings to the attributes instead of having them in the class docstring.

class Timeseries:
    """Multivariate timeseries for anomalies detection and imputation.

    Attributes:
        xi: Reference to the original values (can be null).
            Many methods assume that these represent chronological, regular timeseries.
        x: Copy of :attr:`xi` with any flagged values replaced with null.
        flags: Flag label for each value, or null if not flagged.
        flagged: Running list of flags that have been checked so far.
        index: Row index.
        columns: Column names.
    """

    def __init__(self, x: np.ndarray | pd.DataFrame) -> None:
        """Initialize a multivariate timeseries.

        Args:
            x: Timeseries with shape (n observations, m variables).
                If :class:`pandas.DataFrame`, :attr:`index` and :attr:`columns`
                are equal to `x.index` and `x.columns`, respectively.
                Otherwise, :attr:`index` and :attr:`columns` are the default
                `pandas.RangeIndex`.
        """
        self.xi: np.ndarray
        self.index: pd.Index
        self.columns: pd.Index
        if isinstance(x, pd.DataFrame):
            self.xi = x.to_numpy()
            self.index = x.index
            self.columns = x.columns
        else:
            self.xi = x
            self.index = pd.RangeIndex(x.shape[0])
            self.columns = pd.RangeIndex(x.shape[1])
        self.x: np.ndarray = self.xi.copy()
        self.flags: np.ndarray = np.empty(self.x.shape, dtype=object)
        self.flagged: list[str] = []
AWhetter commented 2 weeks ago

AutoAPI used to ignore instance attributes that didn't have a docstring. That was counter to how every other member worked so it was changed in v3.2 such that instance attributes are documented using the normal rules.

Instance attributes are a bit of an oddball though because how they're documented varies between tools. AutoAPI and some other tools like autodoc accept a docstring for attributes by putting a docstring directly after the assignment in __init__. This is the preferred way of doing it because things like documenting inherited attributes works as expected. But Python itself doesn't recognise these types of docstrings and so it practically requires the documenting of attributes in the docstring. A downside with this is that if you don't document the attributes as the last thing in the docstring -- which is particularly problematic with autoapi_python_class_content and autodoc_class_content because it will append the __init__ docstring to the end of the class docstring -- then everything is rendered in an odd way where you get part of the class docstring rendered, then all of the attributes, then the remainder of the docstring, then every other member.

What you're aiming for with your Timeseries class though is certainly a valid use case. Ideally AutoAPI would append the type information into the already existing definitions of the attributes. Plus the AutoAPI docs could be doing a better job of explaining this behaviour, and informing users about the options available to them when it comes to documenting attributes.

zaneselvans commented 2 weeks ago

My stopgap solution of only documenting via docstrings on the attributes themselves seems to work, but documentation of this behavior would be helpful. I was very confused!