pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Fix PR02 issues in docstrings #27976

Open datapythonista opened 5 years ago

datapythonista commented 5 years ago

Fix the docstrings where an unknown parameter is documented (most likely typos or errors in the parameter format).

Current errors:

$ ./scripts/validate_docstrings.py --errors=PR02
pandas.Series.astype: Unknown parameters {kwargs}
pandas.Series.pipe: Unknown parameters {args, kwargs}
pandas.Series.clip: Unknown parameters {*args, **kwargs}
pandas.Series.cummax: Unknown parameters {*args, **kwargs :}
pandas.Series.cummin: Unknown parameters {*args, **kwargs :}
pandas.Series.cumprod: Unknown parameters {*args, **kwargs :}
pandas.Series.cumsum: Unknown parameters {*args, **kwargs :}
pandas.Series.mad: Unknown parameters {**kwargs, numeric_only}
pandas.Series.compound: Unknown parameters {**kwargs, numeric_only}
pandas.Series.drop: Unknown parameters {index, columns}
pandas.Series.idxmax: Unknown parameters {*args, **kwargs}
pandas.Series.idxmin: Unknown parameters {*args, **kwargs}
pandas.Series.reindex: Unknown parameters {level, limit, copy, fill_value, tolerance, method}
pandas.Series.rename: Unknown parameters {copy, inplace, level}
pandas.Series.rename_axis: Unknown parameters {mapper, copy, inplace, index, columns, axis}
pandas.Series.argmin: Unknown parameters {*args, **kwargs, skipna, axis}
pandas.Series.argmax: Unknown parameters {*args, **kwargs, skipna, axis}
pandas.Series.swaplevel: Unknown parameters {i, j}
pandas.Series.dt.to_period: Unknown parameters {freq}
pandas.Series.dt.tz_localize: Unknown parameters {tz, errors, ambiguous, nonexistent}
pandas.Series.dt.tz_convert: Unknown parameters {tz}
pandas.Series.dt.strftime: Unknown parameters {date_format}
pandas.Series.dt.round: Unknown parameters {nonexistent, ambiguous, freq}
pandas.Series.dt.floor: Unknown parameters {nonexistent, ambiguous, freq}
pandas.Series.dt.ceil: Unknown parameters {nonexistent, ambiguous, freq}
pandas.Series.dt.month_name: Unknown parameters {locale}
pandas.Series.dt.day_name: Unknown parameters {locale}
pandas.Series.str.cat: Unknown parameters {na_rep, join, others, sep}
pandas.Series.str.center: Unknown parameters {fillchar, width}
pandas.Series.str.contains: Unknown parameters {flags, regex, case, pat, na}
pandas.Series.str.count: Unknown parameters {flags, pat}
pandas.Series.str.encode: Unknown parameters {encoding, errors}
pandas.Series.str.endswith: Unknown parameters {pat, na}
pandas.Series.str.extract: Unknown parameters {expand, flags, pat}
pandas.Series.str.extractall: Unknown parameters {flags, pat}
pandas.Series.str.find: Unknown parameters {sub, start, end}
pandas.Series.str.findall: Unknown parameters {flags, pat}
pandas.Series.str.index: Unknown parameters {sub, start, end}
pandas.Series.str.join: Unknown parameters {sep}
pandas.Series.str.ljust: Unknown parameters {fillchar, width}
pandas.Series.str.lstrip: Unknown parameters {to_strip}
pandas.Series.str.match: Unknown parameters {flags, case, pat, na}
pandas.Series.str.normalize: Unknown parameters {form}
pandas.Series.str.pad: Unknown parameters {side, fillchar, width}
pandas.Series.str.partition: Unknown parameters {expand, pat, sep}
pandas.Series.str.repeat: Unknown parameters {repeats}
pandas.Series.str.replace: Unknown parameters {flags, n, repl, regex, case, pat}
pandas.Series.str.rfind: Unknown parameters {sub, start, end}
pandas.Series.str.rindex: Unknown parameters {sub, start, end}
pandas.Series.str.rjust: Unknown parameters {fillchar, width}
pandas.Series.str.rpartition: Unknown parameters {expand, pat, sep}
pandas.Series.str.rstrip: Unknown parameters {to_strip}
pandas.Series.str.slice_replace: Unknown parameters {stop, repl, start}
pandas.Series.str.split: Unknown parameters {expand, pat, n}
pandas.Series.str.rsplit: Unknown parameters {expand, pat, n}
pandas.Series.str.startswith: Unknown parameters {pat, na}
pandas.Series.str.strip: Unknown parameters {to_strip}
pandas.Series.str.translate: Unknown parameters {table}
pandas.Series.str.wrap: Unknown parameters {break_on_hyphens, replace_whitespace, width, drop_whitespace, break_long_words, expand_tabs}
pandas.Series.str.zfill: Unknown parameters {width}
pandas.Series.str.get_dummies: Unknown parameters {sep}
pandas.Series.cat.rename_categories: Unknown parameters {inplace, new_categories}
pandas.Series.cat.reorder_categories: Unknown parameters {inplace, ordered, new_categories}
pandas.Series.cat.add_categories: Unknown parameters {inplace, new_categories}
pandas.Series.cat.remove_categories: Unknown parameters {inplace, removals}
pandas.Series.cat.remove_unused_categories: Unknown parameters {inplace}
pandas.Series.cat.set_categories: Unknown parameters {rename, inplace, ordered, new_categories}
pandas.Series.cat.as_ordered: Unknown parameters {inplace}
pandas.Series.cat.as_unordered: Unknown parameters {inplace}
pandas.Series.plot: Unknown parameters {xticks, xlim, mark_right, yticks, table, legend, ylim, logx, fontsize, include_bool, colormap, title, logy, `**kwds`, kind, position, colorbar, yerr, y, x, style, rot, loglog, figsize, grid, use_index, xerr}
pandas.Series.plot.area: Unknown parameters {stacked, **kwds}
pandas.Series.plot.bar: Unknown parameters {**kwds}
pandas.Series.plot.barh: Unknown parameters {**kwds}
pandas.Series.plot.box: Unknown parameters {**kwds}
pandas.Series.plot.density: Unknown parameters {**kwds}
pandas.Series.plot.hist: Unknown parameters {**kwds}
pandas.Series.plot.kde: Unknown parameters {**kwds}
pandas.Series.plot.line: Unknown parameters {**kwds}
pandas.Series.plot.pie: Unknown parameters {y, **kwds}
pandas.Series.hist: Unknown parameters {`**kwds`}
pandas.Series.to_csv: Unknown parameters {mode, columns, float_format, encoding, index_label, header, quoting, chunksize, na_rep, quotechar, date_format, sep, doublequote, compression, escapechar, line_terminator, decimal, path_or_buf, index}
pandas.Series.to_hdf: Unknown parameters {mode, append, format, complevel, complib, errors, fletcher32, data_columns, dropna}
pandas.Series.to_msgpack: Unknown parameters {path, append, compress}
pandas.core.resample.Resampler.pipe: Unknown parameters {args, kwargs}
pandas.core.resample.Resampler.sem: Unknown parameters {ddof}
pandas.DataFrame.select_dtypes: Unknown parameters {include, exclude}
pandas.DataFrame.astype: Unknown parameters {kwargs}
pandas.DataFrame.pipe: Unknown parameters {args, kwargs}
pandas.DataFrame.clip: Unknown parameters {*args, **kwargs}
pandas.DataFrame.compound: Unknown parameters {**kwargs, numeric_only}
pandas.DataFrame.cummax: Unknown parameters {*args, **kwargs :}
pandas.DataFrame.cummin: Unknown parameters {*args, **kwargs :}
pandas.DataFrame.cumprod: Unknown parameters {*args, **kwargs :}
pandas.DataFrame.cumsum: Unknown parameters {*args, **kwargs :}
pandas.DataFrame.eval: Unknown parameters {kwargs}
pandas.DataFrame.mad: Unknown parameters {**kwargs, numeric_only}
pandas.DataFrame.reindex: Unknown parameters {level, limit, copy, fill_value, tolerance, method, index, columns, labels, axis}
pandas.DataFrame.rename: Unknown parameters {columns, level, mapper, copy, inplace, errors, axis, index}
pandas.DataFrame.rename_axis: Unknown parameters {mapper, copy, inplace, index, columns, axis}
pandas.DataFrame.swaplevel: Unknown parameters {i, j}
pandas.DataFrame.melt: Unknown parameters {frame}
pandas.DataFrame.T: Unknown parameters {copy, *args, **kwargs}
pandas.DataFrame.transpose: Unknown parameters {copy, *args, **kwargs}
pandas.DataFrame.update: Unknown parameters {other, filter_func, join, overwrite, errors}
pandas.DataFrame.plot: Unknown parameters {xticks, xlim, mark_right, yticks, table, legend, ylim, logx, fontsize, include_bool, colormap, title, logy, `**kwds`, kind, position, colorbar, yerr, y, x, style, rot, loglog, figsize, grid, use_index, xerr}
pandas.DataFrame.plot.area: Unknown parameters {stacked, **kwds}
pandas.DataFrame.plot.bar: Unknown parameters {**kwds}
pandas.DataFrame.plot.barh: Unknown parameters {**kwds}
pandas.DataFrame.plot.box: Unknown parameters {**kwds}
pandas.DataFrame.plot.density: Unknown parameters {**kwds}
pandas.DataFrame.plot.hexbin: Unknown parameters {**kwds}
pandas.DataFrame.plot.hist: Unknown parameters {**kwds}
pandas.DataFrame.plot.kde: Unknown parameters {**kwds}
pandas.DataFrame.plot.line: Unknown parameters {**kwds}
pandas.DataFrame.plot.pie: Unknown parameters {y, **kwds}
pandas.DataFrame.plot.scatter: Unknown parameters {**kwds}
pandas.DataFrame.sparse.from_spmatrix: Unknown parameters {index, columns}
pandas.DataFrame.to_hdf: Unknown parameters {mode, append, format, complevel, complib, errors, fletcher32, data_columns, dropna}
pandas.DataFrame.to_html: Unknown parameters {min_rows}
pandas.DataFrame.to_stata: Unknown parameters {encoding, fname, time_stamp, version, convert_dates, data_label, write_index, convert_strl, byteorder, variable_labels}
pandas.DataFrame.to_msgpack: Unknown parameters {path, append, compress}
pandas.read_excel: Unknown parameters {io, usecols, header, names, sheet_name, engine, mangle_dupe_cols, **kwds, skiprows, true_values, parse_dates, dtype, verbose, na_values, convert_float, thousands, converters, skipfooter, date_parser, keep_default_na, false_values, nrows, skip_footer, squeeze, index_col, comment}
pandas.ExcelWriter: Unknown parameters {mode, datetime_format, date_format}
pandas.read_hdf: Unknown parameters {iterator, columns, where, stop , errors, start, chunksize}
pandas.HDFStore.put: Unknown parameters {encoding, value   , append  , format  , dropna  , data_columns, key     }
pandas.HDFStore.append: Unknown parameters {chunksize   , expectedrows, nan_rep     , append      , dropna      , data_columns, min_itemsize, encoding    }
pandas.HDFStore.select: Unknown parameters {stop }
pandas.read_feather: Unknown parameters {path, nthreads, columns, use_threads}
pandas.read_stata: Unknown parameters {iterator, encoding, columns, convert_dates, filepath_or_buffer, convert_missing, preserve_dtypes, order_categoricals, index_col, convert_categoricals, chunksize}
pandas.io.stata.StataReader.data: Unknown parameters {columns, convert_dates, convert_missing, preserve_dtypes, order_categoricals, index_col, convert_categoricals}
pandas.core.window.rolling.Rolling.sum: Unknown parameters {*args, **kwargs}
pandas.core.window.rolling.Rolling.var: Unknown parameters {*args, **kwargs}
pandas.core.window.rolling.Rolling.std: Unknown parameters {*args, **kwargs}
pandas.core.window.rolling.Rolling.max: Unknown parameters {*args, **kwargs}
pandas.core.window.rolling.Rolling.apply: Unknown parameters {*args, **kwargs}
pandas.core.window.rolling.Rolling.aggregate: Unknown parameters {func}
pandas.core.window.rolling.Rolling.quantile: Unknown parameters {**kwargs:}
pandas.core.window.rolling.Window.sum: Unknown parameters {*args, **kwargs}
pandas.core.window.expanding.Expanding.sum: Unknown parameters {*args, **kwargs}
pandas.core.window.expanding.Expanding.var: Unknown parameters {*args, **kwargs}
pandas.core.window.expanding.Expanding.std: Unknown parameters {*args, **kwargs}
pandas.core.window.expanding.Expanding.max: Unknown parameters {*args, **kwargs}
pandas.core.window.expanding.Expanding.apply: Unknown parameters {*args, **kwargs}
pandas.core.window.expanding.Expanding.aggregate: Unknown parameters {func}
pandas.core.window.expanding.Expanding.quantile: Unknown parameters {**kwargs:}
pandas.core.window.ewm.EWM.mean: Unknown parameters {*args, **kwargs}
pandas.core.window.ewm.EWM.std: Unknown parameters {*args, **kwargs}
pandas.core.window.ewm.EWM.var: Unknown parameters {*args, **kwargs}
pandas.core.window.ewm.EWM.corr: Unknown parameters {bias}
pandas.io.formats.style.Styler.apply: Unknown parameters {kwargs}
pandas.io.formats.style.Styler.applymap: Unknown parameters {kwargs}
pandas.io.formats.style.Styler.where: Unknown parameters {kwargs}
pandas.io.formats.style.Styler.set_properties: Unknown parameters {kwargs}
pandas.io.formats.style.Styler.pipe: Unknown parameters {*args, **kwargs :}
pandas.io.formats.style.Styler.background_gradient: Unknown parameters {low, high}
pandas.plotting.andrews_curves: Unknown parameters {color, frame, colormap, class_column, ax, samples, kwds}
pandas.plotting.bootstrap_plot: Unknown parameters {**kwds :}
pandas.plotting.lag_plot: Unknown parameters {kwds}
pandas.plotting.parallel_coordinates: Unknown parameters {xticks, color, frame, colormap, axvlines, sort_labels, class_column, axvlines_kwds, kwds, ax, use_columns, cols}
pandas.plotting.radviz: Unknown parameters {kwds}
pandas.plotting.scatter_matrix: Unknown parameters {kwds}
pandas.describe_option: Unknown parameters {_print_desc, pat}
pandas.reset_option: Unknown parameters {pat}
pandas.get_option: Unknown parameters {pat}
pandas.set_option: Unknown parameters {pat, value}
pandas.api.types.infer_dtype: Unknown parameters {value, skipna}
pandas.api.types.is_list_like: Unknown parameters {allow_sets, obj}
pandas.api.types.is_scalar: Unknown parameters {val}
pandas.Timestamp: Unknown parameters {hour, minute, second, microsecond, year, month, day}
pandas.Timestamp.max: Unknown parameters {nanosecond, hour, minute, second, microsecond, year, month, day, ts_input, tzinfo, tz, unit, freq}
pandas.Timestamp.min: Unknown parameters {nanosecond, hour, minute, second, microsecond, year, month, day, ts_input, tzinfo, tz, unit, freq}
pandas.Timedelta.max: Unknown parameters {value, unit, **kwargs}
pandas.Timedelta.min: Unknown parameters {value, unit, **kwargs}
pandas.Period.asfreq: Unknown parameters {freq, how}
pandas.Period.to_timestamp: Unknown parameters {freq, how}
pandas.Interval: Unknown parameters {closed, right, left}
pandas.Interval.overlaps: Unknown parameters {other}
pandas.factorize: Unknown parameters {values, size_hint, sort, na_sentinel, order}
pandas.to_datetime: Unknown parameters {infer_datetime_format, format, box, arg, yearfirst, exact, cache, dayfirst, origin, unit, utc, errors}
pandas.to_timedelta: Unknown parameters {arg, errors, unit, box}
pandas.Grouper: Unknown parameters {level, sort, convention, label, base, loffset, freq, closed, axis, key}
pandas.core.groupby.GroupBy.apply: Unknown parameters {args, kwargs}
pandas.core.groupby.GroupBy.pipe: Unknown parameters {args, kwargs}
pandas.core.groupby.DataFrameGroupBy.corr: Unknown parameters {method, min_periods}
pandas.core.groupby.DataFrameGroupBy.cov: Unknown parameters {min_periods}
pandas.core.groupby.DataFrameGroupBy.describe: Unknown parameters {exclude, percentiles, include}
pandas.core.groupby.DataFrameGroupBy.diff: Unknown parameters {periods, axis}
pandas.core.groupby.DataFrameGroupBy.fillna: Unknown parameters {inplace, value, limit, downcast, method, axis}
pandas.core.groupby.DataFrameGroupBy.filter: Unknown parameters {f}
pandas.core.groupby.DataFrameGroupBy.hist: Unknown parameters {data, column, ylabelsize, sharex, by, sharey, layout, **kwds, yrot, figsize, ax, xlabelsize, bins, grid, xrot}
pandas.core.groupby.DataFrameGroupBy.idxmax: Unknown parameters {skipna, axis}
pandas.core.groupby.DataFrameGroupBy.idxmin: Unknown parameters {skipna, axis}
pandas.core.groupby.DataFrameGroupBy.mad: Unknown parameters {level, skipna, numeric_only, **kwargs, axis}
pandas.core.groupby.DataFrameGroupBy.resample: Unknown parameters {*args, **kwargs}
pandas.core.groupby.DataFrameGroupBy.skew: Unknown parameters {level, skipna, numeric_only, **kwargs, axis}
pandas.core.groupby.DataFrameGroupBy.take: Unknown parameters {indices, **kwargs, axis, is_copy}
pandas.core.groupby.DataFrameGroupBy.tshift: Unknown parameters {periods, axis, freq}
pandas.core.groupby.SeriesGroupBy.nlargest: Unknown parameters {n, keep}
pandas.core.groupby.SeriesGroupBy.nsmallest: Unknown parameters {n, keep}
pandas.core.groupby.DataFrameGroupBy.corrwith: Unknown parameters {method, other, axis, drop}
pandas.core.groupby.DataFrameGroupBy.boxplot: Unknown parameters {`**kwds`}
pandas.Index.to_native_types: Unknown parameters {kwargs}
pandas.CategoricalIndex.rename_categories: Unknown parameters {inplace, new_categories}
pandas.CategoricalIndex.reorder_categories: Unknown parameters {inplace, ordered, new_categories}
pandas.CategoricalIndex.add_categories: Unknown parameters {inplace, new_categories}
pandas.CategoricalIndex.remove_categories: Unknown parameters {inplace, removals}
pandas.CategoricalIndex.remove_unused_categories: Unknown parameters {inplace}
pandas.CategoricalIndex.set_categories: Unknown parameters {rename, inplace, ordered, new_categories}
pandas.CategoricalIndex.as_ordered: Unknown parameters {inplace}
pandas.CategoricalIndex.as_unordered: Unknown parameters {inplace}
pandas.MultiIndex: Unknown parameters {copy, names, verify_integrity, sortorder, levels, labels, codes}
pandas.MultiIndex.set_codes: Unknown parameters {inplace, verify_integrity, codes, level}
pandas.DatetimeIndex: Unknown parameters {periods , data , copy }
pandas.DatetimeIndex.indexer_between_time: Unknown parameters {start_time, end_time}
pandas.DatetimeIndex.strftime: Unknown parameters {date_format}
pandas.DatetimeIndex.tz_convert: Unknown parameters {tz}
pandas.DatetimeIndex.tz_localize: Unknown parameters {tz, errors, ambiguous, nonexistent}
pandas.DatetimeIndex.round: Unknown parameters {nonexistent, ambiguous, freq}
pandas.DatetimeIndex.floor: Unknown parameters {nonexistent, ambiguous, freq}
pandas.DatetimeIndex.ceil: Unknown parameters {nonexistent, ambiguous, freq}
pandas.DatetimeIndex.month_name: Unknown parameters {locale}
pandas.DatetimeIndex.day_name: Unknown parameters {locale}
pandas.DatetimeIndex.to_period: Unknown parameters {freq}
pandas.DatetimeIndex.to_perioddelta: Unknown parameters {freq}
pandas.DatetimeIndex.mean: Unknown parameters {skipna}
pandas.TimedeltaIndex: Unknown parameters {periods , data , copy }
pandas.TimedeltaIndex.round: Unknown parameters {nonexistent, ambiguous, freq}
pandas.TimedeltaIndex.floor: Unknown parameters {nonexistent, ambiguous, freq}
pandas.TimedeltaIndex.ceil: Unknown parameters {nonexistent, ambiguous, freq}
pandas.TimedeltaIndex.mean: Unknown parameters {skipna}
pandas.PeriodIndex: Unknown parameters {quarter, month, day, hour, second, year, minute}
pandas.PeriodIndex.asfreq: Unknown parameters {freq, how}
pandas.PeriodIndex.strftime: Unknown parameters {date_format}
pandas.PeriodIndex.to_timestamp: Unknown parameters {freq, how}
pandas.api.extensions.ExtensionArray.argsort: Unknown parameters {*args, **kwargs:}
pandas.tseries.offsets.CBMonthEnd.apply_index: Unknown parameters {i}
pandas.tseries.offsets.CBMonthBegin.apply_index: Unknown parameters {i}
AminooZ commented 5 years ago

I made a pull request #28340 to solve "pandas.Series.astype: Unknown parameters {kwargs}". I'll continue working on more PR02 issues in docstrings.

WuraolaOyewusi commented 4 years ago

I made a PR to fix pandas.Series.clip: Unknown parameters {*args, **kwargs}.

ChiefMilesEdgeworth commented 4 years ago

I've been working on this today, and it seems some of this is caused by decorators. For instance, in strings.py, when the @forbid_nonstring_types decorator is commented out on one of the offending methods, the method passes without the error. I've also seen this behavior with the @deprecate_kwarg decorator. I'm not sure what the fix should be yet; I'd have to look into it a bit more.

WuraolaOyewusi commented 4 years ago

Hi @Nabel0721. I'm interested in knowing what you find. Thanks.

ChiefMilesEdgeworth commented 4 years ago

@WuraolaOyewusi So, it looks like the function signature_parameters in the validation script needs some work. After inspecting one of the offending functions, I found this:

[image: Pandas decorator issue]

It looks like the getfullargspec function is missing parts of the signature. I think this is because signature is documented as unwrapping wrapper chains (layers of decorators), while getfullargspec does not.

If this is the issue, fixing it would solve some of the errors we have, but would likely introduce more as we catch more arguments in other functions.
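
The difference described above can be reproduced outside pandas. The following is a minimal sketch, assuming a hypothetical passthrough decorator that stands in for decorators such as forbid_nonstring_types; the decorator and the toy center function are illustrative and are not pandas code.

```python
import functools
import inspect


def passthrough(func):
    """Hypothetical decorator, used only to illustrate the behavior."""
    @functools.wraps(func)  # sets wrapper.__wrapped__ = func
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@passthrough
def center(self, width, fillchar=" "):
    """Toy stand-in for a decorated string method."""
    return None


# getfullargspec does not follow __wrapped__, so it describes the wrapper.
spec = inspect.getfullargspec(center)
argspec_names = spec.args + [spec.varargs, spec.varkw]

# signature follows __wrapped__ by default, so it describes the original.
signature_names = list(inspect.signature(center).parameters)

print(argspec_names)    # ['args', 'kwargs']
print(signature_names)  # ['self', 'width', 'fillchar']
```

With getfullargspec, every parameter documented for center (width, fillchar) would look unknown, because only args and kwargs are visible.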

WuraolaOyewusi commented 4 years ago

@Nabel0721. I will be following to learn how this is sorted out. Please tag me as you make changes. Have a great day.

ChiefMilesEdgeworth commented 4 years ago

@WuraolaOyewusi I put together a PR (#28765) that switches the script to use signature. According to its documentation, signature finds the source function by checking whether a function has the __wrapped__ attribute. If it does, that attribute contains the function that was wrapped by the current one. It is set automatically by the @wraps decorator and similar functions from the functools library. So, if a decorator is overriding the signature of a function, we need to check whether it uses @wraps inside or, for a class-based decorator, whether it calls something like functools.update_wrapper(self, func). This is only necessary if we need the signature of the underlying function. For CPython, decorators should probably have the __wrapped__ attribute assigned directly.
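
For the class-based case, a small sketch may help. CallCounter and pad below are hypothetical names made up for illustration and do not exist in pandas; the only library calls used are functools.update_wrapper and inspect.signature.

```python
import functools
import inspect


class CallCounter:
    """Hypothetical class-based decorator, for illustration only."""

    def __init__(self, func):
        # Copies __name__, __doc__, etc. and sets self.__wrapped__ = func.
        functools.update_wrapper(self, func)
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.func(*args, **kwargs)


@CallCounter
def pad(text, width, fillchar=" "):
    """Toy function, not the pandas implementation."""
    return text.center(width, fillchar)


# signature unwraps through __wrapped__ and reports the original parameters.
params = list(inspect.signature(pad).parameters)
print(pad.__wrapped__.__name__)  # pad
print(params)                    # ['text', 'width', 'fillchar']
```

Without the update_wrapper call, signature would fall back to the parameters of __call__, that is, args and kwargs.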

WuraolaOyewusi commented 4 years ago

Ok Abel. Thanks for letting me know. I will look for the PR and continue to see how the whole process works.

aflah02 commented 3 years ago

@datapythonista Hey, I checked out this issue and would like to help. Looking at the list, take the last entry: pandas.tseries.offsets.CBMonthBegin.apply_index: Unknown parameters {i}. What exactly do I have to do? I went to the website and found this page, where the parameters seem to be fine: https://pandas.pydata.org/pandas-docs/version/0.25.3/reference/api/pandas.tseries.offsets.CBMonthBegin.apply_index.html

Also, for another example such as pandas.core.window.rolling.Rolling.apply: Unknown parameters {*args, **kwargs}, I can't understand what it's trying to say. Does it mean it had unknown parameters earlier and the right parameters are {*args, **kwargs}?

Also, as far as I can see, most of them already have their parameters documented when I open the website. Is there an updated list?

datapythonista commented 3 years ago

The link that you're sharing is for an old version of pandas. You should check the latest version at: https://pandas.pydata.org/docs/dev/

For any function there are two sets of parameters, the ones in its signature and the ones documented in its docstring:

def my_function(actual_parameter):
    """
    Does something.

    Parameters
    ----------
    unknown_parameter : str
        This is unknown, and actual_parameter is missing, there are two incorrect things here.
    """
    pass

So, an unknown parameter as you can see in the example is a documented parameter that we don't know about, because it's not an actual parameter in the signature of the function.
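
To make the comparison concrete, here is a rough sketch of the kind of check behind PR02. It is an assumption-laden simplification, not the code of scripts/validate_docstrings.py: the helper documented_parameters is a hypothetical hand-rolled parser that only handles the simple layout shown above.

```python
import inspect


def my_function(actual_parameter):
    """
    Does something.

    Parameters
    ----------
    unknown_parameter : str
        Documented, but not present in the signature.
    """


def documented_parameters(func):
    """Return the names listed under 'Parameters' (simplified parser)."""
    lines = inspect.getdoc(func).splitlines()
    start = lines.index("Parameters") + 2  # skip the title and its underline
    names = []
    for line in lines[start:]:
        if line and set(line) == {"-"}:  # underline of the next section
            break
        if line and not line[0].isspace() and " : " in line:
            names.append(line.split(" : ")[0])
    return names


documented = documented_parameters(my_function)
actual = list(inspect.signature(my_function).parameters)

unknown = [name for name in documented if name not in actual]  # PR02-style
missing = [name for name in actual if name not in documented]

print(unknown)  # ['unknown_parameter']
print(missing)  # ['actual_parameter']
```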

aflah02 commented 3 years ago

ohkk understood thanks @datapythonista

cfernhout commented 3 years ago

@datapythonista Hi, I've checked out pandas.DataFrame.sparse.from_spmatrix: Unknown parameters {index, columns}. I had a look at the code and found this:

@classmethod
def from_spmatrix(cls, data, index=None, columns=None):
    """
    Create a new DataFrame from a scipy sparse matrix.
    .. versionadded:: 0.25.0
    Parameters
    ----------
    data : scipy.sparse.spmatrix
        Must be convertible to csc format.
    index, columns : Index, optional
        Row and column labels to use for the resulting DataFrame.
        Defaults to a RangeIndex.
    Returns
    -------
    DataFrame
        ...
        etc
    """

Should I assume this is wrong and change it to a separate row for index and columns? Or is this correct and ./scripts/validate_docstrings.py --errors=PR02 should have passed this docstring?

datapythonista commented 3 years ago

Ideally we would like to allow documenting two parameters at a time. So, updating the script so that index, columns is accepted and doesn't raise an error would be the preferred option. I'm not sure how often we use that format of two parameters at once. If it's very rare, unpacking them and documenting them separately would also be fine.
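
One possible way to accept that format is to split combined entries before comparing them with the signature. This is only a sketch under the assumption that the script sees "index, columns" as a single documented name; split_documented_names is a hypothetical helper, and the toy from_spmatrix only mirrors the signature quoted above.

```python
import inspect


def split_documented_names(documented):
    """Expand entries such as 'index, columns' into individual names."""
    names = []
    for entry in documented:
        names.extend(part.strip() for part in entry.split(","))
    return names


def from_spmatrix(cls, data, index=None, columns=None):
    """Toy stand-in with the same signature as in the comment above."""


documented = ["data", "index, columns"]  # names as written in the docstring
actual = list(inspect.signature(from_spmatrix).parameters)

naive_unknown = [n for n in documented if n not in actual]
split_unknown = [n for n in split_documented_names(documented) if n not in actual]

print(naive_unknown)  # ['index, columns']
print(split_unknown)  # []
```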

jle3 commented 3 years ago

pandas.DataFrame.plot.pie requires either a column reference y or subplots=True; calling it without either raises a ValueError. However, neither appears in the parameters of its function signature def pie(self, **kwargs):.

This requirement is mentioned in the extended summary, but the pandas docstring guide says:

"The extended summary provides details on what the function does. It should not go into the details of the parameters, or discuss implementation notes, which go in other sections."

How should the docstring be modified in this case or is this an issue with the validation script?

datapythonista commented 3 years ago

Since the signature contains only **kwargs, this is the only parameter that can be documented. In the description of that parameter you can explain the accepted kwargs.

I think in this case it could make sense to change the signature of the function to something like pie(self, y=None, subplots=True, **kwargs). Maybe there is a reason why it's not like this, but I think that should work and be more explicit. If you want to open a PR for that, it would be great (or if you prefer to open an issue, that would be good too).
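
As a rough illustration of that suggestion, the sketch below contrasts the two signatures using plain functions. Both functions are hypothetical stand-ins, not the pandas implementation; the defaults follow the signature proposed in the comment above, and the error message and return value are invented for the example.

```python
import inspect


def pie_kwargs_only(self, **kwargs):
    """Current shape: only kwargs can be documented."""


def pie_explicit(self, y=None, subplots=True, **kwargs):
    """Proposed shape: y and subplots become documentable parameters."""
    if y is None and not subplots:
        # Illustrative message only, not the text pandas raises.
        raise ValueError("pass a column as y or set subplots=True")
    return {"y": y, "subplots": subplots, **kwargs}


before = list(inspect.signature(pie_kwargs_only).parameters)
after = list(inspect.signature(pie_explicit).parameters)

print(before)  # ['self', 'kwargs']
print(after)   # ['self', 'y', 'subplots', 'kwargs']
```

With the explicit signature, a Parameters section listing y and subplots would no longer be reported as documenting unknown parameters.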

joshuabello2550 commented 2 years ago

take

joshuabello2550 commented 2 years ago

When a user runs python ./scripts/validate_docstrings.py --errors=PR02 in a terminal, they get a list of docstring errors, like the one in the errors.txt file I created. In that list, many errors point to the same line of code, such as /home/pandas/pandas/core/indexes/extension.py:92. However, line 92 of that file, def method(self, *args, **kwargs), just defines a function inside a wrapper function. Are these errors supposed to point to the same location, or should the validation script only check the wrapper function's docstring and not the inner function's?

datapythonista commented 2 years ago

We did our best to point to the docstring, but many docstrings are built dynamically, and it's not possible to know where the actual docstring content is defined. That seems to be the case here, so just ignore that information. It's better to use grep with a sentence from the docstring you're looking for to see where it's defined.
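
On the command line this would be something like grep -rn "some sentence" pandas/. For anyone without grep, a rough Python equivalent is sketched below; find_docstring_source is a hypothetical helper, and the temporary accessor.py file only exists to make the example self-contained.

```python
import tempfile
from pathlib import Path


def find_docstring_source(root, sentence):
    """Yield (path, line_number) for each line under root containing sentence."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if sentence in line:
                yield path, number


# Demo on a throwaway directory; in practice root would be the pandas checkout.
with tempfile.TemporaryDirectory() as root:
    module = Path(root) / "accessor.py"
    module.write_text('DOC = """\nRow and column labels to use.\n"""\n', encoding="utf-8")
    hits = [(p.name, n) for p, n in find_docstring_source(root, "Row and column labels")]

print(hits)  # [('accessor.py', 2)]
```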

tqa236 commented 4 months ago

Hi @datapythonista, should this issue be closed in favor of https://github.com/pandas-dev/pandas/issues/58063, which is more up to date and has clearer instructions on how to fix docstring errors?