terrencepreilly / darglint

A python documentation linter which checks that the docstring description matches the definition.
MIT License

`lambda` functions treated as nested functions #142

Closed: loganthomas closed this issue 3 years ago

loganthomas commented 3 years ago

Hi @terrencepreilly,

I wanted to thank you for making darglint. It's been a huge help for my projects :)

I noticed that a recent release (1.5.6, I believe) added handling for nested functions. This change appears to cause problems when a function contains a lambda.

Example:

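import pandas as pd
from typing import Dict, Hashable, Optional

from janitor.utils import check_column  # assumed: pyjanitor's column-existence helper


def example_func(
    df: pd.DataFrame,
    groupby_column_name: Hashable,
    sort_column_name: Hashable,
    k: int,
    sort_values_kwargs: Optional[Dict] = None,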
) -> pd.DataFrame:
    """
    :param df: A pandas dataframe.
    :param groupby_column_name: Column name to group input dataframe `df` by.
    :param sort_column_name: Name of the column to sort along the
        input dataframe `df`.
    :param k: Number of top rows to return from each group after sorting.
    :param sort_values_kwargs: Arguments to be passed to sort_values function.
    :returns: A pandas dataframe with top `k` rows that are grouped by
        `groupby_column_name` column with each group sorted along the
        column `sort_column_name`.
    :raises ValueError: if `k` is less than 1.
    :raises ValueError: if `groupby_column_name` is not in dataframe `df`.
    :raises ValueError: if `sort_column_name` is not in dataframe `df`.
    :raises KeyError: if `inplace=True` is present in `sort_values_kwargs`.
    """  # noqa: E501

    # Convert the default sort_values_kwargs from None to empty Dict
    sort_values_kwargs = sort_values_kwargs or {}

    # Check that groupby_column_name and sort_column_name exist in the dataframe
    check_column(df, [groupby_column_name, sort_column_name])

    # Check if k is greater than 0.
    if k < 1:
        raise ValueError(
            "Number of rows per group to be returned must be greater than 0."
        )

    # Reject inplace=True, because sort_values would then return None
    if sort_values_kwargs.get("inplace"):
        raise KeyError("Cannot use `inplace=True` in `sort_values_kwargs`.")

    return df.groupby(groupby_column_name).apply(
        lambda d: d.sort_values(sort_column_name, **sort_values_kwargs).head(k)
    )

Running darglint on this function reports the following error, even though `d` is a parameter of the lambda rather than of `example_func`: DAR101: Missing parameter(s) in Docstring: - d
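To illustrate where `d` can come from, here is a naive parameter collector over Python's `ast`. This is not darglint's code: `naive_params` is a hypothetical helper that gathers every `ast.arg` node anywhere under the function, so it picks up the lambda's parameter along with the function's own. The function body is also trimmed down for illustration.

```python
import ast
import textwrap

# Hypothetical, simplified version of example_func from this issue.
SOURCE = textwrap.dedent('''
    def example_func(df, groupby_column_name, sort_column_name, k):
        return df.groupby(groupby_column_name).apply(
            lambda d: d.sort_values(sort_column_name).head(k)
        )
''')


def naive_params(func: ast.FunctionDef) -> list:
    """Collect every ast.arg under func, including those of nested lambdas."""
    return [node.arg for node in ast.walk(func) if isinstance(node, ast.arg)]


func = ast.parse(SOURCE).body[0]
print(naive_params(func))
# ['df', 'groupby_column_name', 'sort_column_name', 'k', 'd']
```

Because `ast.walk` does not stop at scope boundaries, the lambda's `d` ends up in the same list as the real parameters, which matches the extra name in the DAR101 report.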

Possible Solution

I can work around this by adding an ignore statement (`# noqa: DAR101 d`) to the docstring. However, I wanted to confirm whether this is expected behavior, since I would need to add such a statement to every function that uses a lambda.
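A sketch of that workaround, assuming darglint picks up a `# noqa: <error code> <argument>` line placed inside the docstring, as described above. `top_k` is a hypothetical function written for illustration; it uses a lambda in the same way `example_func` does, but with plain Python lists instead of pandas.

```python
def top_k(records: list, key: str, k: int) -> list:
    """
    :param records: A list of dictionaries.
    :param key: Dictionary key to sort the records by.
    :param k: Number of records to return.
    :returns: The `k` records with the largest value under `key`.

    # noqa: DAR101 d
    """
    # `d` is the lambda's own parameter; the affected darglint release
    # reported it as a missing parameter of top_k, hence the noqa line.
    return sorted(records, key=lambda d: d[key], reverse=True)[:k]


rows = [{"score": 3}, {"score": 9}, {"score": 5}]
print(top_k(rows, "score", 2))  # [{'score': 9}, {'score': 5}]
```

The drawback is the one raised here: the suppression has to be repeated in every docstring whose function contains a lambda.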

Thank you for any input you can give on this topic!

terrencepreilly commented 3 years ago

Thank you for the timely and thorough feedback. This is not expected behavior. I didn't realize that the `Lambda` node in Python's `ast` module reuses the `arguments` node from `FunctionDef` and `AsyncFunctionDef`. I've given lambdas their own function context, so that their arguments no longer leak into the enclosing function. (Note, though, that this means exceptions raised in lambdas will no longer be detected in the enclosing function; that seems like an edge case.) I should have resolved the issue in 525577e71f11214d4be7dea187b54423f945cb6a, and will push it to v1.5.7 as soon as my regression tests pass. Thank you!
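The idea of giving lambdas their own context can be sketched with an `ast.NodeVisitor`. This is a hypothetical `ParamCollector`, not darglint's implementation: it records parameters from `ast.arguments` nodes, but stops descending at nested `def`s and at `Lambda` nodes, whose `args` field holds the same `ast.arguments` type that `FunctionDef.args` uses.

```python
import ast
import textwrap

# Hypothetical, simplified version of example_func from this issue.
SOURCE = textwrap.dedent('''
    def example_func(df, groupby_column_name, sort_column_name, k):
        return df.groupby(groupby_column_name).apply(
            lambda d: d.sort_values(sort_column_name).head(k)
        )
''')


class ParamCollector(ast.NodeVisitor):
    """Collect one function's parameters; nested scopes get their own context."""

    def __init__(self) -> None:
        self.params: list = []
        self._inside_root = False

    def visit_FunctionDef(self, node) -> None:
        if self._inside_root:
            return  # nested def: its parameters belong to it, not to us
        self._inside_root = True
        self.generic_visit(node)

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Lambda(self, node: ast.Lambda) -> None:
        # Lambda.args is an ast.arguments node, just like FunctionDef.args,
        # so descending here would record the lambda's parameters as ours.
        return

    def visit_arguments(self, node: ast.arguments) -> None:
        # Concatenation builds a new list, so the AST itself is not modified.
        names = node.posonlyargs + node.args + node.kwonlyargs
        names += [a for a in (node.vararg, node.kwarg) if a is not None]
        self.params.extend(arg.arg for arg in names)


func = ast.parse(SOURCE).body[0]
collector = ParamCollector()
collector.visit(func)
print(collector.params)  # ['df', 'groupby_column_name', 'sort_column_name', 'k']
```

Since this sketch skips the lambda's whole subtree, any other check that wanted to inspect the lambda's body would miss it too, which is analogous to the trade-off noted in the comment above.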

loganthomas commented 3 years ago

@terrencepreilly Thank you so much for your quick fix! Working like a charm on my lambda functions now. Really appreciate your help! 👍 🎉