machow / siuba

Python library for using dplyr like syntax with pandas and SQL
https://siuba.org
MIT License
1.14k stars 48 forks source link

verbs without positional arguments raise pylint(no-value-for-parameter) #465

Open nathanjmcdougall opened 1 year ago

nathanjmcdougall commented 1 year ago

When I use mutate, pylint gives an error:

No value for argument '__data' in function call pylint(no-value-for-parameter)

This is true for other verbs too. I have found that any time a verb is used without a positional argument it will give this error:

import pandas as pd
from siuba import _, mutate, rename, select

df = pd.DataFrame.from_dict({"x": [1, 2, 3]})

df >>= mutate(y = _.x + 1) # linter issue on this line
df2 = df >> mutate(z = _.y + 1) # also on this line
df3 = df >> rename(z=_.x) # also on this line

df4 = mutate(df, z = _.y + 1) # no linter issue on this line
df5 = df >> select(_.x, _.y) # neither on this line

I suspect some wrapping needs to be done in singledispatch2 to prevent the argspec from looking like it requires __data as a positional argument, but that's just a hunch.

I'm using pandas==1.5.2, siuba==0.4.2, and pylint==2.15.8 with default configuration. For what it's worth I also get Pylance issues.

machow commented 1 year ago

Thanks for reporting. Let me try adding some overloads quickly to the decorator being used, to see it can silence the warnings. siuba uses generic function dispatch, which python has poor type support for, but we can at least silence warnings with very general overloads.

Related to:

machow commented 1 year ago

I added a quick draft in this PR, but am still seeing the error, so need to dig a bit deeper..! If you have any sense for why it'd still throw pylance errors with these annotations, would love your feedback!

https://github.com/machow/siuba/pull/466

machow commented 1 year ago

@nathanjmcdougall it seems to be working now on this PR: https://github.com/machow/siuba/pull/466 , any chance you can double check?

Should be able to install the PR using...

pip install git+https://github.com/machow/siuba.git@ux-type-overloads
nathanjmcdougall commented 1 year ago

I've installed that PR branch but unfortunately pylint is still giving errors in the same way. Pylance isn't, although curiously I wasn't able to reproduce the Pylance errors even on the deployed (non-PR) version 0.4.2. It may be that things are fixed from Pylance's perspective. I don't understand Pylance well enough to check that myself.

As for pylint, looking into it, it seems that pylint has some long-standing limitations when decorators are used: https://github.com/PyCQA/pylint/issues/259

I see two ways forward: make the code a little uglier for the sake of pylint users, or expect pylint users to use a standard bit of configuration when using siuba.

For the first option: A simple workaround I think is just to go through all the verbs with __data as an argument and set a default value of None. Then add a simple check in verb_dispatch which will raise an error if __data is None, i.e. if the verb isn't being used properly and is relying on the default. I've checked and this seems to fix the issue.

There may be other options. One is to not use decorators and just use f = verb_dispatch(DataFrame)(_f), but that's very ugly indeed.

Or, just suggest that siuba users add this pylint configuration which solves the issue:

[tool.pylint.typecheck]
signature-mutators = ["siuba.siu.dispatchers.verb_dispatch"]

If so, it may be good to add some guidance in the documentation for pylint users by compiling some standard configuration options, e.g. also the one here: https://github.com/machow/siuba/issues/464#issuecomment-1357540957

machow commented 1 year ago

Thanks for digging into this! This is a tough one, since it seems like pylint is doing a poor job with this specific type check, and doesn't make full use the overloads.

Adding a new argument is nice as a quick fix, but will make type annotations inside concrete versions less useful, since you'd need to annotate data: None | pd.DataFrame. Part of this is just because verbs are currently defined using the pandas implementations (the initial function wrapped by @verb_dispatch).

I've been wanting to move the initial verb definitions to a module called generics, to decouple from pandas a bit more. I can try that and add a default argument when the generic is defined....

machow commented 1 year ago

@wch pointed out pylint seems to allow plugins. Not sure how they work, but could be another option..