adam-azarchs opened this issue 3 years ago (status: open).
Actually, looking more closely at the pandas code, I think the root cause may be that it's inferring the return type as `TextFileReader`, when in this case it should actually be a `DataFrame` or `Series`, since we're not setting `iterator` or `chunksize`.
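Here is a quick runtime check of that claim. It assumes pandas 1.2 or later, where the reader returned for `chunksize` can be used in a `with` statement. Because the module path of `TextFileReader` has moved between pandas versions, the sketch compares class names only. The CSV text is made up for illustration.

```python
import io

import pandas as pd

CSV_TEXT = "a,b\n1,2\n3,4\n"

# Default call: no iterator/chunksize, so the whole file is parsed eagerly.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# With chunksize, read_csv returns a lazy reader instead of a DataFrame.
with pd.read_csv(io.StringIO(CSV_TEXT), chunksize=1) as reader:
    reader_type = type(reader).__name__
    first_chunk = next(reader)  # each chunk is itself a DataFrame

print(type(df).__name__)  # DataFrame
print(reader_type)        # TextFileReader
print(len(first_chunk))   # 1
```

So at runtime the `TextFileReader` branch is only taken when one of those two parameters is passed, which is why the inferred type looks wrong for the plain call.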
Thank you for creating the issue; I can reproduce this. Regarding the error that goes away with the print statement: could it be that there are two `no-member` errors, one on the `print` (`Instance of 'TextFileReader' has no 'shape' member (no-member)`) and the other on the `for` loop (`Instance of 'TextFileReader' has no 'columns' member (no-member)`)?
I never see an error for `shape`. Possibly `TextFileReader` does have a `shape`? I haven't looked too deeply into the pandas source code. But that wouldn't explain why `columns` only complains after that `print`.
I was thinking maybe you did not notice the warning on the print line. I have 4 warnings with your example, one of them on the print line for "shape". Do you mean you have 3 warnings with the example you gave, and the "columns" warning disappears if you remove the print?
Yes.
OK, I cannot reproduce that with pandas 1.2.4, but the main problem is the false positive for `no-member` anyway.
In our codebase the main problem is actually the false positive on `unsubscriptable-object`. The `no-member` error can easily be worked around by setting `generated-members`. However, as I said, I believe the root cause is the same for both: the inferred type `TextFileReader` is not correct.
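For the `no-member` half of the problem, `generated-members` (in the `[TYPECHECK]` section of `.pylintrc`, e.g. `generated-members=columns,shape`) takes a list of regular expressions. The sketch below is a simplified model, not pylint's actual code, of how those patterns decide whether a report is skipped: as far as I understand, each pattern is tried with `re.match` against the bare attribute name and against the whole expression text. The helper `is_suppressed` and the pattern list are hypothetical; check the pylint docs for the exact rules in your version.

```python
import re
from typing import Iterable

# Hypothetical option value, i.e. `generated-members=columns,shape`.
GENERATED_MEMBERS = ["columns", "shape"]


def is_suppressed(attrname: str, expr: str, patterns: Iterable[str] = GENERATED_MEMBERS) -> bool:
    """Simplified model: would a no-member report on `expr` be skipped?

    Each pattern is matched with re.match (anchored at the start only)
    against the attribute name and against the full expression, e.g. "df.columns".
    """
    return any(
        re.match(pattern, attrname) is not None or re.match(pattern, expr) is not None
        for pattern in patterns
    )


print(is_suppressed("columns", "df.columns"))  # True
print(is_suppressed("well", "df.well"))        # False
```

Since `re.match` only anchors at the start, a short pattern such as `col` would also silence other attributes beginning with those letters. This option does nothing for `unsubscriptable-object`, which is why it only covers half of the problem.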
We see the same thing. Strangely enough, the following snippet:

```python
# pylint: disable=missing-module-docstring,pointless-statement
import pandas as pd
df = pd.read_csv("some.csv")
df.columns
df.columns
```

gives `Instance of 'TextFileReader' has no 'columns' member`.
However, if either […] pandas […] during the `df` initialization are commented out (which are not executed anyway, due to `squeeze=False` by default), it goes away. :thinking:
Investigated a bit further. The false positive appears to have been introduced between `astroid==2.5.7` and `astroid==2.5.8`, in https://github.com/PyCQA/astroid/pull/1009. For the specific snippet above, it looks like increasing `max_inferred` to >= 166 removes the false positive.
I am seeing a similar issue with the code:

```python
import io
import pandas
df = pandas.read_csv(io.StringIO("well\nx"))
df.loc[:, "well"] = df.well.str.replace("x", "y")
```

`Instance of 'TextFileReader' has no 'well' member`
I believe it is the same root cause, as the error disappears after any one of a number of small changes to the code. Maybe it is helpful as a test case once this bug is fixed.
I too have this issue. Interestingly, commenting out the `df.dropna(...)` line removes the problem.
Edit: I have checked, and this is also due to `df` being inferred as a `TextFileReader`.
```python
# test.py
import pandas as pd
df = pd.read_csv("input_filename")
df.dropna(subset=["title"], inplace=True)
have_NaN = df[["severity", "priority", "notice"]].isna().any(axis=1)
```
```console
% # with dropna
% pylint test.py --disable=missing-module-docstring
************* Module test
test.py:5:11: E1136: Value 'df' is unsubscriptable (unsubscriptable-object)
--------------------------------------------------------------------
Your code has been rated at -2.50/10 (previous run: -2.50/10, +0.00)
% # after remove dropna
% pylint test.py --disable=missing-module-docstring
---------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: -2.50/10, +12.50)
% pylint --version
pylint 2.11.1
astroid 2.8.0
Python 3.7.5 (default, Aug 13 2020, 09:55:33)
[Clang 11.0.3 (clang-1103.0.32.62)]
% python -c 'import pandas; print(pandas.__version__)'
1.3.3
```
Huh, I wasn't imagining it :joy:, this really was a change in behavior.
When you remove `inplace=True` and instead assign the data frame returned by `dropna` back to `df`, everything is fine again.
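A minimal sketch of that workaround, assuming a small in-memory CSV in place of `"input_filename"` (the column names come from the snippet above; the rows are made up for illustration). Rebinding `df` to the frame returned by `dropna` behaves the same at runtime as `inplace=True`. Whether it silences the warning may depend on your pylint/astroid versions, so treat this as what was reported here rather than a guaranteed fix.

```python
import io

import pandas as pd

# Hypothetical stand-in for "input_filename"; the second row has no title,
# the third row has no priority.
CSV_TEXT = (
    "title,severity,priority,notice\n"
    "a,1,2,3\n"
    ",4,5,6\n"
    "c,7,,9\n"
)

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Instead of df.dropna(subset=["title"], inplace=True), rebind the name
# to the returned frame.
df = df.dropna(subset=["title"])

have_NaN = df[["severity", "priority", "notice"]].isna().any(axis=1)
print(have_NaN.tolist())  # [False, True]
```

The same pattern applies to other `inplace=True` calls such as `fillna`.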
I am using
pylint 2.9.6
astroid 2.6.6
pandas 1.3.2
EDIT: I just noticed that the error message in https://github.com/PyCQA/pylint/issues/4577#issuecomment-930846829 above is also referencing the variable name, so some of what's below is redundant.
"inplace=True" seems problematic in multiple cases. The following produces a similar issue, but interestingly enough it does NOT reference "TextFileReader" as the unsupported type but rather the variable name itself. Possibly a related but independent issue?
```python
import pandas as pd
data_frame: pd.DataFrame = pd.read_csv("foo.csv")
data_frame.fillna("", inplace=True)
data_frame["bar"] = data_frame[["baz", "bat"]].apply(
    lambda row: f'{str(row["baz"])}-{str(row["bat"])}',
    axis=1
)
```
Linting the above results in the following (on my system, at least ;)):

```text
5,0,error,unsupported-assignment-operation:'data_frame' does not support item assignment
5,27,error,unsubscriptable-object:Value 'data_frame' is unsubscriptable
```
Replacing the "inplace" operation with `data_frame = data_frame.fillna("")` eliminates the error.
Version info:
pandas 1.3.4
astroid 2.9.0
pylint 2.12.2
Based on the findings in https://github.com/PyCQA/pylint/issues/4577#issuecomment-871694490, we use the workaround below, which might be useful for others facing this issue.
Basically, for now we increase `astroid.context.InferenceContext.max_inferred` to a higher value than the hard-coded 100, e.g. with the following in `.pylintrc`:

```ini
[MASTER]
# As a temporary workaround for https://github.com/PyCQA/pylint/issues/4577
init-hook = "import astroid; astroid.context.InferenceContext.max_inferred = 500"
```

or alternatively as a direct command-line argument:

```shell
pylint some_file_to_lint.py --init-hook "import astroid; astroid.context.InferenceContext.max_inferred = 500"
```
Thank you for the example code, @anders-kiaer.

With `max_inferred = 100`, the return types are:
`[<Instance of pandas.io.parsers.readers.TextFileReader>, Uninferable]`
Only one positive return type.

With `max_inferred = 500`, the return types are:
`[<Instance of pandas.io.parsers.readers.TextFileReader>, Uninferable, <Instance of pandas.core.frame.DataFrame>]`
Two positive return types.
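One way to picture why the cap changes the result: as far as I understand, astroid counts inferred nodes against the `max_inferred` budget and, once the budget is exceeded, yields `Uninferable` and stops. Any result that is only reached later (here the `DataFrame` path) is never seen. The sketch below is a toy model of that idea, not astroid's code. The helper `bounded_infer` and all per-result costs are hypothetical; the costs were picked only so that the output reproduces the two lists quoted above, with the `DataFrame` cost echoing the >= 166 threshold reported earlier in the thread.

```python
from typing import Iterator, List, Tuple

UNINFERABLE = "Uninferable"  # stand-in for astroid.util.Uninferable


def bounded_infer(candidates: List[Tuple[str, int]], max_inferred: int) -> Iterator[str]:
    """Toy model of a bounded inference loop.

    `candidates` pairs each possible result with the cumulative number of
    nodes inferred to reach it. Once that count exceeds `max_inferred`, an
    Uninferable marker is yielded and inference stops.
    """
    for result, nodes_inferred in candidates:
        if nodes_inferred > max_inferred:
            yield UNINFERABLE
            return
        yield result


# Hypothetical costs: the TextFileReader branch is cheap, a genuinely
# uninferable branch comes next, and the DataFrame branch comes last.
READ_CSV_RESULTS = [
    ("TextFileReader", 40),
    (UNINFERABLE, 120),
    ("DataFrame", 166),
]

print(list(bounded_infer(READ_CSV_RESULTS, max_inferred=100)))
# ['TextFileReader', 'Uninferable']
print(list(bounded_infer(READ_CSV_RESULTS, max_inferred=500)))
# ['TextFileReader', 'Uninferable', 'DataFrame']
```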
What do you think, @Pierre-Sassoulas @PCManticore: should `unsubscriptable-object` be raised if a single type is returned alongside an `Uninferable` return type? I can understand that pandas is sacrificing consistent return types for usability with this function. However, it looks like `unsubscriptable-object` isn't raised when there are multiple positive return types, as with `max_inferred=500`.
I think a brain could be created for `read_csv` if we don't want to change `unsubscriptable-object`.
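The behaviour described above can be summarised as a simple rule: ignore `Uninferable` results, and only report when exactly one concrete type is left and that type does not support subscripting. The sketch below encodes that rule in a hypothetical helper, `should_report_unsubscriptable`. It is a simplified model of what the comment observes, not pylint's implementation, and the set of subscriptable type names is made up for illustration.

```python
from typing import Iterable, Set

UNINFERABLE = "Uninferable"  # stand-in for astroid.util.Uninferable

# Hypothetical: inferred type names that support `obj[...]`.
SUBSCRIPTABLE: Set[str] = {"DataFrame", "Series", "list", "dict"}


def should_report_unsubscriptable(inferred: Iterable[str]) -> bool:
    """Model: report only if one concrete type remains and it is not subscriptable."""
    concrete = {name for name in inferred if name != UNINFERABLE}
    if len(concrete) != 1:
        # Nothing concrete, or an ambiguous mix of types: stay quiet.
        return False
    (only_type,) = concrete
    return only_type not in SUBSCRIPTABLE


print(should_report_unsubscriptable(["TextFileReader", UNINFERABLE]))               # True
print(should_report_unsubscriptable(["TextFileReader", UNINFERABLE, "DataFrame"]))  # False
```

Under this model, a `read_csv` brain that inferred `DataFrame` directly for calls without `iterator`/`chunksize` would leave `["DataFrame"]` as the only result, so nothing would be reported. That matches the alternative suggested above.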
Raising `max_inferred` really slows down this code (from 1 s to 10 s on my computer), so I don't think it would be a good idea to change it, @anders-kiaer.
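Code for printing out the types:

```python
import astroid

MAX_INFERRED = 500
# Raise the class-level inference budget before inferring anything.
astroid.context.InferenceContext.max_inferred = MAX_INFERRED

# `#@` marks the `df` name node that extract_node returns.
ret = astroid.extract_node("""
import pandas as pd
df = pd.read_csv("some.csv")
df #@
df.columns
df.columns
""")
inferred = ret.inferred()
print(inferred)
```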
The same problem appears with `fillna`:

```python
import pandas
dataframe = pandas.DataFrame({'A': [1,2,None]})
dataframe = dataframe.fillna(0)
print(dataframe['A'])
```

pylint will complain about `E1136: Value 'dataframe' is unsubscriptable (unsubscriptable-object)`.
Steps to reproduce

Current behavior

Strangely, the `no-member` error goes away if you leave out the `print` statement.

Expected behavior

No errors, which was the case with pylint 2.7.x.

pylint --version output

Result of `pylint --version` output:

Additional dependencies: