False positives on ``pandas.io.parsers.TextFileReader``

adam-azarchs commented 3 years ago

Steps to reproduce

# pylint: disable=missing-module-docstring
# pylint: enable=unsubscriptable-object,unsupported-assignment-operation,no-member
import pandas as pd

data_frame = pd.read_csv("foo.csv")
print(data_frame.shape)
for column in data_frame.columns:
    data_frame[column] = data_frame[column].astype("S")

Current behavior

repro.py:7:14: E1101: Instance of 'TextFileReader' has no 'columns' member (no-member)
repro.py:8:4: E1137: 'data_frame' does not support item assignment (unsupported-assignment-operation)
repro.py:8:25: E1136: Value 'data_frame' is unsubscriptable (unsubscriptable-object)

Strangely, the no-member error goes away if you leave out the print statement.

Expected behavior

No errors, which was the case with pylint 2.7.x.

pylint --version output

Result of pylint --version output:

pylint 2.8.3
astroid 2.5.8
Python 3.7.10 (default, Jun  4 2021, 14:48:32)

Additional dependencies:

pandas==1.2.4

adam-azarchs commented 3 years ago

Actually looking more closely at the pandas code, I think the root cause may be that it's inferring the return type as TextFileReader where actually in this case it should be a DataFrame or Series, since we're not setting iterator or chunksize.
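
To make the hypothesis concrete, here is a minimal stand-in for a function whose return type depends on its arguments. This is not the pandas source: `FakeFrame`, `FakeReader` and `read_table_sketch` are made-up names, and the branching only mirrors the description above (a reader object when iterator or chunksize is set, a frame otherwise).

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame: has columns and item access."""

    def __init__(self, data):
        self._data = dict(data)
        self.columns = list(self._data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        if key not in self.columns:
            self.columns.append(key)


class FakeReader:
    """Hypothetical stand-in for TextFileReader: iterable, no item access."""

    def __init__(self, data, chunksize=None):
        self._data = dict(data)
        self.chunksize = chunksize

    def read(self):
        return FakeFrame(self._data)

    def __iter__(self):
        # A real reader would yield several chunks; one is enough here.
        yield self.read()


def read_table_sketch(data, iterator=False, chunksize=None):
    """Return the reader itself only when chunked reading was requested."""
    reader = FakeReader(data, chunksize)
    if iterator or chunksize:
        return reader
    return reader.read()


frame = read_table_sketch({"a": [1, 2]})
chunks = read_table_sketch({"a": [1, 2]}, chunksize=1)
print(type(frame).__name__, type(chunks).__name__)
```

A static analyzer that only follows the first return branch would conclude that the call always yields the reader type, which would explain all three messages at once.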

Pierre-Sassoulas commented 3 years ago

Thank you for creating the issue, I can reproduce this. Regarding the error that goes away when the print statement is removed: could it be that there are two no-member errors, one for 'TextFileReader' has no 'shape' member (no-member) on the print line and the other for Instance of 'TextFileReader' has no 'columns' member (no-member) on the for loop?

adam-azarchs commented 3 years ago

I never see an error for shape. Possibly TextFileReader does have a shape? I haven't looked too deeply into the pandas source code. But that wouldn't explain why columns only complains after that print.

Pierre-Sassoulas commented 3 years ago

I was thinking maybe you did not notice the warning on the print line. I get 4 warnings with your example, one of them on the print line for "shape". Do you mean you get 3 warnings with the example you gave, and the "columns" warning disappears if you remove the print?

adam-azarchs commented 3 years ago

Yes.

Pierre-Sassoulas commented 3 years ago

OK, I cannot reproduce that with pandas 1.2.4 but the main problem is the false positive for no-member anyway.

adam-azarchs commented 3 years ago

In our codebase the main problem is actually the false positive on unsubscriptable-object. The no-member error can be easily worked around by setting generated-members. However as I said I believe the root cause is the same for both - the inferred type TextFileReader is not correct.

anders-kiaer commented 3 years ago

We see the same thing. Strangely enough the following snippet

#pylint: disable=missing-module-docstring,pointless-statement

import pandas as pd

df = pd.read_csv("some.csv")

df.columns
df.columns

gives Instance of 'TextFileReader' has no 'columns' member.

However if either

the last line is commented out
or alternatively these two lines in pandas during the df initialization are commented out (which are not executed anyway due to squeeze=False by default)

it goes away. :thinking:

anders-kiaer commented 3 years ago

Investigated a bit further. The false positive appears to have been introduced between astroid==2.5.7 and astroid==2.5.8, in https://github.com/PyCQA/astroid/pull/1009. For the specific snippet above, increasing max_inferred to >=166 removes the false positive.
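
A toy model of why a cap could matter, assuming the limit acts as a budget after which inference gives up and reports an Uninferable marker. This is a sketch of the idea only, not astroid's implementation: `infer_capped` and the candidate names are made up, and the real threshold (such as 166) depends on the actual inference graph, which is not modeled here.

```python
from itertools import islice

UNINFERABLE = "Uninferable"  # placeholder marker, not astroid's object
_SENTINEL = object()


def infer_capped(candidates, max_inferred):
    """Yield at most max_inferred candidates, then a marker if cut short."""
    iterator = iter(candidates)
    yield from islice(iterator, max_inferred)
    # If anything is left over, stop exploring and report the give-up marker.
    if next(iterator, _SENTINEL) is not _SENTINEL:
        yield UNINFERABLE


# Hypothetical candidate order: the reader type is found first, the frame
# type only after several intermediate steps.
candidates = ["TextFileReader", "step", "step", "step", "DataFrame"]

low = list(infer_capped(candidates, 2))
high = list(infer_capped(candidates, 10))
print("DataFrame" in low, "DataFrame" in high)
```

With the low budget the frame type is never reached, so a checker downstream only sees the reader type.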

bersbersbers commented 3 years ago

I am seeing a similar issue with the code:

import io
import pandas

df = pandas.read_csv(io.StringIO("well\nx"))
df.loc[:, "well"] = df.well.str.replace("x", "y")

Instance of 'TextFileReader' has no 'well' member

I believe it is the same root cause, as the error disappears after any one of numerous manipulations. Maybe it is helpful as a test case once this bug is fixed.

spagh-eddie commented 3 years ago

I too have this issue. Interestingly commenting out the df.dropna(...) line removes the problem.

Edit: I have checked, and this is also due to df being inferred as a TextFileReader.

# test.py 
import pandas as pd

df = pd.read_csv("input_filename")
df.dropna(subset=["title"], inplace=True)
have_NaN = df[["severity", "priority", "notice"]].isna().any(axis=1)

% # with dropna
% pylint test.py --disable=missing-module-docstring
************* Module test
test.py:5:11: E1136: Value 'df' is unsubscriptable (unsubscriptable-object)

--------------------------------------------------------------------
Your code has been rated at -2.50/10 (previous run: -2.50/10, +0.00)

% # after remove dropna
% pylint test.py --disable=missing-module-docstring

---------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: -2.50/10, +12.50)

% pylint --version
pylint 2.11.1
astroid 2.8.0
Python 3.7.5 (default, Aug 13 2020, 09:55:33) 
[Clang 11.0.3 (clang-1103.0.32.62)]
% python -c 'import pandas; print(pandas.__version__)'
1.3.3

ozyo commented 3 years ago

Huh, I wasn't imagining it :joy: this really was a change in behavior.

When you remove inplace=True and instead assign the changed data frame from dropna to df again everything is fine.

I am using

pylint 2.9.6
astroid 2.6.6
pandas 1.3.2

shartzog commented 2 years ago

EDIT: I just noticed that the error message in https://github.com/PyCQA/pylint/issues/4577#issuecomment-930846829 above is also referencing the variable name, so some of what's below is redundant.

"inplace=True" seems problematic in multiple cases. The following produces a similar issue, but interestingly enough it does NOT reference "TextFileReader" as the unsupported type but rather the variable name itself. Possibly a related but independent issue?

import pandas as pd

data_frame: pd.DataFrame = pd.read_csv("foo.csv")
data_frame.fillna("", inplace=True)
data_frame["bar"] = data_frame[["baz", "bat"]].apply(
    lambda row: f'{str(row["baz"])}-{str(row["bat"])}',
    axis=1
)

Linting the above results in the following (on my system at least ;)):

5,0,error,unsupported-assignment-operation:'data_frame' does not support item assignment
5,27,error,unsubscriptable-object:Value 'data_frame' is unsubscriptable

Replacing the "inplace" operation with data_frame = data_frame.fillna("") eliminates the error.

Version info:

pandas                    1.3.4
astroid                   2.9.0
pylint                    2.12.2

anders-kiaer commented 2 years ago

Based on the findings in https://github.com/PyCQA/pylint/issues/4577#issuecomment-871694490 we use the workaround below which might be useful for others here facing this issue.

For now, we increase astroid.context.InferenceContext.max_inferred to a value higher than the hard-coded 100, using e.g.

[MASTER]

# As a temporary workaround for https://github.com/PyCQA/pylint/issues/4577
init-hook = "import astroid; astroid.context.InferenceContext.max_inferred = 500"

in .pylintrc, or alternatively as a direct command line argument

pylint some_file_to_lint.py --init-hook "import astroid; astroid.context.InferenceContext.max_inferred = 500"

brycepg commented 2 years ago

Thank you for the example code @anders-kiaer

With max_inferred = 100
The return types are:
[<Instance of pandas.io.parsers.readers.TextFileReader>, Uninferable]

Only one positive return type

With max_inferred = 500
The return types are:
[<Instance of pandas.io.parsers.readers.TextFileReader>, Uninferable, <Instance of pandas.core.frame.DataFrame>]

Two positive return types

What do you think @Pierre-Sassoulas @PCManticore: should unsubscriptable-object be raised if a single type is returned together with an Uninferable return type? I can understand that pandas is sacrificing consistent return types for usability with this function. However, it looks like unsubscriptable-object isn't raised if there are multiple positive return types, as with max_inferred=500.

I think a brain could be created for read_csv if we don't want to change unsubscriptable-object.
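
To spell out the two options, here is a small model of the decision. It is a sketch based only on the observation above, not pylint's checker code: the function names, the string type names and the `SUBSCRIPTABLE` table are all made up for illustration.

```python
UNINFERABLE = "Uninferable"  # placeholder, not astroid's Uninferable object

# Hypothetical lookup of which type names support subscripting.
SUBSCRIPTABLE = {"DataFrame": True, "TextFileReader": False}


def flags_observed(inferred):
    """Model of the observed behavior: ignore Uninferable entries and flag
    only when a single concrete type remains and it is unsubscriptable."""
    concrete = {name for name in inferred if name != UNINFERABLE}
    if len(concrete) != 1:
        return False
    (only,) = concrete
    return not SUBSCRIPTABLE[only]


def flags_conservative(inferred):
    """Alternative from the question: stay silent whenever any candidate
    is Uninferable, since the result set is known to be incomplete."""
    if UNINFERABLE in inferred:
        return False
    return flags_observed(inferred)


low_cap = ["TextFileReader", UNINFERABLE]
high_cap = ["TextFileReader", UNINFERABLE, "DataFrame"]
print(flags_observed(low_cap), flags_observed(high_cap))
print(flags_conservative(low_cap), flags_conservative(high_cap))
```

Under the first policy only the max_inferred = 100 result is flagged; under the second, neither is. The trade-off is that the second policy would also hide true positives whenever inference is cut short.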

Raising max_inferred really slows down this code, from 1s to 10s on my computer, so I don't think changing it is a good idea @anders-kiaer

code for printing out types:



MAX_INFERRED = 500

import astroid
astroid.context.InferenceContext.max_inferred = MAX_INFERRED

ret = astroid.extract_node("""
import pandas as pd

df = pd.read_csv("some.csv")

df #@
df.columns
df.columns
""")
inferred = ret.inferred()
print(inferred)

FredStober commented 2 years ago

The same problem appears with fillna:

import pandas
dataframe = pandas.DataFrame({'A': [1,2,None]})
dataframe = dataframe.fillna(0)
print(dataframe['A'])

pylint will complain about E1136: Value 'dataframe' is unsubscriptable (unsubscriptable-object)
