AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

Moondra commented 7 years ago

When trying to run this code, I'm getting the above error. I worked with Max U from StackOverflow in a private chat and he concluded it was a bug.

Here is the stackoverflow link: http://stackoverflow.com/questions/43838557/custom-boolean-filtering-in-pandas

Here is my GitHub repository, which contains the data if you want to reproduce the error:

https://github.com/Moondra/Logistic-Regression-

The data can be found as a pickle under the label 'Small_cap_bio_DF'. Just be sure to use the line df['Market Cap'][df['Market Cap'] =='N/A'] = '-1' to replace the N/A values.

The line producing the error is

df[pd.eval(df['Market Cap'].replace(['[Kk]','[Mm]','[Bb]'],['*10**3','*10**6','*10**9'], regex=True).add(' < 35*10**6'))]

df['Market Cap'] looks something like this

A      AAAP       1.66B
       ABEO     223.42M
       ABIO       22.5M
       ABUS     181.58M
       ACAD       3.99B
       ACHN     507.25M
       ACIU      512.3M
       ACOR     750.47M
       ACRS     696.94M
       ACST      20.41M
       ADAP     449.31M
       ADHD       32.8M
       ADMA      58.51M
       ADMS      388.4M
       ADRO     709.58M
       ADVM     122.17M
       ADXS     321.96M
       AERI       1.35B
       AEZS      13.27M
       AFMD      96.66M
       AGEN     371.69M

Problem description

Instead of filtering the DataFrame by its Market Cap column, I get an AttributeError.

Expected Output

All rows whose Market Cap value is less than 35M (the threshold used in the code above).

Output of `pd.show_versions()`

# Paste the output of pd.show_versions() here

TomAugspurger commented 7 years ago

Can you make a reproducible example? df isn't defined.

maxu777 commented 7 years ago

Here is a way to reproduce this issue:

download pickled DF

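import numpy as np
import pandas as pd

fn = r'D:\download\Small_cap_bio_DF'
df = pd.read_pickle(fn)
# replace the 'N/A' placeholders so every value can be evaluated
df.loc[df['Market Cap'] =='N/A', 'Market Cap'] = '-1'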

the following works when the DF is split into four parts:

[x[pd.eval(x['Market Cap'].replace(['[Kk]','[Mm]','[Bb]'],['*10**3','*10**6','*10**9'], regex=True).add(' < 35*10**6'))]
 for x in np.split(df,4)]

if we split it into two parts:

[x[pd.eval(x['Market Cap'].replace(['[Kk]','[Mm]','[Bb]'],['*10**3','*10**6','*10**9'], regex=True).add(' < 35*10**6'))]
 for x in np.split(df,2)]

it produces:

AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

chris-b1 commented 7 years ago

Simple repro:

In [93]: s = pd.Series(['1 == 1', '2 == 1'] * 1000)

In [94]: pd.eval(s.head())
Out[94]: array([True, False, True, False, True], dtype=object)

In [95]: pd.eval(s)
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

This is sort of a misuse of eval; I was surprised it worked at all. Apparently the Series repr is being picked up by eval, and with a long Series, the truncation characters (...) are parsed as Ellipsis.
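To illustrate the parsing step described above, here is a minimal sketch using only Python's standard `ast` module. The string `truncated` is a made-up stand-in for a truncated listing of the Series values, not the exact text pandas produces. It shows that a trailing `...` parses as an Ellipsis node, which is the kind of node the expression visitor in the traceback has no handler for.

```python
import ast

# Illustrative text only: a listing whose tail was cut off with "...".
truncated = "[1 == 1, 2 == 1, ...]"

tree = ast.parse(truncated, mode="eval")
last = tree.body.elts[-1]  # final element of the parsed list literal

# On Python 3.8+ the literal `...` is an ast.Constant whose value is Ellipsis.
is_ellipsis = isinstance(last, ast.Constant) and last.value is Ellipsis
print(type(tree.body).__name__, is_ellipsis)
```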

maxu777 commented 7 years ago

@chris-b1,

it's a good reproducible example, thank you!

the limit seems to be 100 rows:

this works:

pd.eval(s.head(100))

the following produces the error mentioned above:

pd.eval(s.head(101))
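The 100-row threshold would be consistent with the Series being turned into text by a printer that truncates after 100 items (pandas has a `display.max_seq_items` option whose default is 100). The sketch below only mimics that behaviour: `listing` and `contains_ellipsis` are hypothetical helpers written for illustration, not pandas functions.

```python
import ast


def listing(items, max_items=100):
    """Hypothetical stand-in for a printer that truncates long sequences."""
    shown = ", ".join(items[:max_items])
    suffix = ", ..." if len(items) > max_items else ""
    return "[" + shown + suffix + "]"


def contains_ellipsis(source):
    """Return True if the parsed expression contains a literal `...`."""
    tree = ast.parse(source, mode="eval")
    return any(
        isinstance(node, ast.Constant) and node.value is Ellipsis
        for node in ast.walk(tree)
    )


exprs = ["1 == 1", "2 == 1"] * 1000
at_limit = contains_ellipsis(listing(exprs[:100]))    # 100 items: nothing cut off
over_limit = contains_ellipsis(listing(exprs[:101]))  # 101 items: tail replaced by "..."
print(at_limit, over_limit)
```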

TomAugspurger commented 7 years ago

We don't want to support passing pandas objects to eval, right? It takes a string, not a Series.
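Since eval expects a string, the original filtering goal can be approached without eval at all. The following is one possible sketch, not a method proposed in this thread: `parse_market_cap` is a hypothetical helper that converts strings such as '1.66B' or '22.5M' to numbers, and it assumes every value is a string with an optional K, M, or B suffix.

```python
import re

# Suffix multipliers matching the replacements used in the original expression.
_MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9}
_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([KkMmBb]?)\s*$")


def parse_market_cap(text):
    """Convert e.g. '22.5M' to 22500000.0; return NaN for unparseable text."""
    match = _PATTERN.match(text)
    if match is None:
        return float("nan")
    number, suffix = match.groups()
    return float(number) * _MULTIPLIERS.get(suffix.upper(), 1)


caps = ["1.66B", "22.5M", "20.41M", "N/A"]
# NaN compares False against any threshold, so 'N/A' rows are dropped.
small = [c for c in caps if parse_market_cap(c) < 35 * 10**6]
print(small)
```

With a DataFrame, the same helper could be applied as `df[df['Market Cap'].map(parse_market_cap) < 35 * 10**6]`, assuming the column holds only strings.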
