pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Not possible to assign string values to columns using DataFrame.eval() under Python 3.6 #15320

Open tobgu opened 7 years ago

tobgu commented 7 years ago

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.DataFrame(data=[(1,2),(3,4)], columns=['a', 'b'])
>>> df
   a  b
0  1  2
1  3  4
>>> df.eval("c='foo'")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/core/frame.py", line 2279, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/eval.py", line 266, in eval
    ret = eng_inst.evaluate()
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/engines.py", line 76, in evaluate
    res = self._evaluate()
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/engines.py", line 123, in _evaluate
    return ne.evaluate(s, local_dict=scope, truediv=truediv)
  File "/lib/python3.6/site-packages/numexpr/necompiler.py", line 789, in evaluate
    zip(names, arguments)]
  File "/lib/python3.6/site-packages/numexpr/necompiler.py", line 788, in <listcomp>
    signature = [(name, getType(arg)) for (name, arg) in
  File "/lib/python3.6/site-packages/numexpr/necompiler.py", line 686, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type str96

# Switching type to bytes produces another error
>>> df.eval(b"c='foo'")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/core/frame.py", line 2279, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/eval.py", line 261, in eval
    truediv=truediv)
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/expr.py", line 725, in __init__
    self.terms = self.parse()
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/expr.py", line 742, in parse
    return self._visitor.visit(self.expr)
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/expr.py", line 312, in visit
    return visitor(node, **kwargs)
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/expr.py", line 318, in visit_Module
    return self.visit(expr, **kwargs)
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/expr.py", line 312, in visit
    return visitor(node, **kwargs)
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/expr.py", line 321, in visit_Expr
    return self.visit(node.value, **kwargs)
  File "/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/computation/expr.py", line 311, in visit
    visitor = getattr(self, method)
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Bytes'
>>>

Problem description

It does not seem possible to assign string values to columns with DataFrame.eval() under Python 3.6 (probably under all Python 3.x versions, but I've only tried 3.6).

On Python 2.7.12, the above works fine.
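The first traceback ends with numexpr rejecting a dtype named `str96`. A plausible explanation, offered as an assumption rather than a confirmed diagnosis: under Python 3 the literal `'foo'` is a unicode `str`, which NumPy stores as a fixed-width Unicode dtype (kind `'U'`, 4 bytes per character, so 3 × 32 = 96 bits). Under Python 2 the same literal is a byte string (kind `'S'`), a type numexpr appears to accept. The sketch below only inspects the NumPy dtypes; it does not call numexpr or pandas.

```python
import numpy as np

# Python 3 str literals become a fixed-width Unicode dtype:
# 3 characters x 4 bytes each = 12 bytes = 96 bits (hence "str96").
unicode_arr = np.array(["foo", "foo"])
print(unicode_arr.dtype.kind, unicode_arr.dtype.itemsize * 8)

# A bytes literal (what 'foo' was under Python 2) gives a byte-string dtype:
# 3 characters x 1 byte each = 3 bytes = 24 bits.
bytes_arr = np.array([b"foo", b"foo"])
print(bytes_arr.dtype.kind, bytes_arr.dtype.itemsize * 8)
```

The first line prints the kind `U` with 96 bits; the second prints the kind `S` with 24 bits.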

Expected Output

A data frame with all rows in column c set to the string 'foo'.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-83-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.1.0
Cython: None
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.2
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

jreback commented 7 years ago

Is there a reason you are not using the idiomatic assignment?

In [5]: df['c'] = 'foo'

In [6]: df
Out[6]: 
   a  b    c
0  1  2  foo
1  3  4  foo

I suppose this could be allowed. Note that there is NO benefit to actually doing it in .eval AT ALL (unlike numeric operations, which are accelerated using numexpr).
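To make the distinction concrete, here is a minimal sketch (assuming a recent pandas, where `eval` returns a new DataFrame by default) that keeps numeric expressions in `eval` and handles the string constant with ordinary column assignment via `DataFrame.assign`, which broadcasts a scalar to every row. Neither call modifies the original `df`.

```python
import pandas as pd

df = pd.DataFrame(data=[(1, 2), (3, 4)], columns=["a", "b"])

# Numeric expression: eval can hand this to numexpr (if installed) and,
# with inplace=False (the default in recent pandas), returns a new DataFrame.
with_sum = df.eval("c = a + b")

# String constant: plain assignment via assign() broadcasts 'foo' to all rows.
with_str = df.assign(c="foo")

print(with_sum)
print(with_str)
```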

@chris-b1 IIRC you worked on some of this.

tobgu commented 7 years ago

The example above was just a minimal one to show the problem. The actual code that fails for me takes an expression in another form and transforms it into operations that are applied to a DataFrame. For this it was very convenient to use eval(), since it is very straightforward to produce an expression that eval() can evaluate from the input. I could produce the same operations in Python code (the idiomatic way); it would just require some more code on my side.

This has worked fine for me in Python 2.7, but now I'm looking into migrating the application to Python 3.6, and this error seemed like a regression to me. That's why I reported it (and to get some input on alternative ways to achieve the same purpose).
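One possible alternative, sketched under stated assumptions: keep generating `column = value` expression strings, but route them through a small dispatcher that uses Python's `ast` module to spot a string-literal right-hand side and assign it directly, delegating everything else to `DataFrame.eval`. The helper `apply_expression` is hypothetical (not part of pandas), assumes one simple assignment per expression, and requires Python 3.8+ for `ast.Constant`.

```python
import ast

import pandas as pd


def apply_expression(df: pd.DataFrame, expr: str) -> pd.DataFrame:
    """Hypothetical helper: apply a single 'column = value' expression to df.

    String-literal right-hand sides are assigned directly (no eval/numexpr);
    anything else is delegated to DataFrame.eval, which returns a new frame.
    """
    tree = ast.parse(expr, mode="exec")
    stmt = tree.body[0] if len(tree.body) == 1 else None
    if (
        isinstance(stmt, ast.Assign)
        and len(stmt.targets) == 1
        and isinstance(stmt.targets[0], ast.Name)
        and isinstance(stmt.value, ast.Constant)
        and isinstance(stmt.value.value, str)
    ):
        # Broadcast the string constant to every row of the target column.
        return df.assign(**{stmt.targets[0].id: stmt.value.value})
    # Other expressions (e.g. numeric arithmetic) go through eval as before.
    return df.eval(expr)


df = pd.DataFrame(data=[(1, 2), (3, 4)], columns=["a", "b"])
df = apply_expression(df, "c = 'foo'")
df = apply_expression(df, "d = a + b")
print(df)
```

The design choice here is to leave the expression-generating code untouched and special-case only the form that fails, so numeric expressions still benefit from eval.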

jorisvandenbossche commented 7 years ago

I can confirm that this works in Python 2.7, but not in Python 3. So in any case, this inconsistency is a bug, and it should probably just work for both?

@tobgu And thanks for reporting!