pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

UnicodeEncodeError from DataFrame.to_records #11879

Closed: kynnjo closed this issue 7 years ago

kynnjo commented 8 years ago

The DataFrame.to_records method fails with a UnicodeEncodeError for some unicode column names.

(This issue is related to https://github.com/pydata/pandas/issues/680. The example below extends the example given in that issue.)

In [322]: df = pandas.DataFrame({u'c/\u03c3':[1,2,3]})

In [323]: df
Out[323]: 
   c/σ
0    1
1    2
2    3

In [324]: df.to_records()
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-324-6d3142e97d2d> in <module>()
----> 1 df.to_records()

/redacted/python2.7/site-packages/pandas/core/frame.pyc in to_records(self, index, convert_datetime64)
   1013             elif index_names[0] is None:
   1014                 index_names = ['index']
-> 1015             names = index_names + lmap(str, self.columns)
   1016         else:
   1017             arrays = [self[c].get_values() for c in self.columns]

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c3' in position 2: ordinal not in range(128)
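On Python 2, calling str() on a unicode object implicitly encodes it with the default codec, ASCII, and that is where the error on line 1015 comes from. The sketch below reproduces the same message on Python 3 by doing that encode step explicitly; `ascii_encode_error` is a hypothetical helper written only for illustration.

```python
# Python 3 sketch: make the implicit Python 2 str(u'...') encode step explicit.
# Assumption: the Python 2 default encoding is 'ascii' (sys.getdefaultencoding()).
name = 'c/\u03c3'  # the column name from the report: 'c', '/', GREEK SMALL LETTER SIGMA


def ascii_encode_error(text):
    """Return the message raised when encoding `text` as ASCII, or None if it encodes."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


message = ascii_encode_error(name)
print(message)  # the sigma at index 2 is the first character outside range(128)
```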
jreback commented 8 years ago

You are referring to a VERY old issue, FYI. Please post the output of pd.show_versions(). This is a bug in any event, so pull requests are welcome.

I think this should be: lmap(compat.text_type, self.columns)
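A minimal sketch of the suggested change, assuming `compat.text_type` is pandas' Python 2/3 alias for the text type (`unicode` on Python 2, `str` on Python 3). The alias is re-created locally here, and `record_names` is a hypothetical helper, not the actual pandas code; it converts both index names and column names, since both are passed through `lmap(str, ...)` in the frame.py code.

```python
import sys

# Hypothetical local stand-in for pandas.compat.text_type:
# the text type is `unicode` on Python 2 and `str` on Python 3.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)


def record_names(index_names, columns):
    """Hypothetical helper: build record-array field names as text.

    Converting with text_type instead of str means a non-ASCII name such as
    the column name from the report is never pushed through the ASCII codec
    on Python 2. Non-string labels (e.g. integers) still become strings.
    """
    return [text_type(name) for name in index_names] + [text_type(col) for col in columns]


print(record_names(['index'], [u'c/\u03c3']))
```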

kynnjo commented 8 years ago

If you can't be bothered to verify the code I posted, then just delete the issue. I don't give a damn.

jreback commented 8 years ago

@kynnjo I did reproduce it right after you posted; that's why I marked it as a bug. I asked nicely for you to post the diagnostic output, and I even suggested what I think the fix is.

We don't appreciate rude behavior. Please use respectful language.

kynnjo commented 8 years ago

just delete the issue and we're done

jreback commented 8 years ago

I actually find this a valid issue. Thank you for reporting it. Don't you wish to see pandas improved and others helped?

gliptak commented 8 years ago

This works on current HEAD:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({u'c/\u03c3':[1,2,3]})

In [3]: df
Out[3]: 
   c/σ
0    1
1    2
2    3

In [4]: df.to_records()
Out[4]: 
rec.array([(0, 1), (1, 2), (2, 3)], 
          dtype=[('index', '<i8'), ('c/σ', '<i8')])
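This session was presumably run on Python 3, since the dtype field name prints as 'c/σ' with no u prefix. As a hedged illustration of why it works there, the sketch below builds the same record layout with numpy alone, via `numpy.rec.fromarrays` and the field names pandas would pass. It shows numpy accepting the non-ASCII field name on Python 3, which suggests the Python 2 failure lies in the str() conversion rather than in numpy.

```python
import numpy as np

# Field names as pandas would pass them: the default index name plus the column name.
names = ['index', 'c/\u03c3']

# One array per field: the RangeIndex values and the column values from the report.
rec = np.rec.fromarrays([np.arange(3), np.array([1, 2, 3])], names=names)

print(rec.dtype.names)  # ('index', 'c/σ')
print(rec['c/\u03c3'].tolist())
```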

Please consider closing.

jreback commented 8 years ago

This fails on Python 2:

In [1]: df = pandas.DataFrame({u'c/\u03c3':[1,2,3]})

In [2]: df.to_records()
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-2-6d3142e97d2d> in <module>()
----> 1 df.to_records()

/Users/jreback/pandas/pandas/core/frame.pyc in to_records(self, index, convert_datetime64)
   1063             elif index_names[0] is None:
   1064                 index_names = ['index']
-> 1065             names = lmap(str, index_names) + lmap(str, self.columns)
   1066         else:
   1067             arrays = [self[c].get_values() for c in self.columns]

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c3' in position 2: ordinal not in range(128)
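The traceback above shows that on line 1065 `index_names` also goes through `lmap(str, ...)`, so a fix likely needs to cover both lists. A regression test along these lines, assuming Python 3 and a pandas version where `to_records` handles non-ASCII names, would exercise a non-ASCII index name as well as a non-ASCII column name. The test name and the Greek index name are made up for illustration.

```python
import pandas as pd


def test_to_records_non_ascii_names():
    # Non-ASCII characters in both the index name and a column name, because
    # index_names and self.columns are converted the same way in to_records.
    df = pd.DataFrame(
        {'c/\u03c3': [1, 2, 3]},
        index=pd.Index([10, 20, 30], name='\u03b9d'),
    )
    rec = df.to_records()
    # A named index keeps its own name as the first record field.
    assert rec.dtype.names == ('\u03b9d', 'c/\u03c3')
    assert rec['c/\u03c3'].tolist() == [1, 2, 3]
    assert rec['\u03b9d'].tolist() == [10, 20, 30]


test_to_records_non_ascii_names()
```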