pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: formatters argument to DataFrame.to_latex() is broken #6052

Status: open. shoyer opened this issue 10 years ago.

shoyer commented 10 years ago

It appears that neither the formatters nor float_format arguments to DataFrame.to_latex work if changed from their default values. Both raise the same exception: AttributeError: 'numpy.float64' object has no attribute 'decode'.

Note: neither of these arguments has test coverage in pandas/tests/test_format.py

My test script:

In [1]: from pandas.util.print_versions import show_versions

In [2]: show_versions()

INSTALLED VERSIONS
------------------
Python: 2.7.5.final.0
OS: Darwin
Release: 13.0.0
Processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.13.0
Cython: 0.19.2
Numpy: 1.8.0
Scipy: 0.13.1
statsmodels: Not installed
patsy: 0.2.1
scikits.timeseries: Not installed
dateutil: 2.2
pytz: 2013.9
bottleneck: 0.7.0
PyTables: Not Installed
numexpr: 2.1
matplotlib: 1.3.1
openpyxl: Not installed
xlrd: Not installed
xlwt: Not installed
xlsxwriter: Not installed
sqlalchemy: 0.7.5
lxml: Not installed
bs4: Not installed
html5lib: Not installed
bigquery: Not installed
apiclient: 1.0c1

In [3]: import pandas as pd

In [4]: df = pd.DataFrame({'a': [1.0, 2.0]})

In [5]: df.to_latex()
Out[5]: u'\\begin{tabular}{lr}\n\\toprule\n{} &  a \\\\\n\\midrule\n0 &  1 \\\\\n1 &  2 \\\\\n\\bottomrule\n\\end{tabular}\n'

In [6]: df.to_latex(float_format=lambda x: x)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-c932ce2a22b5> in <module>()
----> 1 df.to_latex(float_format=lambda x: x)

/Users/shoyer/dev/climatology-research/library/python/target/climatology/lib/python2.7/site-packages/pandas/core/frame.pyc in to_latex(self, buf, columns, col_space, colSpace, header, index, na_rep, formatters, float_format, sparsify, index_names, bold_rows, force_unicode)
   1381                                            sparsify=sparsify,
   1382                                            index_names=index_names)
-> 1383         formatter.to_latex()
   1384
   1385         if buf is None:

/Users/shoyer/dev/climatology-research/library/python/target/climatology/lib/python2.7/site-packages/pandas/core/format.pyc in to_latex(self, force_unicode, column_format)
    446             strcols = [[info_line]]
    447         else:
--> 448             strcols = self._to_str_columns()
    449
    450         if column_format is None:

/Users/shoyer/dev/climatology-research/library/python/target/climatology/lib/python2.7/site-packages/pandas/core/format.pyc in _to_str_columns(self)
    325             stringified = []
    326             for i, c in enumerate(cols_to_show):
--> 327                 fmt_values = self._format_col(i)
    328                 cheader = str_columns[i]
    329

/Users/shoyer/dev/climatology-research/library/python/target/climatology/lib/python2.7/site-packages/pandas/core/format.pyc in _format_col(self, i)
    490             (self.frame.iloc[:self.max_rows_displayed, i]).get_values(),
    491             formatter, float_format=self.float_format, na_rep=self.na_rep,
--> 492             space=self.col_space
    493         )
    494

/Users/shoyer/dev/climatology-research/library/python/target/climatology/lib/python2.7/site-packages/pandas/core/format.pyc in format_array(values, formatter, float_format, na_rep, digits, space, justify)
   1615                         justify=justify)
   1616
-> 1617     return fmt_obj.get_result()
   1618
   1619

/Users/shoyer/dev/climatology-research/library/python/target/climatology/lib/python2.7/site-packages/pandas/core/format.pyc in get_result(self)
   1733                 fmt_values = self._format_with(fmt_str)
   1734
-> 1735         return _make_fixed_width(fmt_values, self.justify)
   1736
   1737

/Users/shoyer/dev/climatology-research/library/python/target/climatology/lib/python2.7/site-packages/pandas/core/format.pyc in _make_fixed_width(strings, justify, minimum, truncated)
   1795     _strlen = _strlen_func()
   1796
-> 1797     max_len = np.max([_strlen(x) for x in strings])
   1798
   1799     if minimum is not None:

/Users/shoyer/dev/climatology-research/library/python/target/climatology/lib/python2.7/site-packages/pandas/core/format.pyc in _strlen(x)
    227         def _strlen(x):
    228             try:
--> 229                 return len(x.decode(encoding))
    230             except UnicodeError:
    231                 return len(x)

AttributeError: 'numpy.float64' object has no attribute 'decode'

jreback commented 10 years ago

let's call this a bug then!

jreback commented 10 years ago

feel free to do a PR for this!

shoyer commented 10 years ago

Some investigation reveals that the issue is that the formatter function must return a string. That seems like a reasonable choice, although it isn't what I guessed (I tried float_format=round). So this will actually just be a doc fix; expect a PR for that shortly.
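To make that contract concrete, here is a minimal pure-Python sketch (no pandas needed) contrasting callables that return a string with the two that were tried in this thread. The names `good` and `bad` are only for illustration.

```python
# Callables that satisfy the contract: they return a str.
good = {
    "printf-style": lambda x: "%.0f" % x,
    "str.format": "{:.0f}".format,
}
# Callables tried in this thread that break it: they return numbers.
bad = {
    "identity": lambda x: x,  # float_format=lambda x: x from the report
    "round": round,           # float_format=round; round(2.0) is an int in Python 3
}

value = 2.0
for name, func in good.items():
    print(f"{name}: {func(value)!r}")  # '2' in both cases
for name, func in bad.items():
    result = func(value)
    print(f"{name}: {result!r} has type {type(result).__name__}, not str")
```

In Python 2, which the report used, `round` returned a float instead of an int, but either way the result is not a string, so the width computation in the traceback fails.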

jreback commented 10 years ago

I think you can do: float_format=lambda x: round(x) or, even better: float_format='%.0f'
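One way both of those suggestions could be made to work is to normalise `float_format` up front. The sketch below is a hypothetical helper, `as_float_formatter`, not pandas code: a string is treated as a printf-style pattern, and a callable's result is coerced with `str()`.

```python
from typing import Callable, Union

FloatFormat = Union[str, Callable[[float], object]]


def as_float_formatter(spec: FloatFormat) -> Callable[[float], str]:
    """Hypothetical helper: normalise a float_format value to a callable.

    A str is treated as a printf-style pattern such as '%.0f'. A callable
    is wrapped so that whatever it returns is converted with str(), which
    would make float_format=lambda x: round(x) usable as well.
    """
    if isinstance(spec, str):
        return lambda x: spec % x
    if callable(spec):
        return lambda x: str(spec(x))
    raise TypeError(
        f"float_format must be a str or a callable, got {type(spec).__name__}"
    )


print(as_float_formatter("%.0f")(1.0))              # 1
print(as_float_formatter(lambda x: round(x))(2.0))  # 2
```

Coercing with `str()` is only one design option; the alternative is to reject non-string results and raise an error that explains the contract.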

shoyer commented 10 years ago

@jreback I agree, it would be nice if those did work, but unfortunately neither of them do currently. That would definitely be worth doing, but my doc-fix PR should clarify this for now.

asapsmc commented 2 years ago

Is there an example of how to use the formatters argument?
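According to the pandas documentation, `formatters` takes either a list of functions (applied by column position) or a dict mapping column names to functions, and each function must return a string. The plain-Python sketch below mimics the dict form so it runs without pandas; `apply_formatters` is a hypothetical helper, not pandas internals. With pandas, the same `fmts` dict would be passed as `df.to_latex(formatters=fmts)` for a DataFrame with columns `a` and `b`.

```python
from typing import Callable, Dict, List


def apply_formatters(
    columns: Dict[str, List[float]],
    formatters: Dict[str, Callable[[float], str]],
) -> Dict[str, List[str]]:
    """Hypothetical helper mimicking per-column formatters.

    Each column is formatted with the callable registered under its name;
    columns without an entry fall back to str().
    """
    return {
        name: [formatters.get(name, str)(v) for v in values]
        for name, values in columns.items()
    }


data = {"a": [1.0, 2.0], "b": [0.1234, 0.5]}
fmts = {
    "a": "{:.0f}".format,       # whole numbers
    "b": lambda x: f"{x:.1%}",  # percentages with one decimal
}
print(apply_formatters(data, fmts))
# {'a': ['1', '2'], 'b': ['12.3%', '50.0%']}
```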

NumberPiOso commented 2 years ago

take

NumberPiOso commented 2 years ago

The reasons why this issue was not closed are given in this comment on the PR (https://github.com/pandas-dev/pandas/pull/6054#issuecomment-33432118):

Disagree this "fixes" 6052, but it does help! I wonder if there is a good place in the docs to add more about this?

I think probably we should catch bad input and re-raise it with better message... accepting string e.g. '%0.f' would also be nice.
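Catching bad input could look roughly like the sketch below: a hypothetical wrapper, `checked`, that validates each formatter result immediately. It raises a `TypeError` naming the argument and the returned type, instead of the opaque `AttributeError: 'numpy.float64' object has no attribute 'decode'` from the report. This is an illustration, not how pandas implements it.

```python
from typing import Any, Callable


def checked(formatter: Callable[[Any], Any], label: str) -> Callable[[Any], str]:
    """Hypothetical wrapper: validate that `formatter` returns a str.

    Instead of failing later inside the column-width computation, raise a
    TypeError that names the offending argument and the returned type.
    """
    def wrapper(value: Any) -> str:
        result = formatter(value)
        if not isinstance(result, str):
            raise TypeError(
                f"{label} must return a str, but returned "
                f"{type(result).__name__} ({result!r}) for input {value!r}"
            )
        return result
    return wrapper


fmt = checked(lambda x: x, "float_format")
try:
    fmt(1.0)
except TypeError as exc:
    print(exc)
# float_format must return a str, but returned float (1.0) for input 1.0
```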

Documentation now states:

    float_format : one-parameter function or str, optional, default None
        Formatter for floating point numbers. For example
        ``float_format="%.2f"`` and ``float_format="{:0.2f}".format`` will
        both result in 0.1234 being formatted as 0.12.
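Both documented forms reduce to ordinary Python string formatting of a single float. This quick plain-Python check confirms they agree on the 0.1234 example. Typed directly in Python, the callable form uses single braces, `"{:0.2f}".format`. The last line shows what the `'%0.f'` pattern used in the example produces.

```python
value = 0.1234

printf_style = "%.2f" % value            # float_format="%.2f"
format_method = "{:0.2f}".format(value)  # float_format="{:0.2f}".format
zero_decimals = "%0.f" % 1.0             # float_format='%0.f' (zero decimal places)

print(printf_style, format_method, zero_decimals)  # 0.12 0.12 1
```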

Example

import pandas as pd
df = pd.DataFrame({'a': [1.0, 2.0]})
print(df.to_latex(float_format='%0.f'))

FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
  print(df.to_latex(float_format='%0.f'))
\begin{tabular}{lr}
\toprule
{} &  a \\
\midrule
0 &  1 \\
1 &  2 \\
\bottomrule
\end{tabular}

This solves the part:

accepting string e.g. '%0.f' would also be nice

What is still missing is the other part:

we should catch bad input and re-raise it with better message

However, as the FutureWarning says, DataFrame.to_latex is expected to move to the Styler.to_latex implementation and its argument signature may change, so I will not work on this.