Unicode text table output from pandas dataframe: turns string objects into numbers and changes their representation

thombashi / pytablewriter

pytablewriter is a Python library to write a table in various formats: AsciiDoc / CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.

MIT License

This brilliant tool breaks when rendering a pandas DataFrame as text.

The data contains very long numbers stored as strings: long decimals, and long ints with thousands separators. In the pandas DataFrame they have dtype object. When I output the table with a pytablewriter Unicode writer, the long ints are rendered faithfully, but the strings representing long decimals appear to be processed as numbers. They are shown as though they had been converted from strings to floats, with all the usual problems of float formatting: unwanted zeros after the last significant digit on short decimals, and too little precision to show the whole number on long decimals.

For example:

"0.000000000000001" is represented by pytablewriter as "0.000000"
"0.001" is represented as "0.001000"

Yet, with long numbers:

"1,000,000,000,000" is represented faithfully as "1,000,000,000,000".

The problem seems to be general to decimals. Although the values are held as strings precisely so that they are rendered as strings and not as numbers, the pytablewriter output justifies the strings that represent integers differently from those that represent decimals: the former are left-justified, the latter right-justified. So it appears to treat the strings that contain decimals as numbers, convert them to float, and output them as numbers.

It justifies the string "1" to the right with the decimals as well.
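
The symptoms described above can be reproduced in plain Python, without pytablewriter. This is only a sketch of the suspected mechanism (string parsed as float, then printed in fixed-point notation). It says nothing about pytablewriter's actual internals, and the helper name `parses_as_float` is made up for illustration:

```python
# Plain-Python illustration only; this does not call pytablewriter.
samples = ["0.000000000000001", "0.001"]

# Fixed-point float formatting defaults to six decimal places,
# which truncates the first value and pads the second with zeros.
as_float_text = [format(float(s), "f") for s in samples]
print(as_float_text)  # ['0.000000', '0.001000']


def parses_as_float(text: str) -> bool:
    """Hypothetical helper: True if float() accepts the string as-is."""
    try:
        float(text)
    except ValueError:
        return False
    return True


# A string with thousands separators is rejected by float(), which would
# explain why it survives unchanged, while "1" and "0.001" are accepted.
print(parses_as_float("1,000,000,000,000"))  # False
print(parses_as_float("1"))                  # True
print(parses_as_float("0.001"))              # True
```

If the library does something similar, it would account for both observations: the comma-separated ints stay as text, while anything float() accepts is reformatted.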

@mjb-v9-5-2 Thank you for your feedback.

The problems that you described are fixed for certain values as of pytablewriter 0.62.0:

import pandas as pd
import pytablewriter as ptw

writer = ptw.UnicodeTableWriter(
    dataframe=pd.DataFrame(
        {"realnumber": ["0.000000000000001", "0.000000000000002"], "long": ["1,000,000,000,000", "1"]}
    ),
    margin=1,
    column_styles=[
        ptw.style.Style(thousand_separator=","),
        ptw.style.Style(thousand_separator=","),
    ]
)
writer.write_table()

┌───────────────────┬───────────────────┐
│    realnumber     │       long        │
├───────────────────┼───────────────────┤
│ 0.000000000000001 │ 1,000,000,000,000 │
├───────────────────┼───────────────────┤
│ 0.000000000000002 │                 1 │
└───────────────────┴───────────────────┘

However, when a column mixes values with different numbers of decimal places, the problem persists:

import pandas as pd
import pytablewriter as ptw

writer = ptw.UnicodeTableWriter(
    dataframe=pd.DataFrame(
        {"realnumber": ["0.000000000000001", "0.1"], "long": ["1,000,000,000,000", "1"]}
    ),
    margin=1,
    column_styles=[
        ptw.style.Style(thousand_separator=","),
        ptw.style.Style(thousand_separator=","),
    ]
)
writer.write_table()

┌─────────────┬───────────────────┐
│ realnumber  │       long        │
├─────────────┼───────────────────┤
│ 0.000000000 │ 1,000,000,000,000 │
├─────────────┼───────────────────┤
│ 0.100000000 │                 1 │
└─────────────┴───────────────────┘

I will also fix this in a future version.
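
For reference, the second table is what you get when every cell in the column is formatted with one shared fixed precision. The sketch below is plain Python and an assumption about the cause, not the library's code. The precision of 9 is simply read off the output above. `decimal.Decimal` is shown alongside for comparison, because it keeps the digits exactly as written:

```python
from decimal import Decimal

column = ["0.000000000000001", "0.1"]

# Assumption: a single precision (9, taken from the table above)
# is applied to every value in the column after conversion to float.
shared = [format(float(s), ".9f") for s in column]
print(shared)  # ['0.000000000', '0.100000000']

# Decimal preserves the written digits when formatted without a precision.
exact = [format(Decimal(s), "f") for s in column]
print(exact)  # ['0.000000000000001', '0.1']
```

With a shared precision, the smallest value is truncated to zero and the shorter value gains trailing zeros, which matches the rendering reported in the issue.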

Unicode text table output from pandas dataframe: turns string objects into numbers and changes their representation #44