posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License
1.81k stars 60 forks

table with negative data fails to save as image when using .fmt_number #391

Open Mike-Purtell opened 3 months ago

Mike-Purtell commented 3 months ago

Description

Saving a table as a PNG image fails when the table has negative values and .fmt_number is used.

Reproducible example - Verified on complex use cases, and the simple example posted here. Notice that the file extension is .txt, please change to .py or paste into a notebook to run this code.

gt_bug_2024_07_03_MP.txt

Development environment

Win11, great_tables 0.9.0, python 3.11.5 with Anaconda/Jupyter Lab, polars 0.20.31

Expected result

Expect that a table with negative data can be formatted with .fmt_number and then saved as an image file. This failed.

machow commented 2 months ago

Hey, thanks for raising--I'm having some trouble viewing the .txt file. Do you mind pasting in the python code directly?

Mike-Purtell commented 2 months ago

Hello Michael,

No problem, I will post the Python code in a few minutes.

Thank you for looking into this issue.

mike purtell


Mike-Purtell commented 2 months ago

Here is the python code I wrote to demonstrate the reported bug:

import great_tables
from great_tables import GT
import polars as pl

def save_gt(df, filename):
    my_gt = (
        GT(df).tab_header(title = f'{filename}', subtitle = f'subtitle')
        # TO TEST THIS BUG, RUN THIS CODE WITH and WITHOUT .fmt_number
        # save table to image fails when .fmt_number with negative values is used
        .fmt_number(
            columns=df.columns,
            decimals=1,
            use_seps=True,
            sep_mark=','
        )
    )
    try:
        my_gt.save(filename, window_size=(6, 6))
        print(f'\n ###########  SUCCESSFULLY WROTE {filename}  ###########\n')
    except:
        print(f'\n ###########  FAILED TO WRITE {filename}  ###########\n')
    return

df_pos = pl.DataFrame(
    {
        'A': [x for x in list(range(3))],
        'B': [x * 0.5 for x in list(range(3))],
        'C': [x * 01.5 for x in list(range(3))],
    }
)

# make df_neg by multiplying all values of df_pos by -1
df_neg = df_pos.with_columns(pl.all() * pl.lit(-1))
display(df_neg, df_pos)
save_gt(df_neg, 'df_neg.png')
save_gt(df_pos, 'df_pos.png')

jrycw commented 2 months ago

Hello, I reformatted the code to make it easier to read on GitHub. Hope this helps!

By the way, it seems that the display import is missing. I suspect we need to add from IPython.display import display at the top.

import great_tables
import polars as pl
from great_tables import GT

def save_gt(df, filename):
    my_gt = (
        GT(df).tab_header(title=f"{filename}", subtitle=f"subtitle")
        # TO TEST THIS BUG, RUN THIS CODE WITH and WITHOUT .fmt_number
        # save table to image fails when .fmt_number with negative values is used
        .fmt_number(columns=df.columns, decimals=1, use_seps=True, sep_mark=",")
    )
    try:
        my_gt.save(filename, window_size=(6, 6))
        print(f"\n ###########  SUCCESSFULLY WROTE {filename}  ###########\n")
    except:
        print(f"\n ###########  FAILED TO WRITE {filename}  ###########\n")
    return

df_pos = pl.DataFrame(
    {
        "A": [x for x in list(range(3))],
        "B": [x * 0.5 for x in list(range(3))],
        "C": [x * 01.5 for x in list(range(3))],
    }
)

# make df_neg by multiplying all values of df_pos by -1
df_neg = df_pos.with_columns(pl.all() * pl.lit(-1))
display(df_neg, df_pos)
save_gt(df_neg, "df_neg.png")
save_gt(df_pos, "df_pos.png")

Mike-Purtell commented 2 months ago

Thank you for reformatting the Python code. I'm not sure how I get away without from IPython.display import display. It might be imported automatically by my Anaconda environment, or I might be running the native python display command. Thank you for working on this issue; it is greatly appreciated, and if I can help in any way please don't hesitate to ask.

Mike-Purtell commented 2 months ago

Thank you for releasing 0.10. I ran the test case submitted and it worked, very happy about that. On my production code, I still cannot format tables with negative values. My error message indicates that I have an issue with the use of UTF-16 coding for the minus sign, which is represented as 0x2212. In polars, I tried to cast to UTF-8 and then back to Float64, but still have the issue. I also tried multiplying all values by -1 twice to see if this operation would return an acceptable minus sign, also to no avail. I will see if I can produce a usable workaround for now.

Mike-Purtell commented 2 months ago

great_tables 0.10.0 has issues with .fmt_number. Verified using python 3.11.9, polars 1.1.0. Verified with anaconda/spyder, and with a python notebook in jupyter lab. Short python script (18 lines) attached as txt file.

A workaround is to have polars do the rounding instead of great_tables' .fmt_number. This workaround only applies to rounding; it does not cover other features of .fmt_number such as thousands commas.

great_table_fmt_number_2024_07_13.txt
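The limitation noted above (rounding in polars does not add thousands commas) could be partly side-stepped by turning values into strings before they reach the table. The following is a minimal sketch using only Python's format mini-language. `format_plain` is a hypothetical helper, not part of great_tables or polars, and the trade-off is that the formatted values are strings rather than numbers.

```python
def format_plain(value: float, decimals: int = 1) -> str:
    """Format a number with thousands commas and a fixed number of decimals.

    Python's own formatting writes negatives with the ASCII hyphen-minus,
    so the result stays encodable in legacy code pages.
    """
    return f"{value:,.{decimals}f}"


values = [-1234.567, 0.0, 98765.4321]
formatted = [format_plain(v) for v in values]
print(formatted)  # ['-1,234.6', '0.0', '98,765.4']
```

The formatted strings contain only ASCII characters, so they should not trigger the encoding error discussed in this thread.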

machow commented 2 months ago

Thanks for looking into this (and to @jrycw for the clean up!). I'm having some trouble reproducing :/ . Based on the examples, I ran the code below, but did not hit an error.

import polars as pl
from great_tables import GT
from IPython.display import display

df_pos = pl.DataFrame(
    {
        "A": [x for x in list(range(3))],
        "B": [x * 0.5 for x in list(range(3))],
        "C": [x * 01.5 for x in list(range(3))],
    }
)

# make df_neg by multiplying all values of df_pos by -1
df_neg = df_pos.with_columns(pl.all() * pl.lit(-1))
display(df_neg, df_pos)
(
    GT(df_neg)
    .tab_header(title="a", subtitle="b")
    .fmt_number(columns=df_neg.columns, decimals=1, use_seps=True, sep_mark=",")
    .save("test.png", window_size=(6,6))
)

Do you mind pasting in the traceback for the error (or the error name)? I'm a bit stumped on what might cause saving a table to fail when formatting negative numbers... 😵

Mike-Purtell commented 2 months ago

Hi Michael,

Please try running this code with .fmt_number commented out (works for me, great_table is saved to Random.png with many digits). Then run it again after uncommenting .fmt_number. That is where I get this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 7431: character maps to <undefined>

In my work usage, all of my data is read from csv files, so I thought adding Utf8 decoding to polars scan_csv would do the trick. But this test case, which generates the data organically, shows that csv encoding is not the issue.

import random, polars as pl
from great_tables import GT

random.seed(42)
col_1 = [random.uniform(-1.0, 1.0) for a in list(range(7))]
col_2 = [random.uniform(-1.0, 1.0) for a in list(range(7))]
df = pl.DataFrame({'COL_1': col_1,'COL_2': col_2})

print(df.head(7))

my_gt = (
    GT(df)
    .tab_header(title = 'Positive, Negative Cosine')
    # Test with .fmt_number invoked, and with .fmt_number commented out
    # .fmt_number(columns=['COL_1', 'COL_2'], decimals=3)
)

# .save fails when great_table .fmt_number was used
my_gt.save('Random.png', window_size=(6, 6))

In the case of .fmt_number, I work around it by using polars to do the rounding, but would like to use .fmt_number for thousands commas and other reasons.

Thank you for working on this, I really enjoy great_tables.

Mike Purtell


Mike-Purtell commented 2 months ago

Here is just the code from previous post great_table_fmt_number_2024_07_13.txt

jrycw commented 2 months ago

I'm running on Windows 11 as well and cannot reproduce the error with or without .fmt_number(). However, I suspect the error may stem from these lines, which deal with the minus sign.

import random

import polars as pl
from great_tables import GT

random.seed(42)
col_1 = [random.uniform(-1.0, 1.0) for a in list(range(7))]
col_2 = [random.uniform(-1.0, 1.0) for a in list(range(7))]
df = pl.DataFrame({"COL_1": col_1, "COL_2": col_2})

print(df.head(7))
my_gt = (
    GT(df).tab_header(title="Positive, Negative Cosine")
    # Test with.fmt_number invoked, and with .fmt_number commented out
    # .fmt_number(columns=['COL_1', 'COL_2'], decimals=3)
)

# .save fails when great_table .fmt_number was used
my_gt.save("Random.png", window_size=(6, 6))

Mike-Purtell commented 2 months ago

I ran this code on my personal machine and my work PC, both running Win11, with Anaconda/Spyder, great_tables 0.10.0. I get the same error in both cases when I include .fmt_number. The error message indicates that \u2212 cannot be encoded, which is UTF-16. Can the lines that deal with negative values be enhanced to support UTF-16, or to cast the negative sign to an equivalent UTF-8 code? Here is the error message: UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 7431: character maps to <undefined>
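A note on the error above: U+2212 (MINUS SIGN) is a Unicode code point rather than an encoding, and UTF-8 can represent it. The 'charmap' wording suggests the text was being encoded with a legacy single-byte code page. The sketch below assumes cp1252, a common default on Windows; that choice is an assumption and is not confirmed anywhere in this thread. The HTML fragment is also made up for illustration.

```python
minus = "\u2212"  # MINUS SIGN, distinct from the ASCII hyphen-minus "-"
html_fragment = f"<td>{minus}0.500</td>"

# UTF-8 can encode every Unicode code point, including U+2212.
utf8_bytes = html_fragment.encode("utf-8")
print(utf8_bytes)

# cp1252 (assumed here as the Windows default) has no mapping for U+2212,
# so encoding raises the same kind of UnicodeEncodeError reported above.
try:
    html_fragment.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as err:
    cp1252_ok = False
    print(err)
```

If this reading is right, the data itself is fine and casting columns in polars would not help, because the character is introduced when the number is formatted, and the failure happens when the resulting text is written with a non-UTF-8 encoding.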

jrycw commented 2 months ago

Another possible fix would be to set the encoding to UTF-8 while writing in GT.save() and related helper functions.
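The idea above can be sketched with the standard library alone. This is not the actual GT.save() implementation: `write_html` is a hypothetical helper and the temporary-file layout is an assumption. The point is only that passing encoding="utf-8" makes the write independent of the platform's default encoding.

```python
import tempfile
from pathlib import Path


def write_html(html: str, path: Path) -> None:
    """Write HTML text to disk with an explicit UTF-8 encoding."""
    # Without the encoding argument, open() uses the locale's preferred
    # encoding, which on Windows is often a legacy code page.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


html = "<td>\u22120.5</td>"  # contains U+2212 MINUS SIGN

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "table.html"
    write_html(html, target)
    round_trip = target.read_text(encoding="utf-8")

print(round_trip == html)
```

Reading the file back with the same encoding returns the original text, minus sign included.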

machow commented 2 months ago

Ah, thanks for surfacing! That bit of code definitely looks like the issue, and encoding seems like it should resolve 😓

rmathur-tg commented 1 month ago

any update on this one? running into the same issue

Mike-Purtell commented 1 month ago

No update as far as I know. Last release was 0.10 on July 8, so maybe something is in the works. I must say that even with these early adopter issues, the output I produce with great tables has helped me so much with purchase requests and engineering presentations. GT works great with polars, using pl.concat_list to make nanoplots of horizontal data across the columns.

machow commented 1 month ago

Hey, sorry for the wait -- we're just wrapping up work for posit's yearly conference, and should be able to get to these kinds of issues next week!