Open Mike-Purtell opened 3 months ago
Hey, thanks for raising--I'm having some trouble viewing the .txt file. Do you mind pasting in the python code directly?
Hello Michael,
No problem, I will post the Python code in a few minutes.
Thank you for looking into this issue.
mike purtell
Here is the python code I wrote to demonstrate the reported bug:
import great_tables
from great_tables import GT
import polars as pl

def save_gt(df, filename):
    my_gt = (
        GT(df).tab_header(title = f'{filename}', subtitle = f'subtitle')
        # TO TEST THIS BUG, RUN THIS CODE WITH and WITHOUT .fmt_number
        # save table to image fails when .fmt_number with negative values is used
        .fmt_number(
            columns=df.columns,
            decimals=1,
            use_seps=True,
            sep_mark=','
        )
    )
    try:
        my_gt.save(filename, window_size=(6, 6))
        print(f'\n ########### SUCCESSFULLY WROTE {filename} ###########\n')
    except:
        print(f'\n ########### FAILED TO WRITE {filename} ###########\n')
    return

df_pos = pl.DataFrame(
    {
        'A': [x for x in list(range(3))],
        'B': [x * 0.5 for x in list(range(3))],
        'C': [x * 1.5 for x in list(range(3))],
    }
)
df_neg = df_pos.with_columns(pl.all() * pl.lit(-1))
display(df_neg, df_pos)
save_gt(df_neg, 'df_neg.png')
save_gt(df_pos, 'df_pos.png')
Hello, I reformatted the code to make it easier to read on GitHub. Hope this helps!
By the way, it seems that the display import is missing. I suspect we need to add from IPython.display import display at the top.
import great_tables
import polars as pl
from great_tables import GT

def save_gt(df, filename):
    my_gt = (
        GT(df).tab_header(title=f"{filename}", subtitle=f"subtitle")
        # TO TEST THIS BUG, RUN THIS CODE WITH and WITHOUT .fmt_number
        # save table to image fails when .fmt_number with negative values is used
        .fmt_number(columns=df.columns, decimals=1, use_seps=True, sep_mark=",")
    )
    try:
        my_gt.save(filename, window_size=(6, 6))
        print(f"\n ########### SUCCESSFULLY WROTE {filename} ###########\n")
    except:
        print(f"\n ########### FAILED TO WRITE {filename} ###########\n")
    return

df_pos = pl.DataFrame(
    {
        "A": [x for x in list(range(3))],
        "B": [x * 0.5 for x in list(range(3))],
        "C": [x * 1.5 for x in list(range(3))],
    }
)
# make df_neg by multiplying all values of df_pos by -1
df_neg = df_pos.with_columns(pl.all() * pl.lit(-1))
display(df_neg, df_pos)
save_gt(df_neg, "df_neg.png")
save_gt(df_pos, "df_pos.png")
Thank you for reformatting the Python code. I am not sure how I get away without using from IPython.display import display. It might be imported automatically by my Anaconda environment, or I might be running a native Python display command. Thank you for working on this issue, it is greatly appreciated, and if I can help in any way please don't hesitate to ask.
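One possible explanation, offered as an assumption rather than something confirmed in this thread: IPython-based front ends (Jupyter, and the console Spyder uses) make display available without an explicit import, while a plain python interpreter does not. A minimal sketch of a script-safe fallback follows; the print-based stand-in is purely illustrative and is not part of IPython or great_tables.

```python
# Prefer IPython's display when it is installed; otherwise fall back to print.
try:
    from IPython.display import display
except ImportError:
    def display(*objs):
        """Plain-text stand-in for IPython.display.display (illustrative only)."""
        for obj in objs:
            print(obj)

# The strings stand in for the DataFrames used in the reproduction script.
display("df_neg placeholder", "df_pos placeholder")
```

With this guard at the top, the reproduction script should run the same way from a notebook or from a plain interpreter.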
Thank you for releasing 0.10. I ran the submitted test case and it worked, and I am very happy about that. In my production code, however, I still cannot format tables with negative values. My error message indicates that I have an issue with the use of UTF-16 coding for the minus sign, which is represented as 0x2212. In polars, I tried to cast to UTF-8 and then back to Float64, but I still have the issue. I also tried multiplying all values by -1 twice to see if this operation would return an acceptable minus sign, also to no avail. I will see if I can produce a usable workaround for now.
great_tables 0.10.0 has issues with .fmt_number. Verified using python 3.11.9, polars 1.1.0. Verified with anaconda/spyder, and with a python notebook in jupyter lab. Short python script (18 lines) attached as txt file.
A workaround is to have polars do the rounding instead of great_tables .fmt_number. This workaround applies only to rounding and does not cover other features of .fmt_number, such as thousands commas.
Thanks for looking into this (and to @jrycw for the clean up!). I'm having some trouble reproducing :/ . Based on the examples, I ran the code below, but did not hit an error.
import polars as pl
from great_tables import GT
from IPython.display import display

df_pos = pl.DataFrame(
    {
        "A": [x for x in list(range(3))],
        "B": [x * 0.5 for x in list(range(3))],
        "C": [x * 1.5 for x in list(range(3))],
    }
)
# make df_neg by multiplying all values of df_pos by -1
df_neg = df_pos.with_columns(pl.all() * pl.lit(-1))
display(df_neg, df_pos)
(
    GT(df_neg)
    .tab_header(title="a", subtitle="b")
    .fmt_number(columns=df_neg.columns, decimals=1, use_seps=True, sep_mark=",")
    .save("test.png", window_size=(6, 6))
)
Do you mind pasting in the traceback for the error (or the error name)? I'm a bit stumped on what might cause saving a table to fail when formatting negative numbers... 😵
Hi Michael,
Please try running this code with .fmt_number commented out (this works for me, and the table is saved to Random.png with many digits). Then run it again after uncommenting .fmt_number. That is where I get this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 7431: character maps to <undefined>
In my work usage, all of my data is read from csv files, so I thought adding Utf8 decoding to polars scan_csv would do the trick. But this test case, which generates the data organically, shows that csv encoding is not the issue.
import random, polars as pl
from great_tables import GT

random.seed(42)
col_1 = [random.uniform(-1.0, 1.0) for a in list(range(7))]
col_2 = [random.uniform(-1.0, 1.0) for a in list(range(7))]
df = pl.DataFrame({'COL_1': col_1,'COL_2': col_2})
print(df.head(7))
my_gt = (
    GT(df)
    .tab_header(title = 'Positive, Negative Cosine')
    # Test with .fmt_number invoked, and with .fmt_number commented out
    # .fmt_number(columns=['COL_1', 'COL_2'], decimals=3)
)
my_gt.save('Random.png', window_size=(6, 6))
In the case of .fmt_number, I work around it by using polars to do the rounding, but I would like to use .fmt_number for thousands commas and other reasons.
Thank you for working on this, I really enjoy great_tables.
Mike Purtell
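Until .fmt_number can be used, one possible interim approach is to pre-format values as strings with Python's format mini-language, which produces comma separators and the ASCII hyphen-minus (U+002D) rather than U+2212. This is a sketch under that assumption and was not tested against GT.save() in this thread; format_number is a hypothetical helper name, not a great_tables or polars function.

```python
def format_number(value: float, decimals: int = 1) -> str:
    """Format with thousands commas and fixed decimals, using ASCII only."""
    return f"{value:,.{decimals}f}"

values = [-1234567.891, 0.5, 9876.54321]
formatted = [format_number(v) for v in values]
print(formatted)  # ['-1,234,567.9', '0.5', '9,876.5']
```

The trade-off is that the columns become strings, so any numeric alignment or further numeric formatting in the table has to be handled separately.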
From: Michael Chow Sent: Monday, July 15, 2024 6:10 AM To: posit-dev/great-tables Subject: Re: [posit-dev/great-tables] table with negative data fails to save as image when using .fmt_number (Issue #391)
Here is just the code from the previous post: great_table_fmt_number_2024_07_13.txt
I'm running on Windows 11 as well and cannot reproduce the error with or without .fmt_number(). However, I suspect the error may stem from these lines, which deal with the minus sign.
import random
import polars as pl
from great_tables import GT

random.seed(42)
col_1 = [random.uniform(-1.0, 1.0) for a in list(range(7))]
col_2 = [random.uniform(-1.0, 1.0) for a in list(range(7))]
df = pl.DataFrame({"COL_1": col_1, "COL_2": col_2})
print(df.head(7))
my_gt = (
    GT(df).tab_header(title="Positive, Negative Cosine")
    # Test with .fmt_number invoked, and with .fmt_number commented out
    # .fmt_number(columns=['COL_1', 'COL_2'], decimals=3)
)
# .save fails when great_table .fmt_number was used
my_gt.save("Random.png", window_size=(6, 6))
I ran this code on my personal machine and my work PC, both running Win11, with Anaconda/Spyder and great_tables 0.10.0. I get the same error in both cases when I include .fmt_number. The error message indicates that \u2212, which is UTF-16, cannot be encoded. Can the lines that deal with negative values be enhanced to support UTF-16, or to cast the negative sign to an equivalent UTF-8 code? Here is the error message:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 7431: character maps to <undefined>
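For context, U+2212 (MINUS SIGN) is a Unicode code point rather than a UTF-16-specific character, and UTF-8 can encode it. The 'charmap' wording in the error suggests that a legacy single-byte code page was used when writing. The sketch below assumes cp1252, since the thread does not say which code page was active.

```python
minus = "\u2212"  # MINUS SIGN, distinct from the ASCII hyphen-minus "-"

# UTF-8 encodes U+2212 as three bytes.
utf8_bytes = minus.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x88\x92'

# A legacy code page (cp1252 is an assumption here) has no slot for it.
try:
    minus.encode("cp1252")
    encodable = True
except UnicodeEncodeError as err:
    encodable = False
    print(err)
```

This would also explain why casting the data in polars or multiplying by -1 twice made no difference: the character is most likely introduced when the number is formatted as text, not stored in the data.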
Another possible fix would be to set the encoding to UTF-8 while writing in GT.save() and related helper functions.
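The idea can be sketched independently of great_tables. The snippet below writes an HTML fragment containing U+2212 to a temporary file, first with a legacy code page and then with an explicit UTF-8 encoding. The cp1252 choice and the HTML string are assumptions for illustration; this is not the library's actual save code.

```python
import tempfile
from pathlib import Path

html = "<td>\u22120.5</td>"  # stand-in for formatted table HTML

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "table.html"

    # Writing with a legacy code page fails on U+2212.
    try:
        path.write_text(html, encoding="cp1252")
        legacy_ok = True
    except UnicodeEncodeError:
        legacy_ok = False

    # Writing with an explicit UTF-8 encoding succeeds and round-trips.
    path.write_text(html, encoding="utf-8")
    round_trip = path.read_text(encoding="utf-8")

print(legacy_ok, round_trip == html)
```

Passing encoding="utf-8" explicitly avoids depending on the platform default. Setting the environment variable PYTHONUTF8=1 (Python's UTF-8 mode) also makes open() default to UTF-8 and may serve as an interim workaround, though that was not tested in this thread.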
Ah, thanks for surfacing! That bit of code definitely looks like the issue, and encoding seems like it should resolve 😓
any update on this one? running into the same issue
No update as far as I know. The last release was 0.10 on July 8, so maybe something is in the works. I must say that even with these early adopter issues, the output I produce with great tables has helped me so much with purchase requests and engineering presentations. GT works great with polars, using pl.concat_list to make nanoplots of horizontal data across the columns.
Hey, sorry for the wait -- we're just wrapping up work for posit's yearly conference, and should be able to get to these kinds of issues next week!
Description
Saving an image of a table to a png file fails when the table has negative values and .fmt_number is used.
Reproducible example - Verified on complex use cases, and the simple example posted here. Notice that the file extension is .txt, please change to .py or paste into a notebook to run this code.
gt_bug_2024_07_03_MP.txt
Development environment
Win11, great_tables 0.9.0, python 3.11.5 with Anaconda/Jupyter Lab, polars 0.20.31
Expected result
Expect that a table with negative data can use .fmt_number to clean the table and can then be saved as an image file. This failed.