joshday / OnlineStats.jl

⚡ Single-pass algorithms for statistics
https://joshday.github.io/OnlineStats.jl/latest/
MIT License

Pretty printing is unpretty inside DataFrame #281

Closed schlichtanders closed 1 month ago

schlichtanders commented 7 months ago
using DataFrames, CSV, OnlineStats, Statistics
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
df = DataFrame(CSV.File(download(url)))
combine(df, [:sepal_length, :sepal_width] .=> (a -> fit!(Mean(), a)))

shows the following output on the Julia REPL

1×2 DataFrame
 Row │ sepal_length_function              sepal_width_function              
     │ Mean…                              Mean…                             
─────┼──────────────────────────────────────────────────────────────────────
   1 │ Mean\e[90m: \e[39mn=150\e[90m |\…  Mean\e[90m: \e[39mn=150\e[90m |\…

The same happens for other monoids, such as Extrema.

It would be great if OnlineStats were readable inside a DataFrame (they are kind of useless there otherwise).

joshday commented 5 months ago

I think the solution is to change the show methods that rely on printstyled to use StyledStrings.jl.

joshday commented 5 months ago

Actually this looks to be an upstream issue: https://github.com/ronisbr/PrettyTables.jl/issues/244. Closing here.

ronisbr commented 5 months ago

Hi!

IMHO, the problem here is how the objects are printed. They all seem to print to stdout with colors whenever the stream supports them, and that approach is wrong. For example, calling print on a Mean object gives the following result:

[Screenshot, 2024-05-25 10:19:58]

Notice that the output is decorated. However, the docstring of print reads:

print([io::IO], xs...)

Write to io (or to the default output stream stdout if io is not given)
a canonical (un-decorated) text representation. The representation used
by print includes minimal formatting and tries to avoid Julia-specific
details.

Hence, the output must have no colors, line breaks, etc.

When we define a type and want custom printing for it, we usually add two methods:

function show(io::IO, obj::MyType)
function show(io::IO, mime::MIME"text/plain", obj::MyType)

The first is the fallback used by print, so it must produce an undecorated text representation. The second is used for the decorated version, and it should apply colors only if the IO supports them (which we must check).
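As a minimal sketch of that two-method pattern, here is a toy type (Reading is a hypothetical name for illustration, not something from OnlineStats.jl) whose two-argument show stays plain while the text/plain method decorates only when the IO reports color support:

```julia
# `Reading` is a hypothetical type used only to illustrate the pattern.
struct Reading
    label::String
    value::Float64
end

# Two-argument show: the fallback that `print` and `string` end up calling.
# Kept plain on purpose: no colors, no line breaks.
function Base.show(io::IO, r::Reading)
    print(io, "Reading(", r.label, " = ", r.value, ")")
end

# Three-argument show: used when a value is displayed, e.g. at the REPL.
# Decoration is applied only if the IO context reports color support.
function Base.show(io::IO, ::MIME"text/plain", r::Reading)
    if get(io, :color, false)
        printstyled(io, "Reading"; color = :light_black)
    else
        print(io, "Reading")
    end
    print(io, ": ", r.label, " = ", r.value)
end

r = Reading("temp", 21.5)

# Even with a color-enabled context, `print` stays undecorated.
plain = sprint(print, r; context = :color => true)

# The text/plain representation carries ANSI escapes when color is enabled.
decorated = sprint(show, MIME("text/plain"), r; context = :color => true)
```

With this split, anything that collects text through print gets a clean string, and the colored form only appears where a display explicitly asks for it.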

PrettyTables.jl uses print to obtain the text representation of objects. Here, print returns something we would not expect given its definition. That's why you are seeing this behavior.
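One way to check whether a type has this problem is to capture its print output with a color-enabled context and look for ANSI escape sequences. The sketch below assumes the table renderer passes such a context to print, which is my reading of the behavior described here rather than something checked against the PrettyTables.jl source. The names ANSI_SGR, has_ansi, strip_ansi, and Loud are hypothetical helpers for illustration:

```julia
# Matches ANSI SGR (color/style) sequences such as "\e[90m" and "\e[39m".
const ANSI_SGR = r"\e\[[0-9;]*m"

# True if `print` emits escape sequences when the IO claims color support.
has_ansi(x) = occursin(ANSI_SGR, sprint(print, x; context = :color => true))

# Remove SGR sequences from an already captured string.
strip_ansi(s::AbstractString) = replace(s, ANSI_SGR => "")

# A deliberately misbehaving type: its two-argument show is decorated.
struct Loud end
Base.show(io::IO, ::Loud) = printstyled(io, "Loud"; color = :light_black)

raw = sprint(print, Loud(); context = :color => true)
clean = strip_ansi(raw)
```

Stripping the sequences after the fact is only a stopgap for display; the cleaner fix is for the type's two-argument show to stay undecorated.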

However, there is an easy way to work around this, although it is uncommon in DataFrames. If you wrap a cell in an AnsiTextCell, PrettyTables.jl will automatically render the ANSI escape sequences. Hence, we need to obtain the string representation of each object and put it in an AnsiTextCell:

julia> using PrettyTables  # provides AnsiTextCell

julia> combine(
           df,
           [:sepal_length, :sepal_width] .=> (a -> begin
               AnsiTextCell(sprint(print, fit!(Mean(), a); context = :color => true))
           end)
       )
[Screenshot, 2024-05-25 10:29:03]

Notice that it even works with line breaks:

[Screenshot, 2024-05-25 10:30:22]

However, in this case using a DataFrame is pointless because you no longer have the original objects, just strings. If you only want to display the information, using PrettyTables.jl directly may be better.

joshday commented 5 months ago

@ronisbr Thanks for the details! That helped connect some dots I was missing. Sorry for the noise.

ronisbr commented 5 months ago

There's no problem at all @joshday ! Let me know if I can help.

Off-topic: By the way, I did not know this amazing package! I will start to use it immediately :)