Open aborruso opened 9 months ago
Thanks for opening this issue. I've added the data from your example as an xlsx file, so we can explore a bit!
I think that in general, it seems like there are two challenges in this issue:
Here we read the excel spreadsheet using pandas.
import pandas as pd
df = pd.read_excel("https://github.com/posit-dev/great-tables/files/13821956/example-gt.xlsx")
Property Unnamed: 1 Earth
0 Temperature 1961-1990 min -89.2 °C
1 NaN mean 14 °C
2 NaN max 56.7 °C
Note that the merged rows in the Property column ended up with NaNs in them, and there is an unnamed column. To fix this, we can forward fill and do a rename.
tidy = df.assign(Property = df["Property"].ffill()).rename(columns={"Unnamed: 1": "Measure"})
tidy
Property Measure Earth
0 Temperature 1961-1990 min -89.2 °C
1 Temperature 1961-1990 mean 14 °C
2 Temperature 1961-1990 max 56.7 °C
With the data tidied, we can produce something like a grid table, by setting the rowname_col="Measure"
, and groupname_col="Property"
.
import pandas as pd
from great_tables import GT
df = pd.read_excel("https://github.com/posit-dev/great-tables/files/13821956/example-gt.xlsx")
tidy = df.assign(Property = df["Property"].ffill()).rename(columns={"Unnamed: 1": "Measure"})
gt = GT(tidy, rowname_col="Measure", groupname_col="Property")
gt
Note that it differs a bit from you example output though. I wonder if there's another way this is supported in the R gt library?
Thank you very much @machow very useful and didactic. By also doing it directly in excel, I was somehow already able to.
I like this result of yours and it is very readable, but I would like to see if it is useful (also for this great project of yours), to introduce something like groupname_row
, something that allows you to group rows.
Just a thought.
I wonder if there's another way this is supported in the R gt library?
I do not know if there is a gt way, but I know this other R way created by @Steinthal.
Well, @machow, I’d like to caution you that the example given be @aborruso is quite specific.
What if there’s another set of combined cells for another period again with min/mean/max values and those values shall be presented directly above or below those of the earlier period? Then the arrangement presented or requested by @aborruso would make more sense than placing the period above the values.
What I mean is: There are different possible ways to present data depending on the amount of data and requirements – and a solution should be flexible enough to support them all (or at least most of them).
@Steinthal if I understand correctly, are you saying that something like this...
import pandas as pd
from great_tables import GT
df = pd.DataFrame({
"Property": ["Temp 1961"] * 2 + ["Temp 1962"] * 2,
"Measure": ["min", "max"] * 2,
"Earth": [1, 2, 3, 4]
})
GT(df, rowname_col="Measure", groupname_col="Period")
Isn't ideal because the property values (e.g. Temp 1961) shouldn't go across the full dataframe? I wonder if the gt for R row_group_as_column
argument could make this look a bit better (even if it doesn't provide a general solution for spanning rows)?
Note that above, the groupings (e.g. grp_a
) are their own column, and span rows.
If you were thinking of other cases, seeing some examples might help me get a handle on the important ones to support.
Proposal
Sometimes I have spreadsheets with this kind of table:
When I need to use in quarto, it's difficult to do it, because I do not have a tool to convert in example a xlsx file to this kind of pandoc
grid_tables
. There is no way to keep the merged cells.Using python, a tool to do it could be
great-tables
: I could read the xlsx table using pandas and then usegreat-tables
to render properly an HTML table with rowspan.Thank you to @cderv for the idea.