ofajardo opened 4 years ago
Even leaving aside the styling, there are two things I find interesting with this issue:

1. (bug) `results='asis'` preserves the quotes of any Python output. This makes it unnecessarily complicated to create HTML or Markdown directly from Python. These are the "commas" @ofajardo is seeing.
2. (feature request (involving knitr?)) For a lot of `pandas.DataFrame` output, any of the following would often be better than the raw printing:
   - `{python, results='asis'} df = ...; df.to_html()` (assuming 1. is corrected)
   - `{python, results='asis'} df = ...; df.to_markdown()` (assuming 1. is corrected)
   - `{python} df = ...` + `{r} py$df` (cleanest result when there is no multi-index)

I could see this getting much cleaner and customizable through an option somewhere, e.g. `pandas.df.output` being something like `"repr"` (default), `"html"`, `"markdown"` or `"r"`.
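To make the proposed option concrete, here is a minimal Python sketch of what a `pandas.df.output` setting could dispatch to. The option name comes from the comment above and does not exist in reticulate; `render_dataframe` and `FakeFrame` are hypothetical helpers, and the stand-in class only exists so the sketch runs without pandas installed. The `"r"` mode is left out because that path converts the frame on the R side (`py$df`).

```python
from typing import Any

# Mode names mirror the hypothetical `pandas.df.output` option proposed
# above; this is NOT an existing reticulate or knitr option.
VALID_MODES = ("repr", "html", "markdown")


def render_dataframe(df: Any, mode: str = "repr") -> str:
    """Return the text a chunk would emit for `df` under `mode`.

    `df` is duck-typed: anything with `to_html()` / `to_markdown()`
    methods (as pandas.DataFrame has) works here.
    """
    if mode == "repr":
        return repr(df)
    if mode == "html":
        return df.to_html()
    if mode == "markdown":
        return df.to_markdown()
    raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")


class FakeFrame:
    """Hypothetical stand-in for a DataFrame so the sketch runs without pandas."""

    def __repr__(self) -> str:
        return "      size\ncat    1.0"

    def to_html(self) -> str:
        return "<table><tr><th>cat</th><td>1.0</td></tr></table>"

    def to_markdown(self) -> str:
        return "|     |   size |\n|:----|-------:|\n| cat |      1 |"


if __name__ == "__main__":
    for mode in VALID_MODES:
        print(f"--- {mode} ---")
        print(render_dataframe(FakeFrame(), mode))
```

With real pandas, `to_markdown()` additionally needs the optional `tabulate` package installed.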
Did something change to align with this request? I can't get my pandas DataFrames to just print output anymore in my Markdown files. It always converts it to an HTML table unless I wrap a `print()` around it.
It would be great if pandas data frames were shown nicely in R Markdown (R notebooks) the same way they appear in Jupyter notebooks (or better, with an indicator of the data type of each column). The only reason I don't use RStudio for Python is that I am not able to see the full data frames: they are not scrollable left and right. This simple feature is very important for data exploration.
Would it be possible to change the class of the pandas DataFrame returned from Python and have some adapted methods for printing?
When we do
```{python, echo = FALSE}
import pandas as pd

df = pd.DataFrame(
    {'size': [1., 1.5, 1],
     'weight': [3, 5, 2.5]},
    index=['cat', 'dog', 'koala']
)
```
We end up with an object of class data.frame
```{r}
class(py$df)
# [1] "data.frame"
```
With an additional class, let's say `dataframe.pandas`, it would probably be easier to add some printing methods (e.g. `print.dataframe.pandas.default`, `print.dataframe.pandas.html`, `print.dataframe.pandas.markdown`) that would mimic, at the R level (which would give R Markdown users more control over the output), the behavior of `df.to_html` or `df.to_markdown`.
If I understand correctly, this is an MRE:
````markdown
---
title: "Pandas Printing"
author: "Kevin Ushey"
date: "`r Sys.Date()`"
output: html_document
---

```{r}
library(reticulate)
use_virtualenv("r-reticulate", required = TRUE)
py_install("pandas")
```

```{python}
import pandas as pd
data = {
    'size': [1., 1.5, 1],
    'weight': [3, 5, 2.5]
}
pd.DataFrame(data, index = ['cat', 'dog', 'koala'])
```
````
When this document is rendered via `rmarkdown::render()`, you see:
<img width="577" alt="Screen Shot 2022-12-07 at 9 46 07 AM" src="https://user-images.githubusercontent.com/1976582/206067356-a7bc028e-d482-401e-9188-554a1ef5d128.png">
and so you don't get the nice HTML rendering for the Pandas DataFrame you might've hoped for.
This is where Pandas DataFrames get handled by the reticulate Python engine:
Note that we don't do anything here; we just use the captured (default) print style for the DataFrame. We considered using the `to_html()` method in the past, but the rendered HTML is pretty bare-bones and ugly.
I'm not exactly sure what Jupyter is doing here when rendering DataFrames; presumably they're using their own tooling for rendering to HTML? Or maybe they're using `to_markdown()` and letting the Markdown renderer produce a nice table?
Thanks @kevinushey for your detailed answer. In my case, moving to Quarto solved the problem since, behind the scenes, this means moving to the Jupyter engine. I guess Quarto now solves most of the cases expressed in this issue. The issue only remains for people mixing R and Python in Quarto or R Markdown.
If it helps: in the past, Jupyter used this CSS to style the table. However, I have not been able to locate this styling in the current Jupyter version.
I don't know exactly how Jupyter does it, but their output is equivalent to `display(Markdown(df.to_markdown()))` (or whatever the IPython classes are). So I think that if reticulate could know that it's running inside knitr and output Markdown in that case, then the style would match that of Jupyter. That would mean, in turn, that Quarto gets DataFrame printing behavior that is consistent across engines (which is the cause of our upstream issue).
As this came up again on the Quarto side, I looked into this a bit. Here are some thoughts and insights.

> results='asis' preserves the quotes of any Python output. This is making unnecessarily complicated to create HTML or Markdown directly from Python.

Pandas's `to_html()` or `to_markdown()` method creates a string representation of the DataFrame, but an extra step is required to print it correctly for `asis` output in knitr. See the `cat()` example, which does it at the R printing step.
Using IPython and its display helpers like `HTML()` and `Markdown()` formats the output correctly. I believe this is what happens in Jupyter.
> We considered using the to_html() method in the past, but the rendered HTML is pretty bare-bones and ugly.
Regarding this, I believe it is a matter of CSS. We do specific processing to add Bootstrap styling to Pandoc's tables, but it seems this does not catch tables output by Pandas, so it would need a tweak.
Adding classes with `to_html()` is also an option, especially when we know we are in a Bootstrap document. See example below.
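pandas' `DataFrame.to_html()` already accepts a `classes` argument (e.g. `df.to_html(classes=["table", "table-striped"])`) that adds CSS classes to the generated `<table>` tag. When the HTML comes from elsewhere, the same effect can be had by post-processing the string. The sketch below assumes Bootstrap's `table` and `table-striped` classes are the ones wanted; `add_table_classes` is a hypothetical helper, and its regex only patches the first `<table>` opening tag:

```python
import re

# Bootstrap table classes to append; which ones to use is a styling choice.
BOOTSTRAP_CLASSES = ("table", "table-striped")


def add_table_classes(html: str, classes=BOOTSTRAP_CLASSES) -> str:
    """Append CSS classes to the first <table ...> opening tag in `html`."""
    extra = " ".join(classes)

    def patch(match):
        tag = match.group(0)
        if 'class="' in tag:
            # Keep existing classes (pandas emits class="dataframe").
            return re.sub(
                r'class="([^"]*)"',
                lambda m: f'class="{m.group(1)} {extra}"',
                tag,
                count=1,
            )
        # No class attribute yet: insert one before the closing '>'.
        return tag[:-1] + f' class="{extra}">'

    return re.sub(r"<table\b[^>]*>", patch, html, count=1)


print(add_table_classes('<table border="1" class="dataframe"></table>'))
```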
Quarto is also clever here, because it parses HTML tables by default and applies the same processing as for Markdown tables. So the style is applied (as for any Pandoc table).
> Or maybe they're using to_markdown() and letting the Markdown rendered produce a nice table?

> So I think that if reticulate could know that it's running inside knitr and output markdown in that case, then the style would match that of Jupyter.
In that case there is no styling problem, because in both R Markdown and Quarto these will be Pandoc tables, and Bootstrap-based styles are applied to them.
Quarto and R Markdown will style them differently, but in the end this is a matter of which printing method is used at the knitr step. Currently it is the default printing, but it could be improved. AFAIU, Jupyter (or nbclient, or something else in the stack) registers several representations like `text/html`, `text/markdown` or `text/latex` and chooses the one to use depending on the output format. At least Quarto leverages that from Jupyter output.
reticulate could do something similar, either sending this information to knitr or making the choice itself based on the output of `knitr::pandoc_to()`. This is easier with Quarto: outputting Markdown tables is the simplest option because Quarto will do its own processing and styling.
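A minimal sketch of the "register several representations, pick one per output format" idea. The `_repr_html_`, `_repr_markdown_` and `_repr_latex_` method names are IPython's rich-display hooks (pandas DataFrames provide `_repr_html_`). The preference table keyed on `knitr::pandoc_to()`-style values and the `mime_bundle`/`choose` helpers are assumptions for illustration, not how Jupyter, nbclient, or Quarto actually decide:

```python
from typing import Dict, Optional

# IPython-style rich-display hooks: method name -> MIME type.
REPR_METHODS = {
    "_repr_html_": "text/html",
    "_repr_markdown_": "text/markdown",
    "_repr_latex_": "text/latex",
}

# Assumed preference order per pandoc output format (values like those
# knitr::pandoc_to() reports). Illustrative only.
PREFERENCES = {
    "html": ["text/html", "text/markdown", "text/plain"],
    "latex": ["text/latex", "text/markdown", "text/plain"],
}
DEFAULT_PREFERENCE = ["text/markdown", "text/plain"]


def mime_bundle(obj: object) -> Dict[str, str]:
    """Collect every representation the object offers, plus text/plain."""
    bundle = {"text/plain": repr(obj)}
    for method, mime in REPR_METHODS.items():
        fn = getattr(obj, method, None)
        if callable(fn):
            value = fn()
            if value is not None:  # None means "representation not available"
                bundle[mime] = value
    return bundle


def choose(bundle: Dict[str, str], pandoc_to: Optional[str]) -> str:
    """Return the MIME type to emit for the given output format."""
    for mime in PREFERENCES.get(pandoc_to, DEFAULT_PREFERENCE):
        if mime in bundle:
            return mime
    return "text/plain"
```

Under this scheme an HTML document gets the object's HTML, while a format with no matching representation falls back to plain text.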
How to explicitly style a Pandas table using `HTML(df.to_html())` could also be documented, as this would be the way to do it explicitly with knitr (with `results: asis`).
> this would probably be easier to add some printing methods (e.g. print.dataframe.pandas.default, print.dataframe.pandas.html, print.dataframe.pandas.markdown)
Pursuing this idea is also a good option for R Markdown.
@kevinushey @t-kalinowski I hope this helps. Happy to help make this better. We would love to have Jupyter and knitr output for Python be equivalent in Quarto! (part of https://github.com/quarto-dev/quarto-cli/issues/3457)
Here are some tests I did with the rendering and different options with R Markdown https://rpubs.com/cderv/reticulate-rmarkdown-pandas-table-outputs
And same document in Quarto https://rpubs.com/cderv/reticulate-quarto-pandas-table-outputs
Note that I now understand that reticulate catches the Pandas DataFrame before any `_repr_html_` or `to_html` can be used:
https://github.com/rstudio/reticulate/blob/a1d7f7f573f652212bc2c72c39317340e6d8b511/R/knitr-engine.R#L576-L580
Regarding https://github.com/quarto-dev/quarto-cli/issues/3457, if the `_repr_html_` method were called, we would get the same output as in Jupyter, with the raw HTML table produced, and Quarto would handle both the same way.
I confirm that removing `else if (inherits(value, "pandas.core.frame.DataFrame"))` does give us the same output in Quarto as with Jupyter. As discussed before, though, R Markdown would require some additional CSS or processing to add the Bootstrap class to the table, as R Markdown already does for Pandoc's tables (and as Quarto also does).
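To illustrate why that branch matters, here is a Python analog of the check order. reticulate's real logic is R code in `eng_python_autoprint()` (linked above); this is not that code, and `FakeDataFrame` and `autoprint` are hypothetical. When the DataFrame-specific branch comes first, the object's `_repr_html_` is never reached; dropping the branch lets the generic rich-display path run:

```python
class FakeDataFrame:
    """Hypothetical stand-in for pandas.DataFrame offering an HTML repr."""

    def __repr__(self):
        return "      size\ncat    1.0"

    def _repr_html_(self):
        return "<table><tr><th>cat</th><td>1.0</td></tr></table>"


def autoprint(value, special_case_dataframes=True):
    """Return (format, text), evaluating checks in the order written."""
    if special_case_dataframes and isinstance(value, FakeDataFrame):
        # Early branch: plain-text repr wins, so _repr_html_ is never consulted.
        return ("text", repr(value))
    html = getattr(value, "_repr_html_", None)
    if callable(html):
        # Generic rich-display path, as in Jupyter.
        return ("html", html())
    return ("text", repr(value))


print(autoprint(FakeDataFrame()))                                 # text repr
print(autoprint(FakeDataFrame(), special_case_dataframes=False))  # HTML table
```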
I'm leaning towards changing reticulate to produce the Markdown representation when running through knitr. With this change, the table would be displayed like this in R Markdown
and
It would still not look exactly the same as in Quarto + the Jupyter engine, which is displayed like this:
The pro of this approach is that it only requires changing reticulate, with no need for special handling in R Markdown, which I think can be tricky to coordinate. Do you think this is a reasonable approach, @cderv?
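For reference, the Markdown representation in question is a pipe table. pandas' `DataFrame.to_markdown()` produces one but requires the optional `tabulate` package; the hand-rolled sketch below only shows the target format, using the cat/dog/koala data from the MRE above. `to_pipe_table` is a hypothetical helper; it left-aligns the index column, right-aligns the data columns, and does not escape `|` inside cell values:

```python
from typing import Sequence


def to_pipe_table(index: Sequence[str], columns: Sequence[str],
                  rows: Sequence[Sequence[object]]) -> str:
    """Render an index + columns + rows as a Pandoc/GFM pipe table."""
    # Empty first header cell sits above the index column.
    header = "| " + " | ".join([""] + list(columns)) + " |"
    # Left-align the index column, right-align every data column.
    align = "|:---|" + "|".join("---:" for _ in columns) + "|"
    body = [
        "| " + " | ".join([str(label)] + [str(v) for v in row]) + " |"
        for label, row in zip(index, rows)
    ]
    return "\n".join([header, align] + body)


print(to_pipe_table(
    ["cat", "dog", "koala"],
    ["size", "weight"],
    [[1.0, 3.0], [1.5, 5.0], [1.0, 2.5]],
))
# |  | size | weight |
# |:---|---:|---:|
# | cat | 1.0 | 3.0 |
# | dog | 1.5 | 5.0 |
# | koala | 1.0 | 2.5 |
```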
> The pro of this approach is that it only requires changing reticulate and no need for special handling from RMarkdown which I think can be tricky to coordinate.
About this, I don't think anything is needed in rmarkdown or knitr in general for what reticulate is doing. knitr is a toolbox for custom engines to use, and everything that reticulate does in a knitting context is defined inside reticulate.
knitr only calls `eng_python()` when a python chunk is seen and reticulate is available.
So this printing issue only depends on how reticulate decides to print content, possibly in `eng_python_autoprint()`. This function decides when to output an HTML or Markdown representation for tables (and also makes other choices for other types of output).
Usually, any issue reported as a knitr issue but relevant to the reticulate Python engine needs to be fixed in reticulate itself. However, I may be missing something...
> I'm leaning towards changing reticulate to produce the Markdown representation when running trough knit
I guess it would be fine to output only a Markdown table. Quarto parses Markdown tables through Pandoc and does a lot with them, but Quarto also parses HTML tables, so that would be fine too (https://quarto.org/docs/authoring/tables.html).
I believe that for the Jupyter engine, Quarto selects the HTML output, as I explained above (https://github.com/rstudio/reticulate/issues/783#issuecomment-1642358265), so the output could end up being the same.
But regarding styling, this is only a matter of CSS. We can definitely fix that in Quarto to get the same styling.
Hope it helps.
Happy to discuss, help and test as needed.
I have come across this issue lately and think it would be really helpful if reticulate supported rich display of pandas data frames. In addition to what has already been mentioned above and the helpful documents shared by @cderv, I wanted to note that `_repr_html_()` respects pandas options such as the maximum number of rows to display, and it also includes the following styling info that is not in `to_html()`:
```html
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
```
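A sketch of how a `to_html()` table could be given the same scoped CSS, roughly approximating the `<div>`-wrapped HTML that `_repr_html_()` returns. The CSS string is copied from the comment above; `with_scoped_style` is a hypothetical helper, and unlike `_repr_html_()` it does not apply truncation options such as `display.max_rows`:

```python
# Scoped CSS copied from the comment above.
JUPYTER_DATAFRAME_CSS = """<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>"""


def with_scoped_style(table_html: str, css: str = JUPYTER_DATAFRAME_CSS) -> str:
    """Wrap a to_html()-style table in a <div>, preceded by the CSS block."""
    return f"<div>\n{css}\n{table_html}\n</div>"


print(with_scoped_style('<table border="1" class="dataframe"></table>'))
```

In an R Markdown chunk with `results='asis'`, the returned string would still need to be printed rather than auto-echoed so that it is not wrapped in quotes.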
I would like to be able to change the display style of a pandas data frame. This code works in Jupyter, and it would be awesome to get it to work in R Markdown. Currently it displays an incomplete version of the HTML string instead of the nicely formatted HTML table. R Markdown file attached.
dframe.Rmd.zip
Displaying a pandas data frame nicely
OK, we have a complicated pandas data frame and we want to show it nicely. Passing it to R and using kable or something like that is not an option, because when a pandas data frame with a multi-index is passed to R, those indexes disappear. Let's start by displaying the data frame:
OK, not bad (what are those commas before and after the table, btw?), but it looks boring. Let's try to beautify it with some CSS. OOPS, the resulting HTML is not rendered. Why?