quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

DataFrame not render in markdown in Julia #6134

Open sethaxen opened 1 year ago

sethaxen commented 1 year ago

On Quarto v1.3.433, I tried rendering a .qmd file with a cell block that outputs a DataFrames.DataFrame object in Julia. When rendering an HTML or PDF file, the DataFrame is rendered as a table, but when rendering to markdown, it is not rendered at all. There's a minimal repo demonstrating this here: https://github.com/sethaxen/quarto_dataframes_jl_demo

cderv commented 1 year ago

I can reproduce this.

Somehow, the first table is output as a LaTeX table in the intermediate .md file passed to Pandoc:

The following shows a table when rendered to PDF or HTML but nothing when rendered to markdown

::: {.cell execution_count=2}
``` {.julia .cell-code}
using DataFrames
df = DataFrame(:x => randn(10), :y => randn(10))
```

::: {.cell-output .cell-output-display execution_count=3}
\begin{tabular}{r|cc}
	& x & y\\
	\hline
	& Float64 & Float64\\
	\hline
	1 & 0.0387945 & 0.728907 \\
	2 & 0.0640609 & -0.0620356 \\
	3 & 0.0834239 & 0.209232 \\
	4 & 1.46305 & -0.305209 \\
	5 & 0.883393 & 0.61293 \\
	6 & -1.27005 & 0.167557 \\
	7 & 0.469263 & 0.873955 \\
	8 & 0.398598 & -0.243128 \\
	9 & -0.210688 & 0.131962 \\
	10 & 1.57329 & -1.30073 \\
\end{tabular}

:::
:::



This is why it gets ignored when converting to Markdown output with Pandoc... This is surprising 🤔

cderv commented 1 year ago

So this is the information we get from the Jupyter rendering:

{
 "cells": [
  {
   "cell_type": "raw",
   "id": "a6f5d7b4",
   "metadata": {},
   "source": [
    "---\n",
    "title: \"DataFrames rendering test\"\n",
    "keep-md: true\n",
    "keep-ipynb: true\n",
    "format: \n",
    "  md:\n",
    "    output-file: test-md.md\n",
    "  html: default\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f96792b9",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m\u001b[1m  Activating\u001b[22m\u001b[39m project at `C:\\Users\\chris\\Documents\\DEV_OTHER\\DEMOS\\test-quarto`\n"
     ]
    }
   ],
   "source": [
    "#| output: false\n",
    "#| echo: false\n",
    "using Pkg\n",
    "Pkg.activate(\".\")\n",
    "Pkg.instantiate()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f64c9a77",
   "metadata": {},
   "source": [
    "The following shows a table when rendered to PDF or HTML but nothing when rendered to markdown\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f3ee4485",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><div style = \"float: left;\"><span>10×2 DataFrame</span></div><div style = \"clear: both;\"></div></div><div class = \"data-frame\" style = \"overflow-x: scroll;\"><table class = \"data-frame\" style = \"margin-bottom: 6px;\"><thead><tr class = \"header\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">Row</th><th style = \"text-align: left;\">x</th><th style = \"text-align: left;\">y</th></tr><tr class = \"subheader headerLastRow\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\"></th><th title = \"Float64\" style = \"text-align: left;\">Float64</th><th title = \"Float64\" style = \"text-align: left;\">Float64</th></tr></thead><tbody><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">1</td><td style = \"text-align: right;\">0.662702</td><td style = \"text-align: right;\">0.451484</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">2</td><td style = \"text-align: right;\">-1.54828</td><td style = \"text-align: right;\">-1.25003</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">3</td><td style = \"text-align: right;\">-0.0443339</td><td style = \"text-align: right;\">-1.39289</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">4</td><td style = \"text-align: right;\">-1.55107</td><td style = \"text-align: right;\">-1.9189</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">5</td><td style = \"text-align: right;\">0.0755979</td><td style = \"text-align: right;\">-0.58877</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">6</td><td style = \"text-align: right;\">0.748819</td><td style = \"text-align: right;\">-1.87185</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">7</td><td style = \"text-align: right;\">-0.281013</td><td style = \"text-align: right;\">1.54457</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">8</td><td style = \"text-align: right;\">-0.680722</td><td style = \"text-align: right;\">0.664824</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">9</td><td style = \"text-align: right;\">-1.20119</td><td style = \"text-align: right;\">2.64156</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">10</td><td style = \"text-align: right;\">-0.161478</td><td style = \"text-align: right;\">1.11157</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 0.662702 & 0.451484 \\\\\n",
       "\t2 & -1.54828 & -1.25003 \\\\\n",
       "\t3 & -0.0443339 & -1.39289 \\\\\n",
       "\t4 & -1.55107 & -1.9189 \\\\\n",
       "\t5 & 0.0755979 & -0.58877 \\\\\n",
       "\t6 & 0.748819 & -1.87185 \\\\\n",
       "\t7 & -0.281013 & 1.54457 \\\\\n",
       "\t8 & -0.680722 & 0.664824 \\\\\n",
       "\t9 & -1.20119 & 2.64156 \\\\\n",
       "\t10 & -0.161478 & 1.11157 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m10×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x          \u001b[0m\u001b[1m y         \u001b[0m\n",
       "     │\u001b[90m Float64    \u001b[0m\u001b[90m Float64   \u001b[0m\n",
       "─────┼───────────────────────\n",
       "   1 │  0.662702    0.451484\n",
       "   2 │ -1.54828    -1.25003\n",
       "   3 │ -0.0443339  -1.39289\n",
       "   4 │ -1.55107    -1.9189\n",
       "   5 │  0.0755979  -0.58877\n",
       "   6 │  0.748819   -1.87185\n",
       "   7 │ -0.281013    1.54457\n",
       "   8 │ -0.680722    0.664824\n",
       "   9 │ -1.20119     2.64156\n",
       "  10 │ -0.161478    1.11157"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "using DataFrames\n",
    "df = DataFrame(:x => randn(10), :y => randn(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f6bb391",
   "metadata": {},
   "source": [
    "But there is a custom `show` method for HTML outputs:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6805350a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"<div><div style = \\\"float: left;\\\"><span>10×2 DataFrame</span></div><div style = \\\"clear: both;\\\"></div></div><div class = \\\"data-frame\\\" style = \\\"overflow-x: scroll;\\\"><table class = \\\"data-frame\\\" style = \\\"margin-bottom: 6px;\\\"><thead><tr class = \\\"header\\\"><th class = \\\"rowNumber\" ⋯ 1943 bytes ⋯ \"ht;\\\">-1.20119</td><td style = \\\"text-align: right;\\\">2.64156</td></tr><tr><td class = \\\"rowNumber\\\" style = \\\"font-weight: bold; text-align: right;\\\">10</td><td style = \\\"text-align: right;\\\">-0.161478</td><td style = \\\"text-align: right;\\\">1.11157</td></tr></tbody></table></div>\""
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sprint(show, \"text/html\", df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62ee20cf",
   "metadata": {},
   "source": [
    "In fact, if we just wrap the `DataFrame` with an object that has a custom HTML `show` method, it renders in markdown just fine:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8c684f8f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><div style = \"float: left;\"><span>10×2 DataFrame</span></div><div style = \"clear: both;\"></div></div><div class = \"data-frame\" style = \"overflow-x: scroll;\"><table class = \"data-frame\" style = \"margin-bottom: 6px;\"><thead><tr class = \"header\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">Row</th><th style = \"text-align: left;\">x</th><th style = \"text-align: left;\">y</th></tr><tr class = \"subheader headerLastRow\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\"></th><th title = \"Float64\" style = \"text-align: left;\">Float64</th><th title = \"Float64\" style = \"text-align: left;\">Float64</th></tr></thead><tbody><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">1</td><td style = \"text-align: right;\">0.662702</td><td style = \"text-align: right;\">0.451484</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">2</td><td style = \"text-align: right;\">-1.54828</td><td style = \"text-align: right;\">-1.25003</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">3</td><td style = \"text-align: right;\">-0.0443339</td><td style = \"text-align: right;\">-1.39289</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">4</td><td style = \"text-align: right;\">-1.55107</td><td style = \"text-align: right;\">-1.9189</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">5</td><td style = \"text-align: right;\">0.0755979</td><td style = \"text-align: right;\">-0.58877</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">6</td><td style = \"text-align: right;\">0.748819</td><td style = \"text-align: right;\">-1.87185</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">7</td><td style = \"text-align: right;\">-0.281013</td><td style = \"text-align: right;\">1.54457</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">8</td><td style = \"text-align: right;\">-0.680722</td><td style = \"text-align: right;\">0.664824</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">9</td><td style = \"text-align: right;\">-1.20119</td><td style = \"text-align: right;\">2.64156</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">10</td><td style = \"text-align: right;\">-0.161478</td><td style = \"text-align: right;\">1.11157</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "DataFrameWrapper(\u001b[1m10×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x          \u001b[0m\u001b[1m y         \u001b[0m\n",
       "     │\u001b[90m Float64    \u001b[0m\u001b[90m Float64   \u001b[0m\n",
       "─────┼───────────────────────\n",
       "   1 │  0.662702    0.451484\n",
       "   2 │ -1.54828    -1.25003\n",
       "   3 │ -0.0443339  -1.39289\n",
       "   4 │ -1.55107    -1.9189\n",
       "   5 │  0.0755979  -0.58877\n",
       "   6 │  0.748819   -1.87185\n",
       "   7 │ -0.281013    1.54457\n",
       "   8 │ -0.680722    0.664824\n",
       "   9 │ -1.20119     2.64156\n",
       "  10 │ -0.161478    1.11157)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "struct DataFrameWrapper\n",
    "    df\n",
    "end\n",
    "\n",
    "Base.show(io::IO, mime::MIME\"text/html\", w::DataFrameWrapper) = show(io, mime, w.df)\n",
    "\n",
    "DataFrameWrapper(df)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.7.3",
   "language": "julia",
   "name": "julia-1.7"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

We can see that the first DataFrame cell offers three representations: text/html, text/plain and text/latex. There is no text/markdown output.
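To check which representations each output carries without reading the whole notebook by eye, something like the following could be used. This is only a sketch: `output_mimetypes` and the trimmed `sample` dict are hypothetical names made up for illustration (the real notebook content is shown above), and for a real `.ipynb` file the dict would come from `json.load`.

```python
def output_mimetypes(notebook: dict) -> list:
    """Return (execution_count, sorted MIME types) for each rich output."""
    found = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            # Stream outputs (stdout/stderr) carry "text", not a "data" bundle.
            if output.get("output_type") in ("execute_result", "display_data"):
                mimes = sorted(output.get("data", {}))
                found.append((output.get("execution_count"), mimes))
    return found


# Hypothetical, heavily trimmed stand-in for the notebook shown above.
sample = {
    "cells": [
        {"cell_type": "code", "outputs": [
            {"output_type": "stream", "name": "stderr", "text": ["Activating\n"]},
        ]},
        {"cell_type": "code", "outputs": [
            {"output_type": "execute_result", "execution_count": 3, "data": {
                "text/html": ["<table>...</table>"],
                "text/latex": ["\\begin{tabular}..."],
                "text/plain": ["10x2 DataFrame"],
            }},
        ]},
        {"cell_type": "code", "outputs": [
            {"output_type": "execute_result", "execution_count": 5, "data": {
                "text/html": ["<table>...</table>"],
                "text/plain": ["DataFrameWrapper(...)"],
            }},
        ]},
    ]
}

for count, mimes in output_mimetypes(sample):
    print(count, mimes)
# 3 ['text/html', 'text/latex', 'text/plain']
# 5 ['text/html', 'text/plain']
```

Neither output advertises text/markdown, which matches what the notebook JSON shows.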

And then we have a specific process that handles raw LaTeX in that case, as it is expected to be math.

Here is what happens

We are getting the cell content from the above in https://github.com/quarto-dev/quarto-cli/blob/7277a8f738cbad8c9fe42364ba5cb81ed72f5754/src/core/jupyter/jupyter.ts#L1166-L1170

In that process we do the following with the outputs https://github.com/quarto-dev/quarto-cli/blob/7277a8f738cbad8c9fe42364ba5cb81ed72f5754/src/core/jupyter/jupyter.ts#L1193-L1199

So we are calling displayDataWithMarkdownMath() on this toLatex output, and in there we select the output.data["text/latex"] explicitly https://github.com/quarto-dev/quarto-cli/blob/7277a8f738cbad8c9fe42364ba5cb81ed72f5754/src/core/jupyter/display-data.ts#L122-L132

For the last code cell you have, there is only text/html and text/plain, so there is no LaTeX table inserted and so you see the right table result.
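The effect of the selection order can be illustrated with a small sketch. This is not Quarto's actual code (that lives in the TypeScript files linked above); `pick_mimetype` and both priority tuples are hypothetical, chosen only to mimic the behavior observed in this issue and one possible alternative for markdown targets.

```python
# Hypothetical priority that mimics the observed behavior: LaTeX wins over HTML.
OBSERVED_LIKE = ("text/markdown", "text/latex", "text/html", "text/plain")

# Hypothetical alternative for markdown targets: LaTeX is only a last resort.
MARKDOWN_TARGET = ("text/markdown", "text/html", "text/plain", "text/latex")


def pick_mimetype(data: dict, priority: tuple):
    """Return the first MIME type in `priority` present in the output bundle."""
    for mime in priority:
        if mime in data:
            return mime
    return None


# Bundles with the same keys as the two outputs in the notebook above.
df_output = {"text/html": "...", "text/latex": "...", "text/plain": "..."}
wrapper_output = {"text/html": "...", "text/plain": "..."}

print(pick_mimetype(df_output, OBSERVED_LIKE))       # text/latex
print(pick_mimetype(wrapper_output, OBSERVED_LIKE))  # text/html
print(pick_mimetype(df_output, MARKDOWN_TARGET))     # text/html
```

With the first ordering the DataFrame output resolves to raw LaTeX, while the wrapper, which has no text/latex entry, falls through to HTML.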

I don't know enough about Jupyter, Julia and nbConvert to understand what happens. It seems though that

df = DataFrame(:x => randn(10), :y => randn(10))

won't output text/markdown to include.

Though I guess we should probably not choose the text/latex part, but either the plain or html.

To note that if we do force HTML then we correctly select the HTML version of the results

format: 
   md:
      prefer-html: true

@cscheid @dragonstyle you know Quarto internals for Jupyter a bit better than me. Do you have more insight into what we should do here, if we need to do anything at all, or should we consider this a Julia DataFrames output problem? It is really surprising that we select the text/latex data for this output.

Hope it helps

sethaxen commented 1 year ago

To note that if we do force HTML then we correctly select the HTML version of the results

format: 
   md:
      prefer-html: true

Is there a way to force this for a single cell instead of the whole document?

cderv commented 1 year ago

Is there a way to force this for a single cell instead of the whole document?

Unfortunately no. It was not meant for this per-cell usage. IMO it is only a side effect that it helps with this issue. It should be solved differently, by correctly handling the Julia outputs for the markdown format.

sethaxen commented 1 year ago

Hi, any updates on what the solution for this might be?

mcanouil commented 1 year ago

As you can see, there is no new information. When there is, the team will communicate.

cscheid commented 1 year ago

Unfortunately, I don't think this is a quarto bug.

We can attempt to heuristically choose a different mimetype in case the execution doesn't include a markdown output, but that won't work well in general. If the output produced by Jupyter doesn't include markdown output, then there's a limit to how well we can handle it.

I know very little about Julia, but in Python programmers can force Markdown output. You might want to look into how Julia supports the equivalent of:

from IPython.display import Markdown
Markdown("**this will be bold**")

cderv commented 1 year ago

Thanks @cscheid for confirming.

Based on your question, I did look at this a bit differently. I found this issue in DataFrames.jl

So basically, having Markdown display is based on other tools. You have

Rough examples that would need to be improved in context of Jupyter Notebook

---
title: "DataFrames rendering test"
---

```{julia}
#| output: false
#| echo: false
using Pkg
Pkg.activate(".")
Pkg.instantiate()
```

## Using show()

```{julia}
using DataFrames
using PrettyTables
df = DataFrame(:x => randn(10), :y => randn(10))
show(df, tf = PrettyTables.tf_markdown)
```

## Using PrettyTables directly

```{julia}
using PrettyTables
pretty_table(df, tf = tf_markdown)
```

## Using MarkdownTables

```{julia}
using MarkdownTables
df |> markdown_table(String) |> print
```

<details>
<summary>Markdown Outputs </summary>

````markdown
# DataFrames rendering test

## Using show()

``` julia
using DataFrames
using PrettyTables
df = DataFrame(:x => randn(10), :y => randn(10))
show(df, tf = PrettyTables.tf_markdown)
```

10×2 DataFrame
 Row | x           y          
     | Float64     Float64    
-----|------------------------
   1 | -0.433596   -0.91904
   2 |  1.91175     0.514662
   3 | -0.17271    -0.14763
   4 |  0.686026   -0.480422
   5 |  1.4416      0.136346
   6 |  0.807431   -0.101564
   7 |  1.72824    -0.0431558
   8 | -0.0503871   0.758866
   9 |  1.05733    -0.356043
  10 | -1.14679     0.926031

## Using PrettyTables directly

``` julia
using PrettyTables
pretty_table(df, tf = tf_markdown)
```

|          x |          y |
|    Float64 |    Float64 |
|------------|------------|
|  -0.433596 |   -0.91904 |
|    1.91175 |   0.514662 |
|   -0.17271 |   -0.14763 |
|   0.686026 |  -0.480422 |
|     1.4416 |   0.136346 |
|   0.807431 |  -0.101564 |
|    1.72824 | -0.0431558 |
| -0.0503871 |   0.758866 |
|    1.05733 |  -0.356043 |
|   -1.14679 |   0.926031 |

## Using MarkdownTables

``` julia
using MarkdownTables
df |> markdown_table(String) |> print
```

| x                    | y                    |
|----------------------|----------------------|
| -0.43359602148995735 |  -0.9190395841538315 |
|    1.911749853834694 |   0.5146618874800643 |
| -0.17271029762845877 |  -0.1476302354292226 |
|   0.6860264124167541 |  -0.4804224361576191 |
|   1.4415970172652266 |  0.13634564934712143 |
|   0.8074305434190046 | -0.10156367162958412 |
|   1.7282435099556877 | -0.04315577960397021 |
| -0.05038712142014474 |   0.7588661206604351 |
|    1.057334138204181 | -0.35604334033646007 |
|    -1.14679140689634 |   0.9260312445228053 |
````

</details>

It would probably be interesting to re-discuss with the DataFrames.jl team so that text/markdown output is added to their default output (in addition to html and latex): https://github.com/JuliaData/DataFrames.jl/blob/main/src/dataframerow/show.jl#L41
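For comparison, this is roughly what "adding a text/markdown representation" looks like on the Python side, where IPython's display machinery looks for a `_repr_markdown_` method on the object. It is a Python analogue only, not Julia and not DataFrames.jl code; the `MarkdownTable` class and its column data are hypothetical.

```python
class MarkdownTable:
    """Hypothetical column-oriented table that can describe itself as Markdown."""

    def __init__(self, columns: dict):
        self.columns = columns

    def _repr_markdown_(self) -> str:
        # IPython calls this method (when present) to fill the text/markdown
        # entry of the output bundle.
        names = list(self.columns)
        header = "| " + " | ".join(names) + " |"
        rule = "|" + "|".join("---" for _ in names) + "|"
        rows = zip(*(self.columns[name] for name in names))
        body = ["| " + " | ".join(str(value) for value in row) + " |" for row in rows]
        return "\n".join([header, rule, *body])


table = MarkdownTable({"x": [0.5, -1.25], "y": [2, 3]})
print(table._repr_markdown_())
# | x | y |
# |---|---|
# | 0.5 | 2 |
# | -1.25 | 3 |
```

In a notebook, returning `table` as the last expression of a cell would then include a text/markdown entry in the output, which a markdown renderer could pick up directly.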

We can attempt to heuristically choose a different mimetype in case the execution doesn't include a markdown output, but that won't work well in general. If the output produced by Jupyter doesn't include markdown output, then there's a limit to how well we can handle it.

@cscheid about this, I understand the general thought and I agree about the markdown formatting required. However, I think something is odd in the way we are auto-selecting the output. After looking at that in detail at https://github.com/quarto-dev/quarto-cli/issues/6134#issuecomment-1625615902 what happens here is

I am not sure I understand why we would keep the text/latex output type here. It feels wrong to me, but maybe I am missing something. So I prefer re-explaining before we close this.

cscheid commented 1 year ago

I think you're right that we need to look at that code more carefully. Let's discuss this directly.

cderv commented 1 year ago

Just for reference, so that the two are linked: similar issue about Julia output in Jupyter