jupyter / nbconvert

Jupyter Notebook Conversion
https://nbconvert.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

nbconvert doesn't respect pandas style #1395

Closed ghuname closed 4 years ago

ghuname commented 4 years ago

I have a simple dataframe that I would like to color dynamically by using df.style.

I generated a dataframe with:

import matplotlib
import pandas as pd
import string
%matplotlib inline

letters = string.ascii_uppercase #+ string.ascii_lowercase 
letters = letters[:10]

df = pd.DataFrame({'a': range(1,len(letters)+1), 'b':list(letters)})
df

image

Now I would like to use colors.

import seaborn as sns
cm = sns.light_palette("green", as_cmap=True)

df.style.background_gradient(cmap=cm).hide_index().highlight_max(color='blue').highlight_min()

image

If I render the last line with df.style.background_gradient(cmap=cm).hide_index().highlight_max(color='blue').highlight_min().render(), I get:

'<style type="text/css" >\n#T_e7a9a29a_f734_11ea_92c3_0242ac110002row0_col0{\n background-color: #e5ffe5;\n color: #000000;\n background-color: yellow;\n }#T_e7a9a29a_f734_11ea_92c3_0242ac110002row1_col0{\n background-color: #ccf1cc;\n color: #000000;\n }#T_e7a9a29a_f734_11ea_92c3_0242ac110002row2_col0{\n background-color: #b3e3b3;\n color: #000000;\n }#T_e7a9a29a_f734_11ea_92c3_0242ac110002row3_col0{\n background-color: #99d599;\n color: #000000;\n }#T_e7a9a29a_f734_11ea_92c3_0242ac110002row4_col0{\n background-color: #80c780;\n color: #000000;\n }#T_e7a9a29a_f734_11ea_92c3_0242ac110002row5_col0{\n background-color: #66b866;\n color: #000000;\n }#T_e7a9a29a_f734_11ea_92c3_0242ac110002row6_col0{\n background-color: #4daa4d;\n color: #000000;\n }#T_e7a9a29a_f734_11ea_92c3_0242ac110002row7_col0{\n background-color: #329c32;\n color: #000000;\n }#T_e7a9a29a_f734_11ea_92c3_0242ac110002row8_col0{\n background-color: #198e19;\n color: #000000;\n }#T_e7a9a29a_f734_11ea_92c3_0242ac110002row9_col0{\n background-color: #008000;\n color: #f1f1f1;\n background-color: blue;\n }</style><table id="T_e7a9a29a_f734_11ea_92c3_0242ac110002" ><thead> <tr> <th class="col_heading level0 col0" >a</th> <th class="col_heading level0 col1" >b</th> </tr></thead><tbody>\n <tr>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row0_col0" class="data row0 col0" >1</td>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row0_col1" class="data row0 col1" >A</td>\n </tr>\n <tr>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row1_col0" class="data row1 col0" >2</td>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row1_col1" class="data row1 col1" >B</td>\n </tr>\n <tr>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row2_col0" class="data row2 col0" >3</td>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row2_col1" class="data row2 col1" >C</td>\n </tr>\n <tr>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row3_col0" class="data row3 col0" >4</td>\n <td 
id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row3_col1" class="data row3 col1" >D</td>\n </tr>\n <tr>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row4_col0" class="data row4 col0" >5</td>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row4_col1" class="data row4 col1" >E</td>\n </tr>\n <tr>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row5_col0" class="data row5 col0" >6</td>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row5_col1" class="data row5 col1" >F</td>\n </tr>\n <tr>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row6_col0" class="data row6 col0" >7</td>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row6_col1" class="data row6 col1" >G</td>\n </tr>\n <tr>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row7_col0" class="data row7 col0" >8</td>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row7_col1" class="data row7 col1" >H</td>\n </tr>\n <tr>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row8_col0" class="data row8 col0" >9</td>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row8_col1" class="data row8 col1" >I</td>\n </tr>\n <tr>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row9_col0" class="data row9 col0" >10</td>\n <td id="T_e7a9a29a_f734_11ea_92c3_0242ac110002row9_col1" class="data row9 col1" >J</td>\n </tr>\n </tbody></table>'

Is this a bug or a feature?

Here is the notebook:

{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "<link rel=\"stylesheet\" href=\"https://cdn.jupyter.org/notebook/5.1.0/style/style.min.css\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib\n",
    "import pandas as pd\n",
    "import string\n",
    "%matplotlib inline\n",
    "# pd.set_option(\"display.notebook_repr_html\", False)\n",
    "pd.set_option(\"display.latex.repr\", True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "letters = string.ascii_uppercase #+ string.ascii_lowercase \n",
    "letters = letters[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>D</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>E</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>G</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>H</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>I</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>J</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/latex": [
       "\\begin{tabular}{lrl}\n",
       "\\toprule\n",
       "{} &   a &  b \\\\\n",
       "\\midrule\n",
       "0 &   1 &  A \\\\\n",
       "1 &   2 &  B \\\\\n",
       "2 &   3 &  C \\\\\n",
       "3 &   4 &  D \\\\\n",
       "4 &   5 &  E \\\\\n",
       "5 &   6 &  F \\\\\n",
       "6 &   7 &  G \\\\\n",
       "7 &   8 &  H \\\\\n",
       "8 &   9 &  I \\\\\n",
       "9 &  10 &  J \\\\\n",
       "\\bottomrule\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "    a  b\n",
       "0   1  A\n",
       "1   2  B\n",
       "2   3  C\n",
       "3   4  D\n",
       "4   5  E\n",
       "5   6  F\n",
       "6   7  G\n",
       "7   8  H\n",
       "8   9  I\n",
       "9  10  J"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame({'a': range(1,len(letters)+1), 'b':list(letters)})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Colored table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style  type=\"text/css\" >\n",
       "#T_1fc4aa8c_f738_11ea_b110_0242ac110002row0_col0{\n",
       "            background-color:  #e5ffe5;\n",
       "            color:  #000000;\n",
       "            background-color:  yellow;\n",
       "        }#T_1fc4aa8c_f738_11ea_b110_0242ac110002row1_col0{\n",
       "            background-color:  #ccf1cc;\n",
       "            color:  #000000;\n",
       "        }#T_1fc4aa8c_f738_11ea_b110_0242ac110002row2_col0{\n",
       "            background-color:  #b3e3b3;\n",
       "            color:  #000000;\n",
       "        }#T_1fc4aa8c_f738_11ea_b110_0242ac110002row3_col0{\n",
       "            background-color:  #99d599;\n",
       "            color:  #000000;\n",
       "        }#T_1fc4aa8c_f738_11ea_b110_0242ac110002row4_col0{\n",
       "            background-color:  #80c780;\n",
       "            color:  #000000;\n",
       "        }#T_1fc4aa8c_f738_11ea_b110_0242ac110002row5_col0{\n",
       "            background-color:  #66b866;\n",
       "            color:  #000000;\n",
       "        }#T_1fc4aa8c_f738_11ea_b110_0242ac110002row6_col0{\n",
       "            background-color:  #4daa4d;\n",
       "            color:  #000000;\n",
       "        }#T_1fc4aa8c_f738_11ea_b110_0242ac110002row7_col0{\n",
       "            background-color:  #329c32;\n",
       "            color:  #000000;\n",
       "        }#T_1fc4aa8c_f738_11ea_b110_0242ac110002row8_col0{\n",
       "            background-color:  #198e19;\n",
       "            color:  #000000;\n",
       "        }#T_1fc4aa8c_f738_11ea_b110_0242ac110002row9_col0{\n",
       "            background-color:  #008000;\n",
       "            color:  #f1f1f1;\n",
       "            background-color:  blue;\n",
       "        }</style><table id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002\" ><thead>    <tr>        <th class=\"col_heading level0 col0\" >a</th>        <th class=\"col_heading level0 col1\" >b</th>    </tr></thead><tbody>\n",
       "                <tr>\n",
       "                                <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row0_col0\" class=\"data row0 col0\" >1</td>\n",
       "                        <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row0_col1\" class=\"data row0 col1\" >A</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                                <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row1_col0\" class=\"data row1 col0\" >2</td>\n",
       "                        <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row1_col1\" class=\"data row1 col1\" >B</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                                <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row2_col0\" class=\"data row2 col0\" >3</td>\n",
       "                        <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row2_col1\" class=\"data row2 col1\" >C</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                                <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row3_col0\" class=\"data row3 col0\" >4</td>\n",
       "                        <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row3_col1\" class=\"data row3 col1\" >D</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                                <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row4_col0\" class=\"data row4 col0\" >5</td>\n",
       "                        <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row4_col1\" class=\"data row4 col1\" >E</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                                <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row5_col0\" class=\"data row5 col0\" >6</td>\n",
       "                        <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row5_col1\" class=\"data row5 col1\" >F</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                                <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row6_col0\" class=\"data row6 col0\" >7</td>\n",
       "                        <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row6_col1\" class=\"data row6 col1\" >G</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                                <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row7_col0\" class=\"data row7 col0\" >8</td>\n",
       "                        <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row7_col1\" class=\"data row7 col1\" >H</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                                <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row8_col0\" class=\"data row8 col0\" >9</td>\n",
       "                        <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row8_col1\" class=\"data row8 col1\" >I</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                                <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row9_col0\" class=\"data row9 col0\" >10</td>\n",
       "                        <td id=\"T_1fc4aa8c_f738_11ea_b110_0242ac110002row9_col1\" class=\"data row9 col1\" >J</td>\n",
       "            </tr>\n",
       "    </tbody></table>"
      ],
      "text/plain": [
       "<pandas.io.formats.style.Styler at 0x7fd331a0a700>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import seaborn as sns\n",
    "cm = sns.light_palette(\"green\", as_cmap=True)\n",
    "\n",
    "df.style.background_gradient(cmap=cm).hide_index().highlight_max(color='blue').highlight_min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}

nbconvert version: 5.6.1

MSeal commented 4 years ago

So two things -- what output format are you trying to convert to (what's the command you are using with nbconvert) and have you run it with nbconvert 6.0?

What you are seeing is the html output of the dataframe as plain text instead of rendered text. If you're outputting to html this should render nicely.

ghuname commented 4 years ago

@MSeal, I am trying to export to PDF with the --to pdf option. Furthermore, if I understand correctly, there is no support at all for converting pandas Styler objects to LaTeX/PDF. For example, df.style.set_properties(**{'background-color': 'yellow'}) produces a Styler object that knows how to represent itself as HTML but not as LaTeX: there is no _repr_latex_ method on the Styler object.

At the moment, I am trying to subclass the pandas Styler (from pandas.io.formats.style import Styler) to add a _repr_latex_ method that iterates through the dataframe and generates the LaTeX code itself. I would then use it like MyStyler(df, section_caption='section', table_caption='table', table_header_bold=True...).
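
The idea of a _repr_latex_ method that walks the table and emits LaTeX can be sketched without pandas. The class below is a hypothetical stand-in, not the pandas Styler and not the MyStyler mentioned above: its name, constructor arguments and color mapping are assumptions made for illustration. It also assumes the LaTeX preamble loads \usepackage[table]{xcolor}, which \cellcolor needs, and it does not escape special LaTeX characters in cell values.

```python
class ColoredLatexTable:
    """Hypothetical stand-in for a Styler-like object (not a pandas class)."""

    def __init__(self, columns, rows, cell_colors=None):
        self.columns = list(columns)
        self.rows = [list(row) for row in rows]
        # Maps (row_index, column_index) -> six-digit hex color without '#'.
        self.cell_colors = dict(cell_colors or {})

    def _format_cell(self, i, j, value):
        """Return the cell text, prefixed with \\cellcolor if a color is set."""
        text = str(value)  # no LaTeX escaping in this sketch
        color = self.cell_colors.get((i, j))
        if color is None:
            return text
        # \cellcolor requires \usepackage[table]{xcolor} in the preamble.
        return "\\cellcolor[HTML]{%s}%s" % (color, text)

    def _repr_latex_(self):
        """Build a tabular environment row by row; IPython looks up this method name."""
        lines = ["\\begin{tabular}{%s}" % ("r" * len(self.columns))]
        lines.append(" & ".join(self.columns) + " \\\\")
        lines.append("\\hline")
        for i, row in enumerate(self.rows):
            cells = [self._format_cell(i, j, v) for j, v in enumerate(row)]
            lines.append(" & ".join(cells) + " \\\\")
        lines.append("\\end{tabular}")
        return "\n".join(lines)


# Highlight the minimum of column "a" in yellow and the maximum in blue.
table = ColoredLatexTable(
    columns=["a", "b"],
    rows=[(1, "A"), (2, "B"), (3, "C")],
    cell_colors={(0, 0): "FFFF00", (2, 0): "0000FF"},
)
latex = table._repr_latex_()
print(latex)
```

A real subclass would read the values and computed styles from the Styler instead of taking them as constructor arguments; how to get at those internals depends on the pandas version and is not shown here.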

I don't see any other option. If someone has more experience than me, please share your ideas.

MSeal commented 4 years ago

Yes, LaTeX conversion doesn't respect HTML attributes in this path, afaik. However, you could try the new webpdf conversion path, which generates the PDF from a headless Chromium process. That would preserve the HTML styling in the export.
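
To make the behaviour concrete, here is a simplified model of how an exporter could pick one representation from an output's MIME bundle. It is a sketch, not nbconvert's code: the function name and both priority lists are assumptions made for illustration, and the real defaults may differ by version. The two bundles mirror the notebook attached above, where the plain DataFrame output carries text/html, text/latex and text/plain, while the Styler output carries only text/html and text/plain.

```python
# Assumed priority lists for illustration; real exporter defaults may differ.
HTML_PRIORITY = ["text/html", "image/png", "text/latex", "text/plain"]
LATEX_PRIORITY = ["text/latex", "image/png", "text/plain"]


def pick_representation(bundle, priority):
    """Return (mime_type, data) for the first type in `priority` found in `bundle`."""
    for mime in priority:
        if mime in bundle:
            return mime, bundle[mime]
    return None, None


# Abbreviated copy of the plain DataFrame output from the notebook.
plain_df_output = {
    "text/html": "<table class=\"dataframe\">...</table>",
    "text/latex": "\\begin{tabular}{lrl}...\\end{tabular}",
    "text/plain": "    a  b\n0   1  A",
}

# Abbreviated copy of the Styler output: there is no text/latex entry.
styler_output = {
    "text/html": "<style type=\"text/css\">...</style><table>...</table>",
    "text/plain": "<pandas.io.formats.style.Styler at 0x7fd331a0a700>",
}

for name, bundle in [("DataFrame", plain_df_output), ("Styler", styler_output)]:
    html_mime, _ = pick_representation(bundle, HTML_PRIORITY)
    latex_mime, _ = pick_representation(bundle, LATEX_PRIORITY)
    print(f"{name}: HTML export uses {html_mime}, LaTeX export uses {latex_mime}")
```

Under these assumptions the Styler output falls back to text/plain on the LaTeX path, so the styling is lost there, while an HTML-based path such as webpdf keeps it.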

SylvainCorlay commented 4 years ago

I confirm that this works well with --to webpdf:

Screenshot from 2020-09-23 15-46-35

SylvainCorlay commented 4 years ago

Closing as answered!

ghuname commented 4 years ago

What do I have to install on CentOS 7 to be able to use --to webpdf?

MSeal commented 4 years ago

Have you tried the webpdf installation instructions yet? I don't expect anything more is needed for CentOS, but I haven't explicitly tested that OS for that capability.
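
For reference, a minimal sketch of driving the webpdf exporter from Python through the command line. It assumes nbconvert 6.0 or later installed with the webpdf extra (pip install "nbconvert[webpdf]") and that the jupyter executable is on the PATH. The helper names and the notebook file name in the usage comment are hypothetical; whether CentOS 7 needs extra system libraries for Chromium is not covered here.

```python
import subprocess


def build_webpdf_command(notebook_path, allow_chromium_download=False):
    """Build the argument list for `jupyter nbconvert --to webpdf`."""
    cmd = ["jupyter", "nbconvert", "--to", "webpdf"]
    if allow_chromium_download:
        # Lets nbconvert fetch a Chromium build if none is available locally.
        cmd.append("--allow-chromium-download")
    cmd.append(notebook_path)
    return cmd


def convert_to_webpdf(notebook_path, allow_chromium_download=False):
    """Run the conversion; raises CalledProcessError if nbconvert fails."""
    cmd = build_webpdf_command(notebook_path, allow_chromium_download)
    subprocess.run(cmd, check=True)


# Example usage (hypothetical file name), left commented out on purpose:
# convert_to_webpdf("styled_table.ipynb", allow_chromium_download=True)
```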

ghuname commented 4 years ago

I tried printing to PDF from Chrome and the result wasn't good, so I assume that this is not the solution.

SylvainCorlay commented 4 years ago

@ghuname just use the webpdf exporter of nbconvert, and you should get the kind of result that I included as a screenshot above.

montahaee commented 6 months ago

Actually your demo highlights another issue with the webpdf conversion. The code cell in your demo is truncated on the right side. This seems to be a significant problem as it affects the readability of the code. Could we reopen the issue to address this truncation problem?