gregsexton / ob-ipython

org-babel integration with Jupyter for evaluation of (Python by default) code blocks

Pandoc table rendering problem #119

Closed jamieforth closed 7 years ago

jamieforth commented 7 years ago

The new pandoc-based rendering doesn't seem to handle cases where pandas DataFrames contain a named index. I guess this is an issue with pandoc rather than ob-ipython itself, but I just wondered if anyone else has encountered this problem before I dig a little deeper.

#+begin_src ipython :session :results raw
  import pandas as pd

  df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})
  df
#+end_src

#+results:
|   | a | b |
|---+---+---|
| 0 | 0 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 5 |

#+results:

a b id 0 0 3 1 1 4 2 2 5


- DataFrame with named index rendered the old tabulate way:

#+begin_src ipython :session :results output raw
  import pandas as pd
  from tabulate import tabulate

  df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})
  df.index.name = 'id'
  print(tabulate(df, headers="keys", tablefmt='orgtbl'))
#+end_src

#+results:
| id | a | b |
|----+---+---|
|  0 | 0 | 3 |
|  1 | 1 | 4 |
|  2 | 2 | 5 |


Also the new pandoc rendering doesn't handle pd.Series objects.

- Series rendered by pandoc

#+begin_src ipython :session :results raw
  import pandas as pd

  data = pd.Series(['a', 'b', 'c'])
  data
#+end_src

#+results:

0 a 1 b 2 c


- Series rendered by tabulate

#+begin_src ipython :session :results output raw
  import pandas as pd
  from tabulate import tabulate

  data = pd.Series(['a', 'b', 'c'])
  print(tabulate(data, tablefmt='orgtbl'))
#+end_src

#+results:
| 0 | a |
| 1 | b |
| 2 | c |

gregsexton commented 7 years ago

Since making this change I've run into this a few times, and I have confirmed that it's pandoc not handling the HTML well. Thanks for figuring out that it has to do with a named index. At least now I can reliably reproduce it. I'm not sure what to do here yet, but I'll keep thinking about it and will try to fix it - it's bothering me. :)

gregsexton commented 7 years ago

Pandoc wasn't cutting it for me. For now I've disabled HTML rendering, as it can't be relied on. I added support for rendering anything that generates org text. See the tips and tricks section of the readme. I use this to automatically run arrays and data frames through tabulate to get org tables. Seems to be working well so far.
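
For anyone wondering what "generating org text" amounts to for the named-index case in this report, here is a rough, dependency-free sketch. It is not ob-ipython's or tabulate's actual code: `org_table` is a hypothetical helper, and it assumes every row has as many cells as the header. The index name simply goes into the first header cell, which is the part the pandoc route lost.

```python
def org_table(headers, rows):
    """Render a header row and data rows as org table text (hypothetical helper)."""
    # Stringify everything up front so widths can be measured.
    cells = [[str(c) for c in headers]] + [[str(c) for c in row] for row in rows]
    # Each column is as wide as its widest cell.
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]

    def fmt(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    # Org separates header and body with a rule such as |----+---+---|.
    rule = "|" + "+".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(cells[0]), rule] + [fmt(r) for r in cells[1:]])


# The frame from the report: index named 'id', columns 'a' and 'b'.
table = org_table(["id", "a", "b"], [[0, 0, 3], [1, 1, 4], [2, 2, 5]])
print(table)
```

Cells are left-aligned here for simplicity; org realigns the table on the next TAB in the results block anyway. In practice tabulate with `tablefmt='orgtbl'`, as in the examples earlier in the thread, does this job.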