jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Prettier HTML tables? #623

Closed sid-kap closed 3 years ago

sid-kap commented 4 years ago

Is your feature request related to a problem? Please describe.
In JupyterHub using Sparkmagic, the Pandas tables show up in plain text rather than as HTML tables. Is there a way to make tables show up like they would in a regular Jupyter kernel?

Describe the solution you'd like
Make tables show up with better formatting.

Describe alternatives you've considered
Maybe local mode might fix this?


xpkcs commented 4 years ago

I was wondering the same thing. I've found two ways to display Pandas dfs as formatted tables in a sparkmagic notebook, although I don't like either of them and wish there were a better way. In a sparkmagic notebook, the usual IPython.display.display() function does not render Pandas dfs as formatted HTML tables.

Using %%sql

You can register Spark dfs as temporary views, then query the views with the %%sql magic. The magic automatically displays the result as a formatted table. The issue I found with this method is that I can't figure out how to set display options for the table, so columns are truncated.

e.g.

%%spark
input_df = spark.read.format('csv').load(filename)
input_df.createOrReplaceTempView('input_view')

%%sql
select * from input_view

Using %%local

If you use a %%local cell, Jupyter displays Pandas dfs as formatted tables. The issue here is that the code that pulls the data and creates the Pandas df must itself run in %%local, which defeats the purpose of using Spark.
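For the %%local route, here is a rough sketch of rendering a Pandas df as HTML without truncated columns. It assumes the data is already a Pandas df in the local context (sparkmagic's README describes an -o option on %%sql and %%spark for copying a result to the local side, which is not tested here). The sample df and the names local_df and html_table are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for a result that is already
# available as a Pandas df in the local (%%local) context.
local_df = pd.DataFrame({
    "id": [1, 2],
    "description": ["short", "x" * 80],  # longer than the default width limit
})

# Temporarily lift the per-cell width limit so long strings are not
# cut off with "..." when the table is rendered.
with pd.option_context("display.max_colwidth", None):
    html_table = local_df.to_html(index=False)

# In a %%local notebook cell, the table could then be shown with:
#   from IPython.display import HTML, display
#   display(HTML(html_table))
```

Using option_context keeps the wider setting scoped to the one render instead of changing it for the whole notebook session.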

pancodia commented 2 years ago

If the pandas dataframe is collected on the EMR driver, how can I make it available locally for pretty-printing?