Closed sid-kap closed 3 years ago
I was wondering the same thing. I've found two ways to display pandas DataFrames as formatted tables in a sparkmagic notebook, although I don't like either of them and wish there were a better way. In a sparkmagic notebook, the classic IPython.display.display() function does not render pandas DataFrames as formatted HTML tables.
You can store Spark DataFrames as temporary views, then query the views using the %%sql magic command. The magic automatically displays the resulting DataFrame as a formatted table. The problem I found with this method is that I can't figure out how to set display options for the table, so columns are truncated.
For example:
%%spark
input_df = spark.read.format('csv').load(filename)
input_df.createOrReplaceTempView('input_view')
%%sql
select * from input_view
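One possible way around the truncation, assuming the query result has first been pulled into the local kernel as a pandas DataFrame (sparkmagic documents an `-o` option on `%%sql` for this), is to widen pandas' own display limits before rendering. This is only a sketch: `result_pdf` is a hypothetical stand-in for the transferred result, and I haven't confirmed how it behaves on every sparkmagic setup.

```python
import pandas as pd

# Hypothetical stand-in for a query result already available in the local kernel
result_pdf = pd.DataFrame({
    "id": [1, 2],
    "description": ["x" * 120, "y" * 120],  # long values that would normally be cut off
})

# Temporarily lift pandas' column-width and column-count limits, then render to HTML
with pd.option_context("display.max_colwidth", None, "display.max_columns", None):
    html_table = result_pdf.to_html(index=False)
```

In a `%%local` cell, wrapping `html_table` in `IPython.display.HTML` should render it as a table. Using `option_context` keeps the wider limits scoped to this one render instead of changing them for the whole session.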
If you use a %%local cell, Jupyter will display pandas DataFrames as formatted tables. The problem here is that the Spark code that pulls the data and creates the pandas DataFrame must also run in %%local, which defeats the purpose of using Spark.
If the pandas DataFrame is collected on the EMR driver, how can I make it available in the local kernel for pretty-printing?
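A possible approach, going by the sparkmagic documentation: the `-o` option on `%%spark` copies a Spark DataFrame to the local kernel as a pandas DataFrame of the same name. A pandas DataFrame on the driver would first need converting back with `spark.createDataFrame`. The sketch below is untested on EMR. The cell magics appear as comments so the block stays plain Python, and `driver_pdf`, `result_sdf` and the stand-in data are hypothetical names introduced for illustration.

```python
# Cell 1 (runs on the cluster; shown as comments because cell magics are not plain Python):
#   %%spark -o result_sdf
#   result_sdf = spark.createDataFrame(driver_pdf)  # driver_pdf: hypothetical pandas DataFrame on the driver
#
# Cell 2 would begin with %%local, where result_sdf is expected to arrive as a pandas DataFrame.
import pandas as pd

# Stand-in for the transferred frame so this sketch also runs outside a sparkmagic session
result_sdf = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Leaving a DataFrame as the last expression of a notebook cell triggers the rich HTML rendering
preview = result_sdf.head(20)
preview
```

As far as I know, `-o` transfers only a limited number of rows by default (there is an `-n` option to adjust this), so it suits previews better than full extracts.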
Is your feature request related to a problem? Please describe.
In JupyterHub using Sparkmagic, pandas tables show up as plain text rather than as HTML tables. Is there a way to make tables show up like they would in a regular Jupyter kernel?
Describe the solution you'd like
Make tables show up with better formatting.
Describe alternatives you've considered
Maybe local mode might fix this?
Additional context