jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

horizontal scrolling of Pandas dataframes #721

Closed philmassie closed 3 years ago

philmassie commented 3 years ago

Is your feature request related to a problem? Please describe.

I very often need to look at dataframes that are wider than my monitor. Spark dataframes wrap row by row, but Pandas dataframes render well and come with a horizontal scrollbar. As a result, until now on a normal edge node, I would do something like this:

pdf = sdf.limit(5).toPandas()
pdf

When I run this in %%spark, pdf wraps. Locally, for some reason, pdf renders as 'normal', with horizontal scrollbars.

Describe the solution you'd like

I would like to be able to quickly inspect the contents of tables wider than my monitor, without the awkward step of transferring them to local first. This is possibly already solved, but I can't find anything.

Describe alternatives you've considered

The only way I can find is to collect the Spark dataframe locally in Pandas with a dedicated %%spark magic. Perhaps there is a way to make the notebook's width much wider. This might not be a SparkMagic thing at all.

Additional context

I hope these illustrate what I'm on about. 1 2 3

Thank you very much in advance

devstein commented 3 years ago

Hi @philmassie, thanks for making a detailed issue! This is a common issue with Spark and is not exclusive to Sparkmagic. There are two existing solutions I know of:

1) Use the vertical=True flag for show. This makes it much easier to inspect a few rows of a wide table.

spark.createDataFrame(pdf).show(vertical=True)
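If the data is already in a Spark dataframe such as sdf from the original post, the same flag can be applied to it directly, without going through Pandas. The sketch below is only an illustration: peek_wide is a hypothetical helper name, and it assumes a Spark version whose show() accepts the n, truncate and vertical arguments.

```python
def peek_wide(sdf, n=5):
    """Print the first n rows of a Spark DataFrame, one field per line.

    `peek_wide` is a hypothetical helper, not part of Spark or Sparkmagic.
    """
    # limit(n) keeps the amount of data pulled back small.
    # truncate=False prints full cell values instead of cutting them short.
    # vertical=True prints each row as a block of "column | value" lines,
    # so the output width no longer depends on the number of columns.
    sdf.limit(n).show(n=n, truncate=False, vertical=True)
```

In a %%spark cell this would be called as peek_wide(sdf) or peek_wide(sdf, n=2).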

2) Use the new Sparkmagic %%pretty magic; however, you need to be in a Spark or PySpark kernel for it to work

JianweiChen commented 3 years ago

%matplotlib inline
from IPython.core.display import HTML
display(HTML(""))

then you can scroll horizontally, even in %%spark
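The string passed to HTML in the comment above is empty as it appears here, so the markup that was actually used is not known. A common way to get this effect, offered here as an assumption and not as a reconstruction of that comment, is to inject a style rule that stops the notebook from wrapping preformatted output. The function names and the default pre selector below are illustrative only.

```python
def no_wrap_css(selector="pre"):
    """Return a <style> element that disables line wrapping for `selector`.

    The selector and the rule are assumptions, not taken from the thread.
    """
    return "<style>%s { white-space: pre !important; }</style>" % selector


def enable_horizontal_scroll(selector="pre"):
    """Inject the rule into the current notebook page (needs IPython)."""
    # Imported lazily so no_wrap_css() can be used without IPython installed.
    from IPython.display import HTML, display
    display(HTML(no_wrap_css(selector)))
```

Calling enable_horizontal_scroll() once in a notebook cell applies the rule to the whole page. Whether a scrollbar then appears depends on the front end's own styles for the output area.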