spyder-ide / spyder

Official repository for Spyder - The Scientific Python Development Environment
https://www.spyder-ide.org
MIT License

Pyspark dataframe support #2867

Open · JackyP opened this issue 8 years ago

JackyP commented 8 years ago

The Variable Explorer and DataFrame editor are quite handy for Pandas DataFrames: just being able to see how the data is structured helps guide the coding immensely.

With Apache Spark becoming increasingly popular for big data, could they be adapted to view PySpark DataFrames as well, and perhaps even an SQLContext (listing all of its tables)?

ccordoba12 commented 8 years ago

I haven't used PySpark DataFrames, but after a quick look this doesn't seem like an easy task, unless we could convert them to Pandas DataFrames.

Do you know if there's a way to do that?

JackyP commented 8 years ago

toPandas() converts it to a Pandas DataFrame, although the result might be quite big, since all of its rows are collected into the driver's memory. https://spark.apache.org/docs/1.5.2/api/python/pyspark.sql.html
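A minimal sketch of keeping that conversion small, assuming `sdf` is a `pyspark.sql.DataFrame`: calling `limit()` before `toPandas()` caps how many rows reach the driver. The helper name `to_pandas_head` and the default of 1000 rows are hypothetical choices for illustration, not part of Spyder or PySpark.

```python
def to_pandas_head(sdf, n=1000):
    """Return up to ``n`` rows of a PySpark DataFrame as a pandas DataFrame.

    Hypothetical helper. ``sdf`` is assumed to be a pyspark.sql.DataFrame;
    only its ``limit`` and ``toPandas`` methods are used. limit() is applied
    first, so at most ``n`` rows are collected into the driver. Without an
    explicit orderBy(), Spark does not guarantee *which* rows you get.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return sdf.limit(n).toPandas()
```

Usage would look like `preview = to_pandas_head(sdf, n=500)`, after which `preview` shows up in the Variable Explorer like any other pandas DataFrame.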

ccordoba12 commented 8 years ago

Thanks @JackyP. We'll investigate this possibility in the future.

ccordoba12 commented 8 years ago

A recommendation: if you have enough RAM, you can already inspect PySpark DataFrames by converting them with toPandas(). We can handle Pandas DataFrames of any size :-)
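Because "enough RAM" depends on the machine, one possible guard is to count the rows first and fall back to a random sample when the DataFrame is too large. This is a sketch under assumptions: `sdf` is a `pyspark.sql.DataFrame`, and the helper name `to_pandas_guarded` and its `max_rows` threshold are hypothetical values you would tune yourself.

```python
def to_pandas_guarded(sdf, max_rows=1_000_000, seed=0):
    """Return a pandas copy of ``sdf``, sampling it if it has too many rows.

    Hypothetical helper. ``sdf`` is assumed to be a pyspark.sql.DataFrame;
    ``max_rows`` is an assumed threshold to tune to the available RAM.
    """
    n = sdf.count()  # runs a Spark job to count the rows
    if n <= max_rows:
        # Small enough: collect every row into the driver as pandas.
        return sdf.toPandas()
    # Too big: keep roughly max_rows rows. sample() is approximate,
    # so the returned row count will vary around max_rows.
    fraction = max_rows / n
    return sdf.sample(False, fraction, seed).toPandas()
```

Counting costs an extra Spark job, but it avoids pulling a DataFrame that cannot fit in memory into the driver.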