Closed — saikoneru1997 closed this issue 3 weeks ago
Scenario -

from pyspark.sql.functions import explode

data = [
    ("John", ["Reading", "Traveling", "Music"]),
    ("Jane", ["Cooking", "Movies"]),
    ("Robert", ["Sports", "Photography", "Hiking"]),
]
columns = ["name", "hobbies"]
df = spark.createDataFrame(data, columns)
df.show(truncate=False)

df_exploded = df.withColumn("hobby", explode(df.hobbies))
df_exploded.show(truncate=False)
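For reference, the expansion this scenario asks for — one output row per array element, with the other columns repeated — can be sketched in plain Python, without a Spark session (`data` matches the sample above):

```python
# Plain-Python sketch of what explode() produces: for each (name, hobbies)
# row, emit one (name, hobby) row per element of the hobbies list.
data = [
    ("John", ["Reading", "Traveling", "Music"]),
    ("Jane", ["Cooking", "Movies"]),
    ("Robert", ["Sports", "Photography", "Hiking"]),
]

rows = [(name, hobby) for name, hobbies in data for hobby in hobbies]

for name, hobby in rows:
    print(name, hobby)
```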
# Sample data with an array column
data = [
    ("John", ["Reading", "Traveling", "Music"]),
    ("Jane", ["Cooking", "Movies"]),
    ("Robert", ["Sports", "Photography", "Hiking"]),
]

# Define the column names
columns = ["name", "hobbies"]

# Create a DataFrame
df = spark.createDataFrame(data, columns)

# Show the DataFrame before exploding
df.show(truncate=False)
+-------+-------------------------------+
| name | hobbies |
+-------+-------------------------------+
| John | [Reading, Traveling, Music] |
| Jane | [Cooking, Movies] |
| Robert| [Sports, Photography, Hiking] |
+-------+-------------------------------+
# Use explode() to flatten the array into one row per element
from pyspark.sql.functions import explode

df_exploded = df.withColumn("hobby", explode(df.hobbies))

# Show the exploded DataFrame
df_exploded.show(truncate=False)
+-------+-------------------------------+-----------+
| name | hobbies | hobby |
+-------+-------------------------------+-----------+
| John | [Reading, Traveling, Music] | Reading |
| John | [Reading, Traveling, Music] | Traveling |
| John | [Reading, Traveling, Music] | Music |
| Jane | [Cooking, Movies] | Cooking |
| Jane | [Cooking, Movies] | Movies |
| Robert| [Sports, Photography, Hiking] | Sports |
| Robert| [Sports, Photography, Hiking] | Photography|
| Robert| [Sports, Photography, Hiking] | Hiking |
+-------+-------------------------------+-----------+
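One caveat: `explode()` silently drops any row whose array is empty or null, while `explode_outer()` (also in `pyspark.sql.functions`) keeps such rows with a null in the new column. A plain-Python sketch of the difference in row semantics (the helper names here are illustrative, not Spark APIs):

```python
# Illustrative helpers mimicking explode vs. explode_outer row semantics.
def explode_like(rows):
    # Rows whose array is empty or None produce no output rows.
    return [(name, h) for name, arr in rows for h in (arr or [])]

def explode_outer_like(rows):
    # Rows whose array is empty or None are kept, with a None element.
    out = []
    for name, arr in rows:
        if arr:
            out.extend((name, h) for h in arr)
        else:
            out.append((name, None))
    return out

sample = [("John", ["Reading"]), ("Jane", []), ("Robert", None)]
print(explode_like(sample))        # Jane and Robert disappear
print(explode_outer_like(sample))  # Jane and Robert kept with None
```

If rows with empty hobby lists must survive the flattening, swap `explode` for `explode_outer` in the `withColumn` call above.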