saikoneru1997 / Azure_DataFactory


Explode_Function #19

Closed saikoneru1997 closed 3 weeks ago

saikoneru1997 commented 3 weeks ago

Scenario: turn an array column into one row per array element with explode()

Sample data with an array column

data = [
    ("John", ["Reading", "Traveling", "Music"]),
    ("Jane", ["Cooking", "Movies"]),
    ("Robert", ["Sports", "Photography", "Hiking"]),
]

Define the column names (Spark infers the column types from the data)

columns = ["name", "hobbies"]

Create a DataFrame (assumes an active SparkSession named spark, as provided in a notebook)

df = spark.createDataFrame(data, columns)

Show the DataFrame before exploding

df.show(truncate=False)

+--------+-------------------------------+
| name   | hobbies                       |
+--------+-------------------------------+
| John   | [Reading, Traveling, Music]   |
| Jane   | [Cooking, Movies]             |
| Robert | [Sports, Photography, Hiking] |
+--------+-------------------------------+

Use explode() to produce one output row per element of the hobbies array

from pyspark.sql.functions import explode

df_exploded = df.withColumn("hobby", explode(df.hobbies))
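To make the row-multiplying behaviour of explode() concrete, here is a minimal plain-Python model of what the call does to the sample rows. It is only a sketch of the semantics, not how Spark implements it; the helper name explode_rows is hypothetical. It also models the fact that explode() produces no output row when the array is null or empty.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

InputRow = Tuple[str, Optional[List[str]]]
OutputRow = Tuple[str, Optional[List[str]], str]


def explode_rows(rows: Iterable[InputRow]) -> Iterator[OutputRow]:
    """Hypothetical helper: emit one (name, hobbies, hobby) row per array element."""
    for name, hobbies in rows:
        # A None or empty array yields nothing, so that input row disappears,
        # which mirrors explode() dropping rows with null or empty arrays.
        for hobby in hobbies or []:
            yield (name, hobbies, hobby)


data = [
    ("John", ["Reading", "Traveling", "Music"]),
    ("Jane", ["Cooking", "Movies"]),
    ("Robert", ["Sports", "Photography", "Hiking"]),
]

exploded = list(explode_rows(data))
for row in exploded:
    print(row)
```

The three input rows become eight output rows, one per hobby, with the name and the full array repeated on each.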

Show the exploded DataFrame

df_exploded.show(truncate=False)

+--------+-------------------------------+-------------+
| name   | hobbies                       | hobby       |
+--------+-------------------------------+-------------+
| John   | [Reading, Traveling, Music]   | Reading     |
| John   | [Reading, Traveling, Music]   | Traveling   |
| John   | [Reading, Traveling, Music]   | Music       |
| Jane   | [Cooking, Movies]             | Cooking     |
| Jane   | [Cooking, Movies]             | Movies      |
| Robert | [Sports, Photography, Hiking] | Sports      |
| Robert | [Sports, Photography, Hiking] | Photography |
| Robert | [Sports, Photography, Hiking] | Hiking      |
+--------+-------------------------------+-------------+
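The result above still carries the full hobbies array on every row. If only the name and the individual hobby are needed, one option is to select the exploded column directly instead of adding it with withColumn. The sketch below assumes a DataFrame with the same name and hobbies columns; the function name explode_hobbies is hypothetical and not part of the original snippet.

```python
def explode_hobbies(df):
    """Hypothetical helper: return one (name, hobby) row per array element."""
    # Imported inside the function so the definition loads even where
    # pyspark is not installed; calling the function requires pyspark.
    from pyspark.sql.functions import explode

    # select() keeps only the listed columns, so the original array column
    # is left out; alias() names the generated column "hobby".
    return df.select("name", explode("hobbies").alias("hobby"))
```

Calling explode_hobbies(df).show(truncate=False) on the sample DataFrame should list the same eight name and hobby pairs as above, without the repeated array column.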