Normally you can read a JSON file as usual:

```python
# Load the JSON file into a DataFrame
file_path = "/path/to/your/file.json"
df = spark.read.json(file_path)
```

Once the file is read, you can use the `.` (dot) operator to get the required columns out of the nested JSON.

Sample JSON:

```json
{
  "id": "123",
  "name": "Sample",
  "details": {
    "age": 30,
    "address": {
      "street": "123 Main St",
      "city": "Sample City"
    }
  }
}
```

Now you can select the columns:

```python
from pyspark.sql.functions import col

# Access nested fields using dot notation
df_selected = df.select(
    "id",
    "name",
    col("details.age"),
    col("details.address.street"),
    col("details.address.city"),
)
df_selected.show()
```
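If you want to try this end to end without a file on disk, here is a minimal, self-contained sketch. It builds its own SparkSession, feeds the same sample JSON in as an in-memory string (using PySpark's support for reading JSON from an RDD of strings), and adds `alias()` so the flattened columns get clear names. The app name and the alias names are just illustrative choices, not part of the original snippet.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Assumes a local SparkSession; in a notebook or on a cluster, `spark` usually already exists
spark = SparkSession.builder.appName("nested-json-example").getOrCreate()

# The sample JSON from above, supplied as an in-memory dataset instead of a file
sample = [
    '{"id": "123", "name": "Sample", '
    '"details": {"age": 30, "address": {"street": "123 Main St", "city": "Sample City"}}}'
]
df = spark.read.json(spark.sparkContext.parallelize(sample))

# Dot notation reaches into nested structs; alias() names the flattened columns
df_flat = df.select(
    "id",
    "name",
    col("details.age").alias("age"),
    col("details.address.street").alias("street"),
    col("details.address.city").alias("city"),
)
df_flat.show()
# Expected output:
# +---+------+---+-----------+-----------+
# | id|  name|age|     street|       city|
# +---+------+---+-----------+-----------+
# |123|Sample| 30|123 Main St|Sample City|
# +---+------+---+-----------+-----------+
```

The aliases are optional, but they keep the resulting column names short and unambiguous once the nested structure is flattened.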