A nested JSON file can be read like any other JSON file with spark.read.json.

Load JSON file into a DataFrame

file_path = "/path/to/your/file.json"
df = spark.read.json(file_path)

Once the file is loaded, use the '.' operator to pull the required columns out of the nested JSON.

Sample JSON:

{ "id": "123", "name": "Sample", "details": { "age": 30, "address": { "street": "123 Main St", "city": "Sample City" } } }

Now we can select the nested columns:

from pyspark.sql.functions import col

Accessing nested fields using dot notation

df_selected = df.select(
    "id",
    "name",
    col("details.age"),
    col("details.address.street"),
    col("details.address.city"),
)
df_selected.show()