Coding Questions: PySpark
You have been given CSV files such as karnataka.csv and maharashtra.csv in an ADLS (Azure Data Lake Storage) location. Each file has the columns first_name, last_name, age, sex, and location.
Your task is to add a new column called state, containing the state name extracted from the name of the file each row was read from.
For example:
For karnataka.csv, the state column should contain the value 'karnataka'.
For maharashtra.csv, the state column should contain the value 'maharashtra'.
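To check the filename-to-state logic without a cluster, here is a minimal plain-Python sketch of the mapping. It uses the regular expression `([^/]+)\.csv$`, which keeps the last path segment and drops the `.csv` extension. The helper `state_from_path` and the `abfss://data@myaccount...` paths are hypothetical examples, not part of the original question.

```python
import re

# Capture the last path segment (no "/" allowed) that ends in ".csv".
# The dot is escaped so it matches a literal "." only.
STATE_PATTERN = re.compile(r"([^/]+)\.csv$")


def state_from_path(path: str) -> str:
    """Return the file name without ".csv", or "" when the path does not match.

    Returning "" on no match mirrors Spark's regexp_extract behaviour.
    """
    match = STATE_PATTERN.search(path)
    return match.group(1) if match else ""


# Hypothetical ADLS paths; container "data" and account "myaccount" are made up.
paths = [
    "abfss://data@myaccount.dfs.core.windows.net/states/karnataka.csv",
    "abfss://data@myaccount.dfs.core.windows.net/states/maharashtra.csv",
]
for p in paths:
    print(p, "->", state_from_path(p))
```

Escaping the dot matters: with an unescaped `.`, a file named `backupcsv` would also match and yield `backu`.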
Use PySpark so that the solution scales to large volumes of data.
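Solution:
from pyspark.sql.functions import input_file_name, regexp_extract

# Read all state files in one pass (`spark` is the notebook's SparkSession).
# The path is a placeholder for your ADLS folder.
df = spark.read.option("header", "true").csv("abfss://<container>@<storage-account>.dfs.core.windows.net/<folder>/*.csv")

# input_file_name() gives the full path of each row's source file;
# the regex keeps the last segment without its ".csv" extension.
state_name = regexp_extract(input_file_name(), r'([^/]+)\.csv$', 1)

df2 = df.withColumn('state', state_name)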