saikoneru1997 / Azure_DataFactory


Get Filename into new column scenariobased-1 #17

Closed: saikoneru1997 closed this issue 3 weeks ago

saikoneru1997 commented 3 weeks ago
  1. Coding question (PySpark): You've been given CSV files such as karnataka.csv and maharashtra.csv in an ADLS location, each containing the columns first_name, last_name, age, sex, and location. Add a new column called state to each DataFrame, holding the state name taken from the source filename. For example, rows from karnataka.csv should get state = 'karnataka', and rows from maharashtra.csv should get state = 'maharashtra'. The solution should use PySpark so that it scales to large data volumes.
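The rule in the question boils down to "state = the file's base name with the .csv extension removed". A minimal pure-Python sketch of that rule, assuming ADLS Gen2 abfss:// paths; the container, account, and folder names below are made up for illustration, and state_from_path is a hypothetical helper, not part of any library:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def state_from_path(path: str) -> str:
    """Return the file's base name without its extension (hypothetical helper)."""
    # urlparse separates the scheme and container@account part, leaving only the path
    return PurePosixPath(urlparse(path).path).stem


# Hypothetical ADLS locations; substitute the real container/account/folder
for p in [
    "abfss://raw@myaccount.dfs.core.windows.net/states/karnataka.csv",
    "abfss://raw@myaccount.dfs.core.windows.net/states/maharashtra.csv",
]:
    print(state_from_path(p))
```

If each file is loaded into its own DataFrame, the value can be attached as a constant with df.withColumn('state', lit(state_from_path(path))), where lit comes from pyspark.sql.functions. Reading all files in one pass and deriving the value per row from the source path avoids a Python loop over files, which is usually the better fit when there are many states.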

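from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name, regexp_extract

spark = SparkSession.builder.getOrCreate()  # already defined in Databricks notebooks
# Read with a header row; point the path at the folder (or glob) holding karnataka.csv, maharashtra.csv, etc.
df = spark.read.option("header", "true").csv('/FileStore/tables/Order_iFxK77vh3a.csv')
# Take the file's base name without ".csv" from its full path; the dot is escaped so it only matches a literal "."
state_name = regexp_extract(input_file_name(), r'([^/]+)\.csv$', 1)
df2 = df.withColumn('state', state_name)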
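Spark's regexp_extract returns an empty string when the pattern does not match, so a wrong pattern fails silently. A quick way to check the pattern on sample paths before running it on the cluster is to mirror that behaviour with Python's re module (Java and Python regex agree on this simple pattern). The paths are made up, and extract is a hypothetical helper. The last case shows why the dot before csv should be escaped: an unescaped "." matches any character, so a file named backup_csv would also be picked up.

```python
import re

# Escaped dot: only a literal ".csv" suffix matches
STRICT = re.compile(r"([^/]+)\.csv$")
# Unescaped dot: "." matches any character, so "backup_csv" also matches
LOOSE = re.compile(r"([^/]+).csv$")


def extract(pattern: re.Pattern, path: str) -> str:
    # Mirrors regexp_extract(col, pattern, 1): group 1, or "" when nothing matches
    m = pattern.search(path)
    return m.group(1) if m else ""


# Hypothetical ADLS folder
base = "abfss://raw@myaccount.dfs.core.windows.net/states/"
for name in ["karnataka.csv", "maharashtra.csv", "backup_csv"]:
    print(name, repr(extract(STRICT, base + name)), repr(extract(LOOSE, base + name)))
```

Because [^/]+ cannot cross a "/", the captured group is always confined to the last path segment, regardless of how deep the ADLS folder structure is.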
