tusharchou / local-data-platform

python library for iceberg lake house on your local
MIT License
7 stars 5 forks source link

Excel to csv support #48

Open anushreegupta0043 opened 2 days ago

anushreegupta0043 commented 2 days ago

Hi team

Thanks for fixing the last issue raised #31!

I am a beginner with Python and working with restaurant data. The data is extracted as an excel file from a website and needs to be transformed to csv. The file then needs to be saved at a defined location, here a Google drive folder.

Can you guide me if this possible to be done using local-data-platform?

Thanks!

redpheonixx commented 2 days ago

Hi @anushreegupta0043 , currently above functionality is not a part of local-data-platform but below points might help

  1. you can use pyarrow to read the excel and write it in csv

    import pyarrow.csv as pcsv
    import pyarrow.excel as pxl
    def excelToCsv(excel_file, csv_file):
    workbook = pxl.read_worbook(excel_file)
    sheet=workbook.sheets[0]
    table=sheet.to_table()
    pcsv.write_csv(table, csv_file)
  2. please note this will create a csv file in your local machine which you need to upload to the gdrive. To do this you can use googleapiclient and google oauth for authentication. for credentials you need to go to google cloud console and download credentials.json. Include credentials to your .env file

    
    from google.oauth2.service_account import Credentials
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload 
    from dotenv import load_dotenv

load_dotenv()

def gdriveSetup(credentials_file): SCOPES = ['https://www.googleapis.com/auth/drive.file'] creds = Credentials.from_service_account_file(credentials_file, scopes=SCOPES) return build('drive', 'v3', credentials=creds)

def uploadCsvFile(csv_file, service): file_metadata={'name': csv_file, 'mimeType': 'text/csv'} media = MediaFileUpload(csv_file, mimeType='text/csv') file = service.files().create(body=file_metadata, media_body=media, fields='id').execute() file_id = file.get('id') os.remove(csv_file) return file_id



  Hope this helps

@tusharchou better we take a note of it to include conversion functions as part of transformation layer.