Requirements:
Input data format: CSV
Transformation logic: column mapping, data type inference, and handling of encoding issues
Output data format: Parquet
Transform into star schema format
Error handling: implement mechanisms to handle errors such as data validation failures
Logging and monitoring: capture relevant information for auditing and troubleshooting purposes
TXT file upload: define the process for uploading a TXT file to trigger the LOAD lambda function
Lambda trigger configuration: specify the S3 event notification settings required to detect a new TXT file
Technical:
Conversion tool: Pandas library or PyArrow
Compression: choose an appropriate compression algorithm to optimise storage
Infrastructure setup:
AWS S3 to store converted Parquet files
AWS Lambda
AWS IAM roles
AWS CloudWatch for logging and monitoring
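Tying the infrastructure pieces together, the LOAD Lambda's entry point might look like the sketch below. The bucket/key layout and the `.txt` suffix guard are assumptions; the event shape is the standard S3 `ObjectCreated` notification payload:

```python
import logging

logger = logging.getLogger("load_lambda")
logger.setLevel(logging.INFO)  # CloudWatch picks up these log lines


def lambda_handler(event, context):
    """Entry point invoked by the S3 ObjectCreated notification."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Guard against a mis-configured trigger: only TXT marker files
        # should start a load.
        if not key.lower().endswith(".txt"):
            logger.warning("ignoring non-TXT object %s", key)
            continue
        logger.info("load triggered by s3://%s/%s", bucket, key)
        # Here the real function would fetch the marker with boto3,
        # run the CSV -> Parquet transform, and write results back to S3.
        processed.append(f"s3://{bucket}/{key}")
    return {"processed": processed}
```

In the bucket's event notification settings, a suffix filter of `.txt` keeps the Lambda from firing on the Parquet files it writes itself; the IAM role needs `s3:GetObject`/`s3:PutObject` on the relevant prefixes plus the CloudWatch Logs permissions.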
Testing:
Unit testing: pytest
Integration testing: mock testing of AWS service interactions
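A minimal pytest-style test along these lines shows the mock-testing approach: the boto3 client is replaced with a `unittest.mock.Mock`, so no AWS resources are touched. The `read_marker` helper is hypothetical, standing in for whatever function reads the TXT trigger file:

```python
from unittest import mock


def read_marker(s3_client, bucket: str, key: str) -> str:
    """Hypothetical helper under test: fetch a TXT marker file from S3."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return body.decode("utf-8").strip()


def test_read_marker_strips_trailing_newline():
    # Stand-in for a boto3 S3 client; get_object returns a canned payload.
    fake_s3 = mock.Mock()
    fake_s3.get_object.return_value = {"Body": mock.Mock(read=lambda: b"run-001\n")}
    assert read_marker(fake_s3, "raw-bucket", "trigger.txt") == "run-001"
    fake_s3.get_object.assert_called_once_with(Bucket="raw-bucket", Key="trigger.txt")
```

pytest discovers any `test_*` function automatically, so this file runs with a bare `pytest` command; swapping the Mock for a real client gives an end-to-end integration test against a sandbox bucket.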