millipz / nc-de-deliverance-project

Project Repo for Deliverance Team 2024
3 stars 2 forks source link

Create spec document for transform Lambda and associated infrastructure #13

Open azmolmiah opened 6 months ago

azmolmiah commented 6 months ago

Requirements: Input data format: CSV Transformation logic: column mapping, data type inference, encoding issues Output data format: Parquet Transform into star schema format Error handling: implement mechanisms to handle errors such as data validation failures Logging and monitoring: capture relevant information for auditing and troubleshooting purposes TXT file upload: define the process for uploading a TXT file to trigger the LOAD lambda functioni Lambda trigger configuration: Specify the settings required to trigger the detection of a new TXT file

Technical: Conversion tool: Pandas library or PyArrow Compression: choose an appropriate compression algorithm to optimise storage

Infrastructure setup: AWS S3 to store converted parquet files AWS Lambda AWS IAM roles AWS Cloudwatch for logging and monitoring

Testing: Unit testing: pytest Integration testing: mock testing