wallabyway / acc-bim360-parquet-reports

This repo provides an AWS Lambda function that converts CSV files from Autodesk Construction Cloud (ACC) or BIM360 to Parquet format and stores them in your S3 bucket. The Parquet files can be used for efficient reporting in AWS QuickSight or other tools. This serverless solution ensures scalability and easy integration.
MIT License

BIM360/ACC Data to AWS QuickSight with Lambda

This repository provides a 'converter' Lambda function. It converts CSV data from Autodesk ACC (or BIM360) into Parquet format and stores it in your own S3 bucket, so you can build reports in AWS QuickSight directly from the Parquet files.

Two other helper functions kick off the ACC Data Connector process, wait for the CSV files to become ready, and then trigger the 'converter' Lambda function.

Overview

Problem

Users of Autodesk Construction Cloud (ACC) or BIM360 who want to leverage the reporting power of AWS QuickSight need their data in a format suitable for QuickSight consumption. Parquet files provide an efficient, compressed format ideal for this purpose. This Lambda function automates the process of converting CSV files to Parquet, ensuring that your data is ready for QuickSight.

Solution

This project provides an AWS Lambda function that:

  1. Accepts a signed CSV file URL and a filename for the destination Parquet file as input.
  2. Converts the CSV file to Parquet format using DuckDB.
  3. Uploads the Parquet file to the S3 bucket specified in the AWS Lambda configuration.
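
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual handler.py: the environment variable name S3_BUCKET, the /tmp file names, and the helper build_copy_sql are assumptions made for the example, and the event is assumed to carry "source_url" and "destination_filename" directly, as in the invocation example in this README.

```python
import os
import urllib.request


def build_copy_sql(csv_path: str, parquet_path: str) -> str:
    """Build the DuckDB statement that rewrites a CSV file as Parquet."""
    # Escape single quotes so the paths are safe inside SQL string literals.
    src = csv_path.replace("'", "''")
    dst = parquet_path.replace("'", "''")
    return (
        f"COPY (SELECT * FROM read_csv_auto('{src}')) "
        f"TO '{dst}' (FORMAT PARQUET)"
    )


def handler(event, context):
    # Third-party imports are deferred so the helper above can be used
    # without the DuckDB layer or boto3 being installed.
    import boto3
    import duckdb

    source_url = event["source_url"]
    destination = event["destination_filename"]
    bucket = os.environ["S3_BUCKET"]  # hypothetical variable name

    # Lambda functions can write to /tmp; these file names are arbitrary.
    csv_path = "/tmp/input.csv"
    parquet_path = "/tmp/output.parquet"

    # Step 1: download the CSV from the signed URL.
    urllib.request.urlretrieve(source_url, csv_path)
    # Step 2: let DuckDB read the CSV and write it back out as Parquet.
    duckdb.connect().execute(build_copy_sql(csv_path, parquet_path))
    # Step 3: upload the Parquet file to the configured bucket.
    boto3.client("s3").upload_file(parquet_path, bucket, destination)

    return {"statusCode": 200, "body": f"s3://{bucket}/{destination}"}
```

Keeping the SQL in a small pure function makes the conversion step easy to check locally before deploying.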

Use the 'create-a-weekly-schedule.py' script to set up a new scheduled job, and configure 'handler-callback.py' to listen for the 'CSVs are ready' callback event.

See the reference documentation for how 'handler-callback.py' retrieves the signed URL for each CSV file from the BIM360 Data Connector API.

What is Serverless?

Serverless computing is a cloud computing execution model in which the cloud provider runs the servers and dynamically manages the allocation of machine resources. In this model, you can build and run applications without managing infrastructure, and they scale automatically based on demand. AWS Lambda is one such service: it lets you execute code without provisioning or managing servers.


Architecture

Here is a high-level overview of how the solution works:

[Diagram: high-level architecture of the solution]


Getting Started

Prerequisites

1. Create the Lambda handler

  1. Create a new Lambda function in your preferred region.
  2. Copy and paste the handler.py code, replacing the default handler.
  3. Make the Lambda function's URL publicly accessible.
  4. Upload the DuckDB zip file (built in step 2 below) into a new 'duckDB' layer.
  5. Attach the 'duckDB' layer to this handler.
  6. Set up your environment variables under the configuration section (see step 3 below).

2. Create a "DuckDB" Lambda Layer

You need to create a Lambda Layer that includes DuckDB as a dependency.

A. Create a directory for the dependencies, install DuckDB into it, and package the layer:

mkdir duckdb_layer
cd duckdb_layer

mkdir -p python
pip install --target ./python duckdb

zip -r9 duckdb_layer.zip python
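
Lambda extracts Python layers under /opt and puts /opt/python on the import path, so the package has to sit under a top-level python/ folder inside the zip. As a quick sanity check before uploading, a small script along these lines can confirm the layout. The function name layer_has_package is made up for this sketch, and it only checks entry names, not whether the build matches the Lambda runtime's platform.

```python
import zipfile


def layer_has_package(zip_path: str, package: str = "duckdb") -> bool:
    """Return True if the archive has entries under python/<package>."""
    # Matches both a package directory (python/duckdb/...) and a
    # single-file extension module (python/duckdb.<suffix>).
    prefix = f"python/{package}"
    with zipfile.ZipFile(zip_path) as archive:
        return any(name.startswith(prefix) for name in archive.namelist())


if __name__ == "__main__":
    print(layer_has_package("duckdb_layer.zip"))
```

If this prints False, the zip was most likely created from inside the python directory rather than from its parent.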

3. Set Up Environment Variables in Lambda

Go to the AWS Lambda console and configure the following environment variables:

4. Test it

C. Example Usage

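# With AWS CLI v2, also pass --cli-binary-format raw-in-base64-out so the raw JSON payload is accepted.
aws lambda invoke \
    --function-name your-lambda-function-name \
    --payload '{"source_url": "https://signed-url-to-csv-file", "destination_filename": "output.parquet"}' \
    response.json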
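
The same invocation can be made from Python with boto3, which is convenient when testing from a script. This is a sketch only: build_payload and invoke_converter are illustrative names, and the function name you pass in is whatever you called your Lambda.

```python
import json


def build_payload(source_url: str, destination_filename: str) -> bytes:
    """Encode the converter's two inputs as the JSON invocation payload."""
    return json.dumps(
        {"source_url": source_url, "destination_filename": destination_filename}
    ).encode("utf-8")


def invoke_converter(function_name: str, source_url: str, destination_filename: str):
    import boto3  # deferred: only needed when actually calling AWS

    response = boto3.client("lambda").invoke(
        FunctionName=function_name,
        Payload=build_payload(source_url, destination_filename),
    )
    # The response payload is a streaming body; read and decode it.
    return json.loads(response["Payload"].read())
```

For example, invoke_converter("your-lambda-function-name", "https://signed-url-to-csv-file", "output.parquet") sends the same payload as the CLI command above.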

How It Works

Triggering the Job

[Diagram: triggering the job]

The script create-a-weekly-schedule.py

PURPOSE: Call this URL endpoint to schedule a one-off Data Connector API dump of CSV files.

Remember to configure the callback to point to 'handler-callback.py'.

The script handler-callback.py

PURPOSE:

  1. Listen for the "CSV files are ready" callback event.
  2. Get the list of CSVs, then trigger the 'converter' for each of them.
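
The fan-out in step 2 might look something like the sketch below. It deliberately leaves out the Data Connector API calls: get_signed_url and trigger_converter are hypothetical callables you would supply (for example, one that asks the Data Connector API for a signed URL and one that invokes the 'converter' Lambda), and the rule of renaming 'name.csv' to 'name.parquet' is an assumption, not necessarily what handler-callback.py does.

```python
import os


def parquet_name(csv_name: str) -> str:
    """Map 'issues.csv' to 'issues.parquet' (assumed naming rule)."""
    stem, _ext = os.path.splitext(os.path.basename(csv_name))
    return f"{stem}.parquet"


def fan_out(csv_names, get_signed_url, trigger_converter):
    """Trigger one conversion per CSV file and return the payloads sent."""
    sent = []
    for name in csv_names:
        # Same payload keys as the converter's invocation example.
        payload = {
            "source_url": get_signed_url(name),
            "destination_filename": parquet_name(name),
        }
        trigger_converter(payload)
        sent.append(payload)
    return sent
```

Passing the two callables in keeps the loop testable without network access or AWS credentials.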

INPUTS:

'handler-callback.py'

See above

Conclusion

This setup helps you automatically convert Autodesk ACC (or BIM360) data into a format suitable for AWS QuickSight reporting. By leveraging serverless infrastructure with AWS Lambda, this solution is both scalable and cost-effective.