Epic
This development milestone focuses on a data pipeline module that bridges the CDM platform and the analytic instance. It includes developing ETL tools for transferring data between databases, performing basic data validation prior to transfer, implementing ATLAS DB (WebAPI) backup and data reload processes, and ensuring cross-platform compatibility.
Story
[x] Develop data integration tools (ETL) to seamlessly transfer data between databases (CDM to ATLAS); see the PySpark sketch after this list.
[x] Set up containerized Apache Spark as the bulk ETL backbone.
[x] Ensure efficient extraction, transformation, and loading of data.
[x] Implement data mapping and transformation rules for accurate integration.
[ ] Incorporate robust data validation procedures prior to data transfer; see the validation sketch after this list.
[ ] Apply CDM transformation correctness checks to identify and rectify errors.
[ ] Enhance data integrity by enforcing validation rules on pseudonymization (OHDSI CureID).
[x] Implement a comprehensive data backup and restoration system; see the backup/reload sketch after this list.
[x] Create a temporary DB instance for the ATLAS database backup feature.
[x] Enable seamless data recovery and reloading to prevent data loss when working across the Prod and Dev environments.
[x] Work on container compatibility.
[x] Ensure seamless operation across the CDM, ATLAS, and temporary backup databases.
[x] Optimize usability and performance, including environment variable management; see the configuration sketch after this list.
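The ETL story above moves CDM tables into the ATLAS instance. Below is a minimal PySpark sketch of such a transfer over JDBC, assuming a PostgreSQL-backed CDM and ATLAS; the hostnames, credentials, schema names, and table list are placeholders, not the project's actual configuration.

```python
import os

from pyspark.sql import SparkSession

# Placeholder connection settings; real values come from the deployment.
CDM_URL = "jdbc:postgresql://cdm-host:5432/cdm"
ATLAS_URL = "jdbc:postgresql://atlas-host:5432/atlas"
PG_DRIVER = "org.postgresql.Driver"

spark = (
    SparkSession.builder
    .appName("cdm-to-atlas-etl")
    # Pull the PostgreSQL JDBC driver into the containerized Spark runtime.
    .config("spark.jars.packages", "org.postgresql:postgresql:42.7.3")
    .getOrCreate()
)

def transfer_table(table: str, target_schema: str = "results") -> None:
    """Read one CDM table over JDBC and append it to the ATLAS database."""
    df = (
        spark.read.format("jdbc")
        .option("url", CDM_URL)
        .option("dbtable", f"cdm.{table}")
        .option("user", os.environ.get("CDM_USER", "etl"))
        .option("password", os.environ.get("CDM_PASSWORD", ""))
        .option("driver", PG_DRIVER)
        .load()
    )
    # Example of a simple mapping rule: normalize column names on the way in.
    df = df.toDF(*[c.lower() for c in df.columns])
    (
        df.write.format("jdbc")
        .option("url", ATLAS_URL)
        .option("dbtable", f"{target_schema}.{table}")
        .option("user", os.environ.get("ATLAS_USER", "etl"))
        .option("password", os.environ.get("ATLAS_PASSWORD", ""))
        .option("driver", PG_DRIVER)
        .mode("append")
        .save()
    )

# Illustrative table list; the real pipeline would enumerate the CDM tables it owns.
for table in ["person", "condition_occurrence", "drug_exposure"]:
    transfer_table(table)
```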
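For the still-open validation story, a basic pre-transfer check might confirm that required CDM tables are non-empty and that their key columns contain no NULLs. A hypothetical sketch using psycopg2; the DSN, table names, and key columns are illustrative only.

```python
import psycopg2

# Placeholder DSN; in practice this would come from environment variables.
CDM_DSN = "host=cdm-host dbname=cdm user=etl password=secret"

# Illustrative checks: each required table must be non-empty and its key
# column must contain no NULLs before a transfer is allowed to start.
REQUIRED_KEYS = {
    "cdm.person": "person_id",
    "cdm.condition_occurrence": "condition_occurrence_id",
}

def validate_before_transfer() -> list[str]:
    errors = []
    with psycopg2.connect(CDM_DSN) as conn, conn.cursor() as cur:
        for table, key in REQUIRED_KEYS.items():
            # count(*) counts all rows; count(key) counts non-NULL keys only.
            cur.execute(f"SELECT count(*), count({key}) FROM {table}")
            total, non_null = cur.fetchone()
            if total == 0:
                errors.append(f"{table} is empty")
            elif non_null != total:
                errors.append(f"{table}.{key} has {total - non_null} NULL keys")
    return errors

if __name__ == "__main__":
    problems = validate_before_transfer()
    if problems:
        raise SystemExit("validation failed: " + "; ".join(problems))
    print("validation passed; safe to transfer")
```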
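The backup and reload story can be served by PostgreSQL's standard tooling. The sketch below wraps pg_dump/pg_restore to dump the ATLAS (WebAPI) database and reload it into another instance such as the temporary backup DB; the host names, database name, and dump path are assumptions, and PGPASSWORD (or a .pgpass file) is assumed to be configured.

```python
import os
import subprocess

# Placeholder names; override via environment variables in each container.
ATLAS_HOST = os.environ.get("ATLAS_HOST", "atlas-host")
BACKUP_HOST = os.environ.get("BACKUP_HOST", "tmp-backup-db")
ATLAS_DB = os.environ.get("ATLAS_DB", "atlas")
DUMP_FILE = "/backups/atlas_webapi.dump"

def backup_atlas() -> None:
    """Dump the ATLAS/WebAPI database in custom format for selective restore."""
    subprocess.run(
        ["pg_dump", "--format=custom", "--file", DUMP_FILE,
         "--host", ATLAS_HOST, ATLAS_DB],
        check=True,
    )

def reload_atlas(target_host: str) -> None:
    """Reload the dump into a target instance, e.g. the temporary backup DB."""
    subprocess.run(
        ["pg_restore", "--clean", "--if-exists",
         "--host", target_host, "--dbname", ATLAS_DB, DUMP_FILE],
        check=True,
    )

if __name__ == "__main__":
    backup_atlas()
    reload_atlas(BACKUP_HOST)
```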
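For environment variable management across the CDM, ATLAS, and temporary backup containers, one approach is to resolve each connection from a prefixed set of variables so the same image runs unchanged in every environment. The variable naming scheme below is hypothetical.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class DbConfig:
    """Connection settings resolved from environment variables, so one
    container image runs unchanged against CDM, ATLAS, or the backup DB."""
    host: str
    port: int
    database: str
    user: str
    password: str

def load_config(prefix: str) -> DbConfig:
    # Hypothetical naming scheme: CDM_HOST, ATLAS_HOST, BACKUP_HOST, ...
    def env(name: str, default: str | None = None) -> str:
        value = os.environ.get(f"{prefix}_{name}", default)
        if value is None:
            raise RuntimeError(f"missing required variable {prefix}_{name}")
        return value

    return DbConfig(
        host=env("HOST"),
        port=int(env("PORT", "5432")),
        database=env("DB"),
        user=env("USER"),
        password=env("PASSWORD"),
    )

cdm = load_config("CDM")        # expects CDM_HOST, CDM_DB, ...
atlas = load_config("ATLAS")    # expects ATLAS_HOST, ATLAS_DB, ...
backup = load_config("BACKUP")  # the temporary backup instance
```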
Review
This epic is in progress: the ETL, backup, and container compatibility stories are complete, while the data validation story remains open.