Closed (jschalk closed this PR 1 week ago)
This PR implements a significant refactoring of the data transformation pipeline, primarily focusing on renaming components and adding new functionality for handling nub staging. The changes maintain the existing data flow while making the naming more consistent and adding support for a new brick format (br00114).
erDiagram
br00114AbstractTable {
String face_id PK
Integer event_id PK
String acct_id
String fiscal_id
String inx_label
String otx_label
String jaar_type
String owner_id
}
br00114HoldTable {
String face_id PK
Integer event_id PK
String acct_id
String fiscal_id
String inx_label
String otx_label
String jaar_type
String owner_id
}
br00114StageTable {
String face_id PK
Integer event_id PK
String acct_id
String fiscal_id
String inx_label
String otx_label
String jaar_type
String owner_id
String src_type
String src_path
String src_sheet
}
br00114AbstractTable ||--|| br00114HoldTable : "extends"
br00114AbstractTable ||--|| br00114StageTable : "extends"
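To make the table relationships in the diagram above concrete, here is a minimal Python sketch. It assumes plain dataclasses instead of whatever model layer `src/f09_brick/brick_models.py` actually uses. The class names `Br00114Abstract`, `Br00114Hold` and `Br00114Stage`, the helper `to_hold`, and the sample values are hypothetical. Only the column names, their types, and the `face_id` + `event_id` primary key come from the diagram.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class Br00114Abstract:
    """Columns shared by the br00114 hold and stage tables (per the diagram)."""

    face_id: str   # part of the composite primary key
    event_id: int  # part of the composite primary key
    acct_id: str
    fiscal_id: str
    inx_label: str
    otx_label: str
    jaar_type: str
    owner_id: str

    def primary_key(self) -> Tuple[str, int]:
        return (self.face_id, self.event_id)


@dataclass(frozen=True)
class Br00114Hold(Br00114Abstract):
    """Hold row: exactly the abstract columns."""


@dataclass(frozen=True)
class Br00114Stage(Br00114Abstract):
    """Stage row: abstract columns plus provenance of the source file."""

    src_type: str
    src_path: str
    src_sheet: str


def to_hold(stage_row: Br00114Stage) -> Br00114Hold:
    """Hypothetical helper: drop the src_* columns from a stage row."""
    shared = {f.name: getattr(stage_row, f.name) for f in fields(Br00114Abstract)}
    return Br00114Hold(**shared)


# Sample values are made up for illustration only.
stage = Br00114Stage(
    face_id="face1", event_id=3, acct_id="acct1", fiscal_id="fiscal1",
    inx_label="inx", otx_label="otx", jaar_type="AcctID", owner_id="owner1",
    src_type="excel", src_path="zoo/br00114.xlsx", src_sheet="zoo_agg",
)
hold = to_hold(stage)
```

In this sketch the stage table differs from the hold table only by the three `src_*` columns, so converting a stage row to a hold row is a pure projection onto the shared columns.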
classDiagram
class JungleToZooTransformer {
+transform()
+_group_jungle_data()
+_read_and_tag_dataframe(ref)
+_save_to_zoo_staging(brick_number, dfs)
}
class ZooStagingToZooAggTransformer {
+transform()
+_group_by_brick_columns(zoo_staging_df, brick_number)
+_save_zoo_agg(brick_path, zoo_staging_df)
}
class ZooAggToZooEventsTransformer {
+transform()
+get_unique_events(zoo_agg_df)
+_save_zoo_events(brick_path, events_df)
}
class ZooEventsToEventsLogTransformer {
+transform()
+get_event_log_df(otx_events_df, x_dir, x_file_name)
+_save_events_log_file(events_df)
}
class ZooAggToOtx2InxStagingTransformer {
+transform()
+insert_legitmate_zoo_agg_otx2inx_atts(otx2inx_df, brick_number, zoo_brick_path, otx2inx_columns)
}
class ZooAggToNubStagingTransformer {
+transform()
+insert_legitmate_zoo_agg_nub_atts(nub_df, brick_number, zoo_brick_path, nub_columns)
}
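The class names in the diagram above suggest a staged data flow in which each transformer reads one stage and writes the next. The sketch below is an illustration only, not code from `world.py`. The stage names (`jungle`, `zoo_staging`, and so on) are assumptions inferred from the class and method names, and `run_order` is a hypothetical helper. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to work out an order in which every transformer runs after the one that produces its input.

```python
from graphlib import TopologicalSorter

# Transformer class name -> (input stage, output stage).
# Stage names are assumptions read off the class names in the diagram.
TRANSFORMERS = {
    "JungleToZooTransformer": ("jungle", "zoo_staging"),
    "ZooStagingToZooAggTransformer": ("zoo_staging", "zoo_agg"),
    "ZooAggToZooEventsTransformer": ("zoo_agg", "zoo_events"),
    "ZooEventsToEventsLogTransformer": ("zoo_events", "events_log"),
    "ZooAggToOtx2InxStagingTransformer": ("zoo_agg", "otx2inx_staging"),
    "ZooAggToNubStagingTransformer": ("zoo_agg", "nub_staging"),
}


def run_order(transformers):
    """Return transformer names so that each one's input stage is built first."""
    # Map every output stage to the transformer that produces it.
    producer = {out: name for name, (_, out) in transformers.items()}
    # For each transformer, list the transformers it depends on.
    graph = {
        name: {producer[src]} if src in producer else set()
        for name, (src, _) in transformers.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = run_order(TRANSFORMERS)
```

Under these assumptions the three transformers that read from the zoo aggregate (events, otx2inx staging, nub staging) are independent of one another, so their relative order is not fixed. Only the events log step has to wait for the zoo events step.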
Change | Details | Files |
---|---|---|
Renamed data transformation pipeline components to use more consistent terminology | | `src/f10_world/world.py`, `src/f10_world/test/test_world_jungle_to_zoo.py` |
Added new brick format br00114 for account nub label handling | | `src/f09_brick/brick_formats/brick_format_00114_acct_nub_label_v0_0_0.json`, `src/f09_brick/brick_models.py`, `src/f09_brick/brick_config.py` |
Implemented new nub staging transformation functionality | | `src/f10_world/world.py`, `src/f10_world/test/test_world_jungle_to_zoo.py` |
Summary by Sourcery
Refactor the codebase to implement a new staging workflow for data transformation processes, including renaming methods, classes, and test cases to reflect the changes.
Enhancements:
Tests: