Open mmcdermott opened 4 months ago
The updates primarily focus on ensuring consistent handling of subject IDs across the codebase by aligning data types and refining data processing. Additionally, enhancements to the caching mechanisms and shard creation process for deep learning representations have been implemented. A new script for building flat representation datasets has been added, and various test updates ensure compatibility and verify output formats.
Files | Change Summary |
---|---|
EventStream/baseline/FT_task_baseline.py | Adjusted filtering operations for subject IDs using ESD.subject_id_dtype and modified load_flat_rep function parameters. |
EventStream/data/dataset_polars.py | Updated subject ID handling, recalculated n_events_per_subject, adjusted filters, and added debug logs for data sizes. |
EventStream/data/dataset_base.py | Refined parameter caching and handling, modified deep learning shard creation, and refined the processing of subject chunks. |
EventStream/data/pytorch_dataset.py | Updated method to ensure consistent data types when accessing subject IDs. |
scripts/build_flat_reps.py | Added new script for building flat representation datasets using a hydra config file, including necessary imports and function definitions. |
tests/test_e2e_runs.py | Added imports, new testing method for dataset outputs, and updated setUp and build_dataset methods to handle Parquet format outputs and perform assertions. |
tests/data/test_pytorch_dataset.py | Modified setUp method to convert subject IDs to strings when constructing the shards dictionary. |
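
Several rows above concern keeping subject ID data types consistent. The following is a minimal plain-Python sketch of why that matters, not the code in this PR: `filter_by_subject` and its `id_dtype` parameter are hypothetical names that only loosely mirror the role `ESD.subject_id_dtype` plays in the summary. IDs are cast to one type before membership tests, so string IDs (for example, keys read back from a shards dictionary) still match integer IDs in the event data.

```python
def filter_by_subject(events, subject_ids, id_dtype=int):
    """Keep events whose subject ID is in subject_ids, comparing in one dtype.

    `id_dtype` is a hypothetical stand-in for a dataset-wide subject ID type.
    """
    # Cast the requested IDs once so "1" and 1 compare as equal.
    wanted = {id_dtype(s) for s in subject_ids}
    return [e for e in events if id_dtype(e["subject_id"]) in wanted]


events = [
    {"subject_id": 1, "value": 0.5},
    {"subject_id": 2, "value": 1.5},
    {"subject_id": 3, "value": 2.5},
]
string_ids = ["1", "3"]  # e.g. IDs that were stored as strings

# Without casting, int and str IDs never match and the filter is silently empty.
naive = [e for e in events if e["subject_id"] in string_ids]

# With a shared dtype, the intended rows are kept.
aligned = filter_by_subject(events, string_ids)
```

The silent empty result in `naive` is the kind of failure that type alignment is meant to prevent; no error is raised, so the mismatch only shows up as missing data.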
sequenceDiagram
participant User
participant Script as build_flat_reps.py
participant Dataset as EventStream.data.dataset_polars.Dataset
participant CacheRep as cache_flat_representation
User->>Script: Execute script with config
Script->>Dataset: Load dataset
Dataset->>CacheRep: Resolve cache parameters
CacheRep->>Dataset: Cache flat representation
Dataset-->>Script: Processed Dataset
Script-->>User: Execution complete
This diagram outlines the flow of execution when building a flat representation dataset using the new script build_flat_reps.py. It shows the interactions between the user, script, dataset, and caching process.
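
The call order in the diagram can be sketched with a stub. Everything here is an assumption made for illustration: `StubDataset`, its `load` and `_resolve_cache_params` methods, the `main` function, and the config keys are hypothetical and do not reproduce the real `Dataset` class or the real script. Only the name `cache_flat_representation` is taken from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class StubDataset:
    """Hypothetical stand-in for the dataset class; records the call order."""

    save_dir: str
    calls: list = field(default_factory=list)

    @classmethod
    def load(cls, save_dir):
        # Script -> Dataset: load dataset
        ds = cls(save_dir=save_dir)
        ds.calls.append("load")
        return ds

    def _resolve_cache_params(self, window_sizes):
        # Dataset -> CacheRep: resolve cache parameters (here: deduplicate and sort)
        self.calls.append("resolve_cache_params")
        return {"window_sizes": sorted(set(window_sizes))}

    def cache_flat_representation(self, window_sizes):
        # CacheRep -> Dataset: cache the flat representation (no real I/O in this stub)
        params = self._resolve_cache_params(window_sizes)
        self.calls.append("cache_flat_representation")
        return params


def main(cfg):
    # User -> Script: execute with a config (a plain dict stands in for a hydra config)
    ds = StubDataset.load(cfg["dataset_dir"])
    params = ds.cache_flat_representation(cfg["window_sizes"])
    return ds, params


ds, params = main({"dataset_dir": "/tmp/example", "window_sizes": ["7d", "1d", "7d"]})
```

After the run, `ds.calls` holds the three steps in the order the diagram shows them.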
Attention: Patch coverage is 76.92308% with 9 lines in your changes missing coverage. Please review.
Project coverage is 86.06%. Comparing base (f84069c) to head (9f3ce52). Report is 39 commits behind head on dev.
:exclamation: Current head 9f3ce52 differs from pull request most recent head 5150e05. Please upload reports for the commit 5150e05 to get more accurate results.
STILL IN PROGRESS. We just have failing test cases for now.