Closed: Oufattole closed this issue 1 month ago
I think this line should be removed: https://github.com/mmcdermott/MEDS_transforms/blob/158_fix_typing_issue/src/MEDS_transforms/configs/stage_configs/fit_vocabulary_indices.yaml#L4
That may not be the entire problem, but I suspect it is part of it.
I believe this line: https://github.com/mmcdermott/MEDS_transforms/blob/158_fix_typing_issue/src/MEDS_transforms/utils.py#L307 should point to `reducer_output_dir`, not `output_dir`.
And clearly a multi-stage, multi-metadata-stage integration test is also needed, not just single-stage tests.
Subsidiary issues:
Fixed by #167 and verified with a full, E2E preprocess pipeline integration test.
The normalization stage is failing for me because there is no `data/codes.parquet` file.

When I try to copy over the `metadata/codes.parquet` file:

```shell
cp "${MEDS_DIR}/data/metadata/codes.parquet" "${MEDS_DIR}/data/codes.parquet"
```

I get an error that there is no `values/sum` column.

And when I try to copy over the `aggregate_code_metadata/codes.parquet` file:

```shell
cp "${MEDS_DIR}/aggregate_code_metadata/codes.parquet" "${MEDS_DIR}/data/codes.parquet"
```

I get an error that there is no `code/vocab_index` column.
What worked for me as a temporary solution was to spin up a simple Hydra script to generate a `code/vocab_index` column:
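(The original script is not included in the thread; below is only a rough sketch of what such a column-generation step might look like. The column name `code/vocab_index` comes from the error above, but the pandas usage, the sort-by-code ordering, and the 1-based indexing convention are all assumptions, and the Hydra config wiring is omitted.)

```python
# Hypothetical stand-in for the temporary fix-up script (not the author's
# original): assign a sequential code/vocab_index to each metadata row.
import pandas as pd


def add_vocab_index(codes: pd.DataFrame) -> pd.DataFrame:
    """Add a 1-based code/vocab_index column, ordered by code.

    Index 0 is deliberately left unused (a common convention for padding
    or unknown codes; whether MEDS_transforms expects this is an assumption).
    """
    out = codes.sort_values("code").reset_index(drop=True)
    out["code/vocab_index"] = out.index + 1
    return out


if __name__ == "__main__":
    # Tiny in-memory example instead of reading ${MEDS_DIR}/.../codes.parquet;
    # a real run would round-trip through pd.read_parquet / DataFrame.to_parquet.
    codes = pd.DataFrame({"code": ["LAB//A", "DX//B", "RX//C"]})
    print(add_vocab_index(codes))
```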
This issue exists on the `dev` branch and on release 0.0.4.