mmcdermott / MEDS_transforms

A simple set of MEDS polars-based ETL and transformation functions
MIT License

Allow the `external_splits` specification to point to a `patient_splits.parquet` file or a prior `splits.json` file from MEDS-extract to match the cohort. #130

Open mmcdermott opened 1 month ago

mmcdermott commented 1 month ago

Right now, if you point `external_splits` to a prior dataset's `splits.json` file, the shard name is treated as part of the split name. This should be fixed so that you can point to a single "splits" file and have it reload the correct splits, ignoring the shard component of each key.
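To illustrate the intended behavior, here is a minimal sketch. It assumes the prior file maps keys of the form `"<split>/<shard index>"` (e.g. `"train/0"`) to lists of patient IDs; that layout and the helper name `collapse_shards_to_splits` are assumptions made for illustration, not the existing MEDS_transforms API.

```python
from collections import defaultdict


def collapse_shards_to_splits(shards: dict[str, list[int]]) -> dict[str, list[int]]:
    """Merge per-shard patient lists into per-split patient lists.

    Hypothetical helper: assumes each key looks like "<split>/<shard index>".
    Keys without a "/" are treated as already being plain split names.
    """
    splits: dict[str, list[int]] = defaultdict(list)
    for shard_name, patient_ids in shards.items():
        # Drop only the trailing shard component, keeping the split name.
        split_name = shard_name.rsplit("/", 1)[0]
        splits[split_name].extend(patient_ids)
    return dict(splits)


# Illustrative input only; not taken from a real dataset.
example_shards = {"train/0": [1, 2], "train/1": [3], "held_out/0": [4]}
example_splits = collapse_shards_to_splits(example_shards)
```

With this input, both `train/0` and `train/1` end up under a single `train` split, which is what pointing `external_splits` at a prior file should produce.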

Tagging @prenc for tracking

My current thoughts as to what should change about this: