mmcdermott MEDS_Tabular_AutoML issues

mmcdermott / MEDS_Tabular_AutoML

Limited automatic tabular ML pipelines for generic MEDS datasets.

MIT License

10 stars, 2 forks

issues

Relax polars dependency to prevent conflicts with other (MEDS) packages

#94 rvandewater opened 2 weeks ago
0
model_file_stem not resolving for sklearn models

#93 teyaberg opened 3 weeks ago
0
Mimiciv

#92 Oufattole closed 3 weeks ago
2
Re-worked the tabular dataset config a bit.

#91 mmcdermott closed 3 weeks ago
2
Simplify model launcher configs and add script input checks

#90 Oufattole closed 3 weeks ago
2
Error too general: Error occurred: No data found in the shards or labels. Please check input files.

#89 rvandewater closed 2 weeks ago
0
Clear input/output documentation can help with debugging

#88 rvandewater opened 1 month ago
0
We should distribute python typing information

#87 mmcdermott opened 1 month ago
1
We should make our config names more package specific

#86 mmcdermott opened 1 month ago
1
If `describe_codes` sees no input files, it should throw an error.

#85 mmcdermott opened 1 month ago
1
Move tabularization to YAML configuration

#84 coderabbitai[bot] opened 1 month ago
1
Log file path for tabularization may not be specific enough.

#83 mmcdermott opened 1 month ago
0
Log file path is not being interpolated correctly.

#82 mmcdermott opened 1 month ago
0
added autogluon support, more models, more preprocessing strategies

#81 Oufattole closed 3 weeks ago
3
Make compatible with MEDS v0.3.2

#80 mmcdermott opened 1 month ago
0
added workflows for submitting every main branch commit to the test-p…

#79 Oufattole closed 1 month ago
2
Dev

#78 Oufattole closed 1 month ago
1
Error messaging and handling are poor when no `"static/first"` codes exist in the dataset

#77 mmcdermott opened 1 month ago
2
Why are we restricting just to the first static code per patient? Shouldn't we want to summarize all static codes?

#76 mmcdermott opened 1 month ago
3
This package should integrate more closely with MEDS-Transform

#75 mmcdermott opened 1 month ago
0
The MEDS v0.3 configuration interface being pursued in #55 is poor

#74 mmcdermott opened 1 month ago
0
This is not a good default for this parameter. It should default to ???

#73 mmcdermott opened 1 month ago
0
DO NOT MERGE. Temporary.

#72 mmcdermott closed 1 month ago
1
Fixes label schema issues. Also gets ready for the re-shard stage to be part of the stack, but doesn't fully solve that.

#71 mmcdermott closed 1 month ago
1
The two test files that exist at the time this issue is filed (link below) seem at least partially redundant with one another.

#70 mmcdermott opened 1 month ago
0
Tests are too sparse and too dependent on integration tests.

#69 mmcdermott opened 1 month ago
0
The need to re-shard labels as well should be removed if possible.

#68 mmcdermott opened 1 month ago
0
The dependence on the MEDS_polars sharding style `split / shard_num` is brittle and should be removed where possible.

#67 mmcdermott opened 1 month ago
0
The way task dataframes are read in needs to be compliant with MEDS v0.3 datasets.

#66 mmcdermott closed 1 month ago
0
Running changes for full MEDS v0.3 Compliance

#65 mmcdermott closed 1 month ago
4
Update dev to re-sync to main.

#64 mmcdermott closed 1 month ago
1
Added version dependency.

#63 mmcdermott closed 1 month ago
1
Updates paths to be compliant with MEDS v0.3.

#62 mmcdermott closed 1 month ago
2
XGBoost config is brittle and has unused params; testing is also brittle.

#61 mmcdermott closed 1 month ago
0
If the xgboost script crashes when run on the command line, it should return a non-zero error code

#60 mmcdermott opened 1 month ago
0
Add a versioned dependency on `meds` in our `pyproject.toml`, frozen at `0.3`, so that we clearly indicate which versions of `meds` we are compatible with.

#59 mmcdermott closed 1 month ago
0
Updating how we handle patient splits to use `metadata/patient_splits.parquet`

#58 mmcdermott closed 1 month ago
3
Updating path inputs and outputs to be consistent with MEDS v0.3 (`final_cohort` -> `data`, store all MEDS-Tab outputs in an output cohort directory instead of the input MEDS cohort directory so we don't overwrite MEDS cohort metadata).

#57 mmcdermott closed 1 month ago
0
Updating column names and such to be consistent with MEDS v0.3 for the data schema (`timestamp` -> `time`, `numerical_value` -> `numeric_value`)

#56 mmcdermott closed 1 month ago
7
We need to update this to support MEDS v0.3

#55 mmcdermott opened 1 month ago
1
We may need to support working with MEDS datasets that have "code modifier" columns

#54 mmcdermott opened 2 months ago
0
Add wrapper code to ease loading a model and its corresponding aggregations into a Jupyter notebook or user-defined script

#53 Oufattole opened 2 months ago
0
Change the xgboost sweeper to optimize the binary inclusion of each aggregation x window combination.

#52 Oufattole opened 3 months ago
1
generate-permutations is misnamed

#51 EthanSteinberg closed 2 months ago
1
Extreme low memory tabularization.

#50 Oufattole opened 3 months ago
0
Support generating task-specific tabularization without tabularizing all events.

#49 Oufattole opened 3 months ago
0
Find a way to not need to duplicate static features for every event of the patient's data in files stored on disk.

#48 mmcdermott opened 3 months ago
0
Merge Boost Branch

#47 Oufattole opened 3 months ago
0
Select features based on strongest correlation with outcome or some other criteria (e.g., `select_k_best` in `sklearn`)

#46 mmcdermott opened 3 months ago
0
Windows progressing into the patient's future as well as past (these are valuable for profiling populations).

#45 mmcdermott opened 3 months ago
0