greenelab / mpmp

Multimodal Pan-cancer Mutation Prediction
BSD 3-Clause "New" or "Revised" License

Reindex in data model rather than in classification code? #31

Closed jjc2718 closed 3 years ago

jjc2718 commented 3 years ago

See here: https://github.com/greenelab/mpmp/blob/78128e6325455d717865925d9b2cacab5db2e25b/mpmp/utilities/classify_utilities.py#L90

This reindexing, which makes the data and labels share the same set of samples (index values), could happen in the data model rather than in the classification code. That would probably make more sense, especially once prediction/model fitting is split into separate classification and regression scripts.
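For illustration, here is a minimal sketch of the kind of alignment being described, assuming the features and labels are pandas DataFrames indexed by sample ID. The function name `align_samples` and the toy data are hypothetical and are not taken from the mpmp code.

```python
import pandas as pd


def align_samples(X_df, y_df):
    """Restrict features and labels to their shared samples, in the same order.

    Hypothetical helper for illustration; not the mpmp implementation.
    """
    # Samples present in both the feature matrix and the label table
    shared = X_df.index.intersection(y_df.index)
    # Reindexing both frames to the same Index gives identical row order
    return X_df.reindex(shared), y_df.reindex(shared)


# Toy data: s1 has no label, s4 has no features
X_df = pd.DataFrame({"gene_a": [0.1, 0.5, 0.9]}, index=["s1", "s2", "s3"])
y_df = pd.DataFrame({"status": [1, 0, 1]}, index=["s3", "s2", "s4"])

X_aligned, y_aligned = align_samples(X_df, y_df)
```

If a step like this ran inside the data model, the classification code could assume that the two frames it receives already have matching indexes.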

jjc2718 commented 3 years ago

I think this (somewhat inadvertently) got addressed with the sample overlap changes (e.g. in #43): when we take the sample overlap between data types and reindex in the tcga_data_model, it should also align the X_df and y_df.
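A rough sketch of why an overlap step can align everything at once, assuming each data type (and, for this illustration, the labels) is a pandas DataFrame indexed by sample ID. The helper `reindex_to_overlap` and the frame names are hypothetical; the actual logic in tcga_data_model may differ.

```python
from functools import reduce

import pandas as pd


def reindex_to_overlap(frames):
    """Reindex every DataFrame in `frames` to the samples present in all of them.

    `frames` maps a name to a DataFrame indexed by sample ID.
    Hypothetical helper for illustration; not the mpmp implementation.
    """
    # Intersect the indexes pairwise to get the samples common to all frames
    shared = reduce(
        lambda left, right: left.intersection(right),
        (df.index for df in frames.values()),
    )
    # Every frame is reindexed to the same Index, so row order matches too
    return {name: df.reindex(shared) for name, df in frames.items()}


frames = {
    "expression": pd.DataFrame({"gene_a": [0.1, 0.5, 0.9]}, index=["s1", "s2", "s3"]),
    "methylation": pd.DataFrame({"probe_1": [0.3, 0.7, 0.2]}, index=["s2", "s3", "s4"]),
    "labels": pd.DataFrame({"status": [1, 0, 1]}, index=["s3", "s2", "s5"]),
}

aligned = reindex_to_overlap(frames)
```

Because every frame ends up with the same index, any pair drawn from the result, such as a feature matrix and its labels, is already aligned without a separate reindex in the classification code.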

Closing this for now.