greenelab / mpmp

Multimodal Pan-cancer Mutation Prediction
BSD 3-Clause "New" or "Revised" License

Caching for cross-data sample set filtering #30

Closed jjc2718 closed 3 years ago

jjc2718 commented 3 years ago

See here: https://github.com/greenelab/mpmp/pull/29#discussion_r590807995

It should be possible to cache the set of samples that the different datasets have in common, which would likely be faster than recomputing the intersection from the sample_info files each time.
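A rough sketch of what such a cache could look like, using only the standard library. Everything here is an assumption for illustration: the function names (`read_sample_ids`, `cache_is_fresh`, `get_common_samples`) are hypothetical and not part of mpmp, the sample_info files are assumed to be tab-separated with a header row and the sample ID in the first column, and the JSON cache file with a modification-time freshness check is just one possible design.

```python
import csv
import json
from pathlib import Path


def read_sample_ids(sample_info_path):
    """Return the set of sample IDs from the first column of a TSV file.

    Assumes (hypothetically) a header row followed by one sample per line.
    """
    with open(sample_info_path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader, None)  # skip the header row
        return {row[0] for row in reader if row}


def cache_is_fresh(cache_path, sample_info_paths):
    """True if the cache exists and is at least as new as every input file."""
    cache_path = Path(cache_path)
    if not cache_path.exists():
        return False
    cache_mtime = cache_path.stat().st_mtime
    return all(Path(p).stat().st_mtime <= cache_mtime for p in sample_info_paths)


def get_common_samples(sample_info_paths, cache_path):
    """Return the sample IDs shared by all files, reusing a cached result if valid."""
    if cache_is_fresh(cache_path, sample_info_paths):
        with open(cache_path) as f:
            return set(json.load(f))

    # Cache is missing or stale: recompute the intersection and store it.
    id_sets = [read_sample_ids(p) for p in sample_info_paths]
    common = set.intersection(*id_sets) if id_sets else set()
    with open(cache_path, "w") as f:
        json.dump(sorted(common), f)
    return common
```

The first call pays the cost of reading every sample_info file; later calls only read the small JSON file, as long as none of the inputs has been modified since the cache was written.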

jjc2718 commented 3 years ago

I don't think this is particularly slow relative to the rest of the mutation prediction process, so I'm going to close this. If it becomes a bottleneck in the future, we can revisit it, but I don't foresee this being a high priority.