mims-harvard / TDC

Therapeutics Commons (TDC-2): Multimodal Foundation for Therapeutic Science
https://tdcommons.ai
MIT License

Support strict split for drug synergy prediction task #136

Closed: ZOE-V closed this issue 2 years ago

ZOE-V commented 2 years ago

Describe the bug

The splitting strategies currently available for the drug synergy prediction datasets are limited to:

Additional information

This issue was already addressed for the drug response prediction task by @jannisborn: https://github.com/mims-harvard/TDC/issues/126

Idea of solution

Add the same functionality that already exists for the drug response prediction task. For example:

split = synergy_data.get_split(method = 'cold_split', column_names = ['Drug1_ID', 'Drug2_ID', 'Cell_Line_ID'])

This currently raises: AttributeError: Please select from random_split, or cold_split, if cold split. please specify the column name!

I believe the issue lies in the conditions on line 96 of https://github.com/mims-harvard/TDC/blob/main/tdc/multi_pred/multi_pred_dataset.py. I suggest replacing lines 96-98 with something like:

# cast a single column name to a list
if isinstance(column_name, str):
    column_name = [column_name]

# check that every requested column is present in df
if (column_name is not None) and all(x in df.columns.values for x in column_name):
    if method == 'cold_split':
        return create_fold_setting_cold(df, seed, frac, column_name)
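To make the intended "strict" behavior concrete, here is a minimal, stdlib-only sketch of a cold split over several entity columns. It is not TDC's create_fold_setting_cold; the function name strict_cold_split, the list-of-dicts row format, the train/test-only split (no validation fold), and the rule of discarding rows that mix held-out and seen entities are all assumptions made for illustration.

```python
import itertools
import random
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]


def strict_cold_split(
    rows: Sequence[Row],
    columns: Sequence[str],
    test_frac: float = 0.2,
    seed: int = 42,
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical strict cold split over several entity columns.

    For each column, a fraction of its unique values is held out.
    A row goes to test only if every listed column holds a held-out value,
    and to train only if none does; mixed rows are discarded.
    """
    rng = random.Random(seed)
    held_out: Dict[str, Set[str]] = {}
    for col in columns:
        # sort so the sample depends only on the seed, not on set iteration order
        values = sorted({row[col] for row in rows})
        k = min(len(values), max(1, round(test_frac * len(values))))
        held_out[col] = set(rng.sample(values, k))

    train: List[Row] = []
    test: List[Row] = []
    for row in rows:
        flags = [row[col] in held_out[col] for col in columns]
        if all(flags):
            test.append(row)
        elif not any(flags):
            train.append(row)
        # rows mixing held-out and seen entities are dropped, so no
        # test entity appears in training under the same column
    return train, test


if __name__ == "__main__":
    drugs = [f"d{i}" for i in range(10)]
    cells = [f"c{i}" for i in range(5)]
    rows = [
        {"Drug1_ID": a, "Drug2_ID": b, "Cell_Line_ID": c}
        for a, b, c in itertools.product(drugs, drugs, cells)
    ]
    train, test = strict_cold_split(
        rows, ["Drug1_ID", "Drug2_ID", "Cell_Line_ID"], test_frac=0.4, seed=0
    )
    print(len(train), len(test))  # 108 32
```

Because each column is sampled independently in this sketch, a drug held out under Drug1_ID can still appear in training under Drug2_ID; pooling both drug columns into one entity set before sampling would give a stricter variant. Discarding mixed rows also shrinks the data noticeably (140 of 500 rows survive in the toy example).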

Do you believe this would be a valuable contribution to the package?

Thank you for your help.

Best,

Zoe

kexinhuang12345 commented 2 years ago

Hi Zoe! I think this would be great! I would suggest mirroring the change Jannis made in bi_pred_dataset.py:

https://github.com/mims-harvard/TDC/blob/2f605c92a4c918ed0972cb2a35c809f2e4c8468b/tdc/multi_pred/bi_pred_dataset.py#L174-L183

Feel free to make a PR! Thanks in advance!
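The linked snippet is not reproduced in this thread. As a hypothetical sketch of the same idea (accept either one column name or a list, and fail with an explicit message when a column is missing), the check could be factored into a small helper. The name normalize_cold_columns and the use of ValueError are assumptions, not TDC code; available would typically be something like df.columns.

```python
from typing import Iterable, List, Optional, Sequence, Union


def normalize_cold_columns(
    column_name: Optional[Union[str, Sequence[str]]],
    available: Iterable[str],
) -> List[str]:
    """Hypothetical helper: return column_name as a list or raise a clear error."""
    if column_name is None:
        raise ValueError("cold_split needs one or more column names")
    # accept a single column name as shorthand for a one-element list
    names = [column_name] if isinstance(column_name, str) else list(column_name)
    if not names:
        raise ValueError("cold_split needs one or more column names")
    available_set = set(available)
    missing = [name for name in names if name not in available_set]
    if missing:
        raise ValueError(f"cold_split column(s) not in the dataframe: {missing}")
    return names


if __name__ == "__main__":
    cols = ["Drug1_ID", "Drug2_ID", "Cell_Line_ID", "Y"]
    print(normalize_cold_columns("Drug1_ID", cols))  # ['Drug1_ID']
    print(normalize_cold_columns(["Drug1_ID", "Cell_Line_ID"], cols))
```

The existing message quoted in the issue is raised as an AttributeError, so a real patch may prefer to keep that exception type for backward compatibility.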