Describe the bug
The splitting strategies currently available for the drug synergy prediction datasets are limited to:
- the classic random split
- the combination split, in which test samples contain drug pairs unseen by the model during training
- the cold split, which can hold out instances of only one modality at a time, so that test samples contain either drugs or cell lines unseen by the model during training, but not both
Hence, the OncoPolyPharmacology and DrugComb datasets cannot be split so that test samples contain drugs and cell lines that are both unseen by the model during training.
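To make the requirement concrete, here is a minimal sketch of a check that such a split would have to pass. It is plain pandas and not part of TDC: `unseen_in_train` is a hypothetical helper, the toy frames are made up, and the column names are the ones used in the example call in this issue. Drugs are pooled over both drug columns, on the assumption that the same compound can appear as either Drug1_ID or Drug2_ID.

```python
import pandas as pd


def unseen_in_train(train, test, columns):
    """Hypothetical check: True if no value found in test[columns]
    also occurs in train[columns]. Values are pooled across the columns."""
    train_entities = set(train[columns].to_numpy().ravel())
    test_entities = set(test[columns].to_numpy().ravel())
    return train_entities.isdisjoint(test_entities)


# Made-up toy data, only to illustrate the property.
train = pd.DataFrame({
    "Drug1_ID": ["A", "A", "B"],
    "Drug2_ID": ["B", "C", "C"],
    "Cell_Line_ID": ["X", "Y", "X"],
})
test = pd.DataFrame({
    "Drug1_ID": ["D"],
    "Drug2_ID": ["E"],
    "Cell_Line_ID": ["Z"],
})

# The split asked for here needs both checks to hold at the same time.
drugs_cold = unseen_in_train(train, test, ["Drug1_ID", "Drug2_ID"])
cells_cold = unseen_in_train(train, test, ["Cell_Line_ID"])
print(drugs_cold and cells_cold)
```

A cold split on a single column can satisfy one of the two checks, but not both at once.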
Idea of solution
Functionality like the one already available for the drug response prediction task. For example:
split = synergy_data.get_split(method='cold_split', column_name=['Drug1_ID', 'Drug2_ID', 'Cell_Line_ID'])
(This currently raises AttributeError: Please select from random_split, or cold_split, if cold split. please specify the column name!)
I believe the issue is related to the condition on line 96 of https://github.com/mims-harvard/TDC/blob/main/tdc/multi_pred/multi_pred_dataset.py. I suggest replacing lines 96-98 with something like:
# cast to list if needed
if isinstance(column_name, str):
    column_name = [column_name]
# test whether all columns are present in df
if (column_name is not None) and all(x in df.columns.values for x in column_name):
    if method == 'cold_split':
        return create_fold_setting_cold(df, seed, frac, column_name)
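For illustration, here is a self-contained sketch of what a cold split over several columns could do once a list of column names is accepted. This is an assumption about the desired behaviour, not TDC's implementation: `cold_split_multi` is a hypothetical name, it only produces train and test sets (no validation set), and the toy data is made up.

```python
import itertools

import numpy as np
import pandas as pd


def cold_split_multi(df, columns, test_frac=0.2, seed=42):
    """Hypothetical helper (not TDC code): cold split over several columns."""
    rng = np.random.default_rng(seed)

    # For each column, pick the entities that are reserved for the test set.
    held_out = {}
    for col in columns:
        unique_vals = df[col].unique()
        n_test = max(1, round(test_frac * len(unique_vals)))
        held_out[col] = set(rng.choice(unique_vals, size=n_test, replace=False))

    in_test = np.ones(len(df), dtype=bool)
    in_train = np.ones(len(df), dtype=bool)
    for col in columns:
        is_held_out = df[col].isin(held_out[col]).to_numpy()
        in_test &= is_held_out    # test rows: held-out entity in every column
        in_train &= ~is_held_out  # train rows: held-out entity in no column

    # Rows that mix seen and held-out entities end up in neither set.
    return df[in_train].reset_index(drop=True), df[in_test].reset_index(drop=True)


# Toy data: every combination of 6 drugs x 6 drugs x 5 cell lines.
drugs = [f"drug_{i}" for i in range(6)]
cells = [f"cell_{i}" for i in range(5)]
df = pd.DataFrame(
    list(itertools.product(drugs, drugs, cells)),
    columns=["Drug1_ID", "Drug2_ID", "Cell_Line_ID"],
)

train, test = cold_split_multi(
    df, ["Drug1_ID", "Drug2_ID", "Cell_Line_ID"], test_frac=0.4
)
print(len(train), len(test))
```

Two caveats with this sketch. Rows that combine seen and held-out entities are discarded, so the split uses fewer samples than a random split would. Also, Drug1_ID and Drug2_ID are sampled independently, so a drug held out in one column can still occur in the other column of the training set; a stricter variant would sample from the pooled drug IDs.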
Do you believe this would be a valuable contribution to the package?
Additional information
This issue was already addressed for the drug response prediction task by @jannisborn: https://github.com/mims-harvard/TDC/issues/126.
Thank you for your help.
Best,
Zoe