Added the following features:
- Support for reading more than one type of affinity label (e.g., both Kd and IC50, or all four types: Ki, IC50, Kd, and EC50).
- Support for low and high thresholds for binary labeling, rather than a single threshold.
- Support for consolidating duplicate samples using the mean or maximum affinity.
- Support for returning target and ligand IDs in addition to the strings and labels (useful for interfacing with the Therapeutics Data Commons framework).
- Support for both 'AND' and 'OR' conditions when filtering, keeping samples that have IDs for both targets and ligands ('AND') or for either one ('OR').
The default argument values preserve the function's previous behavior. I checked backward compatibility by running:
X_drugs, X_targets, y = dataset.process_BindingDB(path = "./data", df = df, y = 'Kd', binary = False, convert_to_log = True, threshold = 30)
and confirmed that the existing and proposed implementations produce the same results.
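One way to make the "same results" check explicit is to compare the `(X_drugs, X_targets, y)` tuples returned by the two implementations. The sketch below is a hypothetical helper (`same_outputs` is not part of DeepPurpose); it requires exact equality for the drug and target strings and, by default, zero tolerance on the labels.

```python
import numpy as np


def same_outputs(old, new, rtol=0.0, atol=0.0):
    """Compare (X_drugs, X_targets, y) tuples from two implementations.

    Hypothetical helper: strings must match exactly; labels must have the
    same shape and agree within rtol/atol (exact match by default).
    """
    old_drugs, old_targets, old_y = old
    new_drugs, new_targets, new_y = new
    return (
        np.array_equal(np.asarray(old_drugs), np.asarray(new_drugs))
        and np.array_equal(np.asarray(old_targets), np.asarray(new_targets))
        and np.shape(old_y) == np.shape(new_y)
        and np.allclose(old_y, new_y, rtol=rtol, atol=atol)
    )
```

Loosening `atol` would tolerate floating-point noise from the log conversion if the two code paths compute it slightly differently.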
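The first feature in the list, reading several affinity types at once, can be illustrated by stacking the requested columns into a long-format table. This is a minimal sketch, not the code in this PR: it assumes the input already has `SMILES` and `Target Sequence` columns plus BindingDB-style affinity columns such as `Kd (nM)`, and the function name `select_affinity_labels` is hypothetical. Censored entries such as `">10000"` are coerced to NaN and dropped here.

```python
import pandas as pd

# BindingDB-style affinity columns, keyed by label type (assumed column names).
AFFINITY_COLUMNS = {
    "Ki": "Ki (nM)",
    "IC50": "IC50 (nM)",
    "Kd": "Kd (nM)",
    "EC50": "EC50 (nM)",
}


def select_affinity_labels(df, label_types):
    """Stack the requested affinity columns into one long-format frame.

    Each output row keeps the ligand, the target, the label type it came
    from, and its numeric value; missing or non-numeric values are dropped.
    """
    unknown = set(label_types) - set(AFFINITY_COLUMNS)
    if unknown:
        raise ValueError(f"unknown label types: {sorted(unknown)}")
    frames = []
    for name in label_types:
        part = df[["SMILES", "Target Sequence"]].copy()
        part["Label Type"] = name
        # Non-numeric entries (e.g. ">10000") become NaN in this sketch.
        part["Label"] = pd.to_numeric(df[AFFINITY_COLUMNS[name]], errors="coerce")
        frames.append(part.dropna(subset=["Label"]))
    return pd.concat(frames, ignore_index=True)
```

Keeping a `Label Type` column lets downstream code decide whether to model each type separately or pool them.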
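The low/high threshold option from the feature list can be sketched as follows, assuming raw affinities in nM (smaller means tighter binding) and assuming that samples falling between the two thresholds are discarded as ambiguous. `binarize_affinity` is a hypothetical name, and the exact boundary handling in the PR may differ.

```python
import numpy as np


def binarize_affinity(values_nm, low, high):
    """Map raw affinities (nM) to binary labels using two thresholds.

    Values below `low` are labeled 1 (binder), values above `high` are
    labeled 0 (non-binder), and values in [low, high] are dropped.
    Returns (labels, keep_mask) so callers can filter the matching
    drug and target arrays with the same mask.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    values = np.asarray(values_nm, dtype=float)
    binders = values < low
    non_binders = values > high
    keep = binders | non_binders
    labels = binders[keep].astype(int)
    return labels, keep
```

Returning the mask rather than only the labels keeps the drug, target, and label arrays aligned after the ambiguous samples are removed.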
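Consolidating duplicate samples, as described in the feature list, amounts to grouping repeated ligand-target pairs and aggregating their labels. A minimal pandas sketch, assuming hypothetical `SMILES`, `Target Sequence`, and `Label` columns, and assuming the labels are on a log scale (e.g. pKd) so that the maximum corresponds to the strongest affinity; on the raw nM scale the strongest affinity would be the minimum instead.

```python
import pandas as pd


def consolidate_duplicates(df, how="mean"):
    """Collapse repeated (ligand, target) pairs into one row.

    `how` selects the aggregation: "mean" averages the labels, "max" keeps
    the largest one (the strongest affinity on a log scale such as pKd).
    """
    if how not in ("mean", "max"):
        raise ValueError("how must be 'mean' or 'max'")
    return (
        df.groupby(["SMILES", "Target Sequence"])["Label"]
        .agg(how)
        .reset_index()
    )
```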
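The 'AND'/'OR' condition from the feature list can be expressed as a boolean mask over the ID columns. In this sketch the ID column names are passed in by the caller, and `filter_by_ids` is a hypothetical helper rather than the PR's implementation; the returned frame still carries the ID columns, so they can be returned alongside the strings and labels.

```python
import pandas as pd


def filter_by_ids(df, ligand_id_col, target_id_col, condition="AND"):
    """Keep rows that have ligand and/or target IDs.

    condition="AND" keeps rows with both IDs present;
    condition="OR" keeps rows with at least one ID present.
    """
    ligand_ok = df[ligand_id_col].notna()
    target_ok = df[target_id_col].notna()
    if condition == "AND":
        mask = ligand_ok & target_ok
    elif condition == "OR":
        mask = ligand_ok | target_ok
    else:
        raise ValueError("condition must be 'AND' or 'OR'")
    return df[mask]
```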