citations - Githubissues

ljmartin commented 4 years ago

An example of debiasing with ECFP and then comparing other fingerprints: LIT-PCBA: An unbiased dataset for machine learning and virtual screening. Viet-Khoa Tran-Nguyen, Célien Jacquemard, and Didier Rognan

ljmartin commented 4 years ago

using balanced datasets: Enhanced HTS Hit Selection via a Local Hit Rate Analysis

ljmartin commented 4 years ago

converting between IC50 and free energy: D3R Grand Challenge 2: blind prediction of protein–ligand poses, affinity rankings, and relative binding free energies
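For reference, a rough sketch of the conversion. This is the textbook relation ΔG = RT ln(Kd) with IC50 standing in for Kd, which is only an approximation. The function names, the 298.15 K default, and the kcal/mol units are my own assumptions, not necessarily the convention used in the D3R paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ic50_to_dg(ic50_molar, temperature_k=298.15):
    """Approximate binding free energy (kcal/mol) from an IC50 in mol/L.

    Assumes IC50 is a usable stand-in for Kd, which is an approximation.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return R_KCAL * temperature_k * math.log(ic50_molar)


def dg_to_ic50(dg_kcal, temperature_k=298.15):
    """Inverse of ic50_to_dg: free energy (kcal/mol) back to IC50 in mol/L."""
    return math.exp(dg_kcal / (R_KCAL * temperature_k))


# A 1 nM compound comes out at roughly -12.3 kcal/mol at room temperature.
print(round(ic50_to_dg(1e-9), 2))
```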

ljmartin commented 4 years ago

debiasing with ECFP and comparing docking scores: Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening

ljmartin commented 4 years ago

another use of AU(PR): Practical Model Selection for Prospective Virtual Screening

ljmartin commented 4 years ago

These people use ECFP to debias (Kullback-Leibler divergence) for docking: Combining docking pose rank and structure with deep learning improves protein-ligand binding mode prediction over a baseline docking approach

ljmartin commented 4 years ago

These folks use 'leave-class-out' based on ECFP, assumption being that once the class is left out it is always left out: Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning

also they cite this for leave-class-out: Prediction of Human Volume of Distribution Values for Neutral and Basic Drugs. 2. Extended Data Set and Leave-Class-Out Statistics
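A minimal sketch of what a leave-class-out split looks like, assuming each compound already has a class label (e.g. from clustering on ECFP, which is not shown here). The function name and the label format are hypothetical, not taken from either paper.

```python
from collections import defaultdict


def leave_class_out_splits(class_labels):
    """Yield (held_out_class, train_idx, test_idx), one fold per class.

    Every member of the held-out class goes to the test set, so that
    class never appears in training for that fold.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(class_labels):
        by_class[label].append(idx)
    for held_out in sorted(by_class):
        test_idx = by_class[held_out]
        train_idx = sorted(
            i
            for label, members in by_class.items()
            if label != held_out
            for i in members
        )
        yield held_out, train_idx, test_idx


# Hypothetical class labels for four compounds.
for held_out, train_idx, test_idx in leave_class_out_splits(["a", "b", "a", "c"]):
    print(held_out, train_idx, test_idx)
```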

ljmartin commented 4 years ago

why to use AUPRC: The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets

(from Dataset Augmentation Allows Deep Learning-Based Virtual Screening To Better Generalize To Unseen Target Classes, And Highlight Important Binding Interactions)
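A toy illustration of the point, in plain Python rather than any particular library. Average precision is used here as a step-wise estimate of AUPRC, score ties between actives and inactives are not handled specially, and the data is made up.

```python
def roc_auc(y_true, scores):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision measured at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up screen: 2 actives in 1002 compounds, 10 inactives scored above both actives.
y_true = [0] * 10 + [1, 1] + [0] * 990
scores = [0.9] * 10 + [0.8, 0.79] + [0.1] * 990
print(roc_auc(y_true, scores))                      # 0.99
print(round(average_precision(y_true, scores), 3))  # 0.129
```

The ROC AUC looks excellent because the 990 low-scoring inactives dominate the pairwise comparisons, while the average precision shows that the top of the ranked list is mostly inactives.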

ljmartin commented 4 years ago

an alternative to debiasing, activity quantile bootstrapping: A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery

ljmartin commented 4 years ago

This paper nicely outlines why AVE measures both bias AND 'easiness': Development of new methods needs proper evaluation – benchmarking sets for machine learning experiments for class A GPCRs

ljmartin commented 4 years ago

Keiser, use of time-split: https://www.biorxiv.org/content/10.1101/2020.05.21.107748v1.full

and also cites a paper showing 10:1 imbalance is optimal: The Influence of the Negative-Positive Ratio and Screening Database Size on the Performance of Machine Learning-Based Virtual Screening
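Sketch of the two dataset-preparation ideas in this comment: a time-split, and subsampling inactives down to a fixed negative:positive ratio. The record layout (a `year` key), the function names, and the random subsampling strategy are all assumptions for illustration; neither paper's exact procedure is reproduced here.

```python
import random


def time_split(records, cutoff_year):
    """Train on records dated before cutoff_year, test on the rest."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


def subsample_negatives(labels, ratio=10, seed=0):
    """Return sorted indices keeping all positives and at most `ratio` negatives per positive."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    keep = min(len(neg), ratio * len(pos))
    return sorted(pos + rng.sample(neg, keep))


# Hypothetical records with a publication year and an activity label.
records = [{"year": 2015, "active": 1}, {"year": 2018, "active": 0}, {"year": 2020, "active": 1}]
train, test = time_split(records, cutoff_year=2018)
print(len(train), len(test))
```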

ljmartin commented 4 years ago

counterexample: Machine Learning Models to Predict Inhibition of the Bile Salt Export Pump. They calculate AVE with Mordred FPs

ljmartin commented 4 years ago

similar dataset preparation: Scope of 3D Shape-Based Approaches in Predicting the Macromolecular Targets of Structurally Complex Small Molecules Including Natural Products and Macrocyclic Ligands

ljmartin commented 4 years ago

Including 'color' improves LBVS: How To Optimize Shape-Based Virtual Screening: Choosing the Right Query and Including Chemical Information

ljmartin commented 4 years ago

Good, intuitive article on blocked CV: Cross‐validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure https://onlinelibrary.wiley.com/doi/10.1111/ecog.02881

ljmartin commented 4 years ago

for separate paper on class separability: The Two-Point Correlation Function: A Measure of Interclass Separability

ljmartin commented 4 years ago

some work on separability: How Complex is your classification problem? A survey on measuring classification complexity

A Probabilistic Approach to Nearest-Neighbor Classification: Naive Hubness Bayesian kNN

Hubness-aware shared neighbor distances for high-dimensional k-nearest neighbor classification

Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

ljmartin / fp_generalizability_revision

citations #1