Closed — jaeho3690 closed this issue 3 years ago
The qualitative / quantitative distinction comes from IEDB. Quantitative data is from assays that give an IC50 (nanomolar affinity). For these assays we simply train the models to predict those affinities. Qualitative data is from assays that don't give a nanomolar affinity readout but rather just "positive", "strong", "negative", etc. This is where the measurement inequality comes in: we convert measurements like "positive" to inequalities, such as "< 100 nM", meaning the model is penalized at training time for any prediction over 100 nM (i.e. weaker than 100 nM) for that peptide, but incurs zero loss if it predicts anything under 100 nM.
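To make the idea concrete, here is a minimal sketch of an inequality-aware squared-error loss for a single example. The `to_unit` transform (mapping nM to [0, 1] so that stronger binders are closer to 1) follows the log transform described in the paper; the function names and structure here are illustrative, not the actual mhcflurry implementation.

```python
import numpy as np

MAX_IC50 = 50000.0  # cap used when log-transforming affinities

def to_unit(ic50_nm):
    # Map an nM affinity into [0, 1]; stronger binders (lower nM) -> closer to 1.
    return 1.0 - np.log(ic50_nm) / np.log(MAX_IC50)

def inequality_loss(pred_nm, target_nm, inequality):
    # "=" : ordinary squared error against the measured affinity.
    # "<" : measurement is "< target_nm"; zero loss if the prediction is
    #       at least that strong, squared error otherwise.
    # ">" : measurement is "> target_nm"; the mirror-image case.
    pred, target = to_unit(pred_nm), to_unit(target_nm)
    if inequality == "<" and pred >= target:  # prediction satisfies "< target_nm"
        return 0.0
    if inequality == ">" and pred <= target:  # prediction satisfies "> target_nm"
        return 0.0
    return (pred - target) ** 2
```

So a qualitative "positive" converted to "< 100 nM" contributes no gradient once the model predicts anything stronger than 100 nM, which is exactly the behavior described above.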
The conversion from qualitative to quantitative inequalities is defined here: https://github.com/openvax/mhcflurry/blob/master/downloads-generation/data_curated/curate.py#L67
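Schematically, that conversion is just a lookup from qualitative label to an (inequality, nM threshold) pair. The labels and cutoffs below are placeholders for illustration; the exact mapping is in the linked `curate.py`.

```python
# Illustrative mapping only -- these label names and thresholds are
# assumptions, not the exact values used in curate.py.
QUALITATIVE_TO_INEQUALITY = {
    "Positive-High": ("<", 100.0),
    "Positive":      ("<", 500.0),
    "Positive-Low":  ("<", 5000.0),
    "Negative":      (">", 5000.0),
}

def convert_qualitative(label):
    # Turn a qualitative assay label into the fields used for training:
    # a measurement_inequality and a measurement_value in nM.
    inequality, nm = QUALITATIVE_TO_INEQUALITY[label]
    return {"measurement_inequality": inequality, "measurement_value": nm}
```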
The method we use for handling measurement_inequality during training time is described in the mhcflurry version 1 paper, see "Training" subsection of "Method details": https://www.sciencedirect.com/science/article/pii/S2405471218302321?via%3Dihub
Hope that helps.
Thank you for your kind response. May I also ask how you evaluated S1.csv? There are 6 alleles per peptide. Is it possible to know which of the six alleles actually binds the peptide?
It's not possible to know. In our benchmarks we have generally taken the strongest affinity (lowest nM) across all 6 alleles.
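That reduction can be sketched as follows. Since lower nM means stronger binding, "strongest across the genotype" is the minimum prediction; the allele names here are just example HLA identifiers.

```python
def strongest_across_alleles(affinities_nm):
    # affinities_nm: {allele: predicted IC50 in nM} for one peptide across
    # the sample's 6-allele MHC-I genotype. Lower nM = stronger binding,
    # so the benchmark score is the minimum, tagged with its allele.
    allele = min(affinities_nm, key=affinities_nm.get)
    return allele, affinities_nm[allele]

# Example usage with hypothetical predictions:
best = strongest_across_alleles({
    "HLA-A*02:01": 25.0,
    "HLA-B*07:02": 900.0,
    "HLA-C*07:01": 4200.0,
})
```

Note that the returned allele is only the model's best guess at the presenting allele, not ground truth, which is exactly why the benchmark uses the minimum rather than a per-allele label.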
I've been trying to reproduce your work. However, there are some questions that cannot be answered from the paper itself.
In dataset S3, what is the difference between the qualitative and quantitative measurements?
Moreover, would you be able to elaborate on the measurement_inequality as well?
Thank you!