openvax / mhcflurry

Peptide-MHC I binding affinity prediction
http://openvax.github.io/mhcflurry/
Apache License 2.0

Support arbitrary length input sequences #156

Closed iskandr closed 4 years ago

iskandr commented 4 years ago

Just hit this error trying to predict on mass spec detected MHC ligands:

ValueError("23 peptides have lengths outside of supported range [5, 15]: ...")

What do you think of scanning 15mer subsequences across longer sequences and returning the mean [0,1] score for each component of the predictor? Taking the average rather than max would nudge those values downward, which is desirable with longer sequences.
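A minimal sketch of the scanning idea, for illustration only. It assumes a hypothetical `score` callable that stands in for one component of the predictor and returns a value in [0, 1] for a single peptide; `windows` and `mean_window_score` are made-up names, not part of mhcflurry. Peptides shorter than the supported minimum of 5 are not handled here.

```python
from statistics import mean
from typing import Callable, List

MAX_LEN = 15  # upper bound of the supported range quoted in the error above


def windows(peptide: str, size: int = MAX_LEN) -> List[str]:
    """Return every contiguous subsequence of length `size`.

    A peptide that already fits within `size` is returned unchanged.
    """
    if len(peptide) <= size:
        return [peptide]
    return [peptide[i:i + size] for i in range(len(peptide) - size + 1)]


def mean_window_score(
    peptide: str,
    score: Callable[[str], float],  # hypothetical per-peptide scorer in [0, 1]
    size: int = MAX_LEN,
) -> float:
    """Average the scorer's output over all windows of the peptide."""
    return mean(score(w) for w in windows(peptide, size))
```

A 17mer yields three 15mer windows, so its score is the mean of three calls to `score`. Swapping `mean` for `max` gives the alternative that the comment argues against for longer sequences.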

timodonnell commented 4 years ago

If we need support for these peptide lengths, it may make the most sense to just train the predictors using a max length of e.g. 17. Having an additional, secondary way to support longer peptide lengths seems complicated. But I'd be interested to know whether your suggestion gives good accuracy if you try it out.

You can also pass throw=False, by the way, if you just want to get NaNs for these peptides instead of an error.
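Once the out-of-range peptides come back as NaN, they still need to be separated from the scored ones. The sketch below shows one way to do that on plain Python sequences. It does not call mhcflurry: `split_unsupported` is a hypothetical helper, and it assumes the predictions arrive as a sequence of floats aligned with the input peptides.

```python
import math
from typing import List, Sequence, Tuple


def split_unsupported(
    peptides: Sequence[str],
    predictions: Sequence[float],
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Separate peptides with a numeric prediction from those returned as NaN."""
    scored: List[Tuple[str, float]] = []
    unsupported: List[str] = []
    for peptide, value in zip(peptides, predictions):
        if math.isnan(value):
            # NaN marks a peptide the predictor could not handle (assumed convention)
            unsupported.append(peptide)
        else:
            scored.append((peptide, value))
    return scored, unsupported
```

The `unsupported` list can then be inspected or handled separately without interrupting the rest of the batch.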

(Sent as an email reply to Alex Rubinsteyn's comment of Thu, Feb 6, 2020, 1:54 PM.)