wfondrie / depthcharge

A deep learning toolkit for mass spectrometry
https://wfondrie.github.io/depthcharge/
Apache License 2.0

Why is "I" replaced with "L" in peptide sequences by the tokenize function of _PeptideTransformer in transformer.py? #24

Closed: liangzhendong123 closed this issue 1 year ago

liangzhendong123 commented 1 year ago

Hi, casanovo and depthcharge are both excellent work. However, something in transformer.py confuses me: why is "I" replaced with "L" in peptide sequences by the tokenize function of _PeptideTransformer? Will this replacement affect the prediction precision for peptide sequences that include "I" during inference?

wfondrie commented 1 year ago

Hi @liangzhendong123 👋 - thanks for checking out depthcharge and casanovo!

Why is "I" replaced with "L" in peptide sequences by the tokenize function of _PeptideTransformer?

This choice was made because "I" is indistinguishable from "L" in mass spectrometry applications. I and L have the same elemental composition and therefore the same mass, and the small difference in their structures appears to have a negligible effect on the fragment ion intensities we observe in a mass spectrum. In hindsight, however, I should probably make this an optional behavior that is enabled by default instead.
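As a rough illustration of the "same mass" point, the sketch below computes monoisotopic residue masses from elemental composition. Leucine and isoleucine residues are both C6H11NO, so their masses come out identical. The `tokenize` function here is a hypothetical stand-in that only mimics the I-to-L substitution. It is not depthcharge's actual `_PeptideTransformer.tokenize`.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Residue (amino acid minus water) elemental compositions.
# Leucine and isoleucine are structural isomers with the same formula, C6H11NO.
RESIDUE_COMPOSITION = {
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
}


def residue_mass(aa: str) -> float:
    """Sum the monoisotopic masses of the atoms in one residue."""
    comp = RESIDUE_COMPOSITION[aa]
    return sum(MONO[element] * count for element, count in comp.items())


def tokenize(sequence: str) -> list[str]:
    """Hypothetical tokenizer: split a peptide into residues, mapping I to L."""
    return list(sequence.replace("I", "L"))


leu = residue_mass("L")
ile = residue_mass("I")
print(round(leu, 5), round(ile, 5))  # 113.08406 113.08406
print(tokenize("PEPTIDE"))  # ['P', 'E', 'P', 'T', 'L', 'D', 'E']
```

Because the precursor and fragment masses cannot separate the two residues, collapsing them to one token removes a label the model has no reliable signal to learn.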

Will this replacement affect the prediction precision for peptide sequences that include "I" during inference?

Yes, but this is true either implicitly or explicitly for all mass spectrometry-based de novo peptide sequencing approaches.
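To make the consequence concrete, here is a minimal sketch of scoring a prediction when I and L are treated as equivalent. It assumes a model trained on I-to-L-substituted sequences, which can at best emit L where the true residue is I. The helper names `normalize` and `aa_matches` are hypothetical. The positionwise match is a simplification of the mass-aware matching real de novo evaluation pipelines use.

```python
def normalize(seq: str) -> str:
    """Collapse the isobaric I/L pair to a single symbol (L)."""
    return seq.replace("I", "L")


def aa_matches(predicted: str, true: str) -> int:
    """Count positionwise residue matches after I/L normalization.

    Assumes equal-length sequences; only a simplified illustration.
    """
    p, t = normalize(predicted), normalize(true)
    return sum(a == b for a, b in zip(p, t))


true_peptide = "LVIDEK"
predicted = "LVLDEK"  # best case for a model that never sees the "I" token

print(predicted == true_peptide)  # False: a strict comparison penalizes I vs L
print(normalize(predicted) == normalize(true_peptide))  # True
print(aa_matches(predicted, true_peptide))  # 6
```

Under a strict comparison the I/L ambiguity shows up as an error; with normalization it does not, which reflects that the spectrum itself cannot resolve the difference.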