learningmatter-mit / peptimizer

Peptide optimization with Machine Learning

Blank in the optimized sequences #5

Closed: sunghunbae closed this issue 2 years ago.

sunghunbae commented 2 years ago

The optimization generates many sequences with a ' ' (blank) at the C-terminal end. In the example below, a blank follows ...GKK in the first sequence. The blank is counted as an amino acid, so that sequence's length is reported as 10, whereas the second sequence has no blank and its length is 9. I am wondering whether the blank affects the intensity.

sequences,intensity,length,relative_Arg,relative_charge
KKPCHHGKK ,6.252917870132182,10,0.0,0.554286151848062
KKPCHHMKK,6.177834318018949,9,0.0,0.6158735020534022
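If the trailing blank only needs to be removed from the output file, a post-processing pass like the one below would do it. This is a minimal sketch, not part of peptimizer: the function name clean_rows and the output keys are hypothetical, and it only strips the whitespace and recounts residues. It does not recompute intensity or the other columns.

```python
import csv
import io

# The two example rows from above, verbatim (note the space after KKPCHHGKK).
RAW = """sequences,intensity,length,relative_Arg,relative_charge
KKPCHHGKK ,6.252917870132182,10,0.0,0.554286151848062
KKPCHHMKK,6.177834318018949,9,0.0,0.6158735020534022
"""

def clean_rows(text):
    """Hypothetical helper: strip whitespace from each sequence and recount residues."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        seq = row["sequences"].strip()  # drop the trailing blank
        rows.append({
            "sequence": seq,
            "intensity": float(row["intensity"]),
            "reported_length": int(row["length"]),  # length as written in the file
            "residue_count": len(seq),              # length without the blank
        })
    return rows

for r in clean_rows(RAW):
    print(r["sequence"], r["reported_length"], r["residue_count"])
```

For the first row this prints a reported length of 10 but a residue count of 9; the second row is 9 in both columns.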

pikulsomesh commented 2 years ago

The blank (whitespace) does not affect the intensity, because the fingerprint for whitespace is a vector of 0s, like the one used for padding.
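The reasoning can be illustrated with a toy encoder. This sketch assumes each residue is turned into one fingerprint row and the resulting matrix is zero-padded to a fixed length; the fingerprints here are random bit vectors standing in for the real ones, and FP_BITS, RESIDUE_FP and encode are hypothetical names. It also assumes the model sees only this matrix, not the separate length column.

```python
import random

FP_BITS = 8  # hypothetical fingerprint length; real fingerprints are far longer
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical per-residue fingerprints: random bits stand in for real ones.
_rng = random.Random(0)
RESIDUE_FP = {aa: tuple(_rng.randint(0, 1) for _ in range(FP_BITS)) for aa in AMINO_ACIDS}
ZERO_FP = (0,) * FP_BITS  # used both for whitespace and for padding

def encode(seq, max_len):
    """One fingerprint row per character, zero-padded to max_len rows."""
    rows = [ZERO_FP if ch.isspace() else RESIDUE_FP[ch] for ch in seq]
    if len(rows) > max_len:
        raise ValueError("sequence longer than max_len")
    return rows + [ZERO_FP] * (max_len - len(rows))

with_blank = encode("KKPCHHGKK ", max_len=12)
without_blank = encode("KKPCHHGKK", max_len=12)
print(with_blank == without_blank)  # True: the blank row looks exactly like padding
```

Under these assumptions the two inputs are identical, so any model that consumes only the padded matrix must give them the same score.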