malachig closed this issue 5 years ago
Avoid peptides beginning with Q and ending with C or P.
Issues relating to solubility and hydrophobicity could also be considered here since they are related to the peptide sequence and length. The register or length may need to be adjusted to optimize the solubility. We could also consider stability in this step.
I was able to isolate the vaxrank code that calculates the manufacturability scores. As an example, for the peptide SLASTLSPSCRTTPQFFPSSYLPPH, this would return:
cterm_7mer_gravy_score=-0.7857142857142858,
max_7mer_gravy_score=0.9,
difficult_n_terminal_residue=False,
c_terminal_cysteine=False,
c_terminal_proline=False,
cysteine_count=1,
n_terminal_asparagine=False,
asparagine_proline_bond_count=0
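For anyone who wants to sanity-check these numbers without installing vaxrank, here is a minimal standalone sketch. It is not the vaxrank code: the function name `manufacturability_metrics` is hypothetical, the GRAVY scores use the standard Kyte-Doolittle hydropathy scale, and the set of "difficult" N-terminal residues (Q, E, C) is my assumption about what that flag checks. With those assumptions it should reproduce the values listed above for the example peptide, up to floating-point rounding.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Mean Kyte-Doolittle hydropathy of a sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def manufacturability_metrics(peptide, window=7):
    """Hypothetical re-implementation of the 8 metrics shown above."""
    # All sliding windows; fall back to the whole peptide if it is
    # shorter than the window.
    windows = [
        peptide[i:i + window]
        for i in range(len(peptide) - window + 1)
    ] or [peptide]
    return {
        "cterm_7mer_gravy_score": gravy(peptide[-window:]),
        "max_7mer_gravy_score": max(gravy(w) for w in windows),
        # Assumption: Q, E and C are the "difficult" N-terminal residues.
        "difficult_n_terminal_residue": peptide[0] in {"Q", "E", "C"},
        "c_terminal_cysteine": peptide[-1] == "C",
        "c_terminal_proline": peptide[-1] == "P",
        "cysteine_count": peptide.count("C"),
        "n_terminal_asparagine": peptide[0] == "N",
        # An Asn-Pro bond is an N immediately followed by a P.
        "asparagine_proline_bond_count": peptide.count("NP"),
    }


if __name__ == "__main__":
    for name, value in manufacturability_metrics(
        "SLASTLSPSCRTTPQFFPSSYLPPH"
    ).items():
        print(f"{name}={value}")
```

Returning a plain dict keeps the sketch easy to dump into extra columns of a result file.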
There is also some code that prioritizes each of them (https://github.com/openvax/vaxrank/blob/fa368b6a7d9de9ff468f2df547fa4fae6b33e861/vaxrank/vaccine_peptide.py#L73), which for the example peptide returns this tuple:
(1, 0, 0, False, False, False, False, 0, 0, 0)
As far as I can tell, this is used for sorting peptides downstream. So I think we need to decide how to use these metrics to determine whether or not to clip the peptide.
Interim solution: output the 8 metrics to the filtered result files after running stability and cleavage site predictions.
I'm closing this issue since the interim solution of outputting the 8 metrics has been implemented in #383. We can make a new issue for further development.
It would be helpful to have a tool that helps end users optimize the selection of long peptide sequences, specifically for those who are making long peptide vaccines.
For example, imagine you have a 9-mer that you like and you want to choose a 22-25-mer long peptide that contains it. We have some wiggle room in both the length and the sub-peptide position we choose when submitting to the manufacturer. Certain characteristics need to be avoided, and this can be done by adjusting either the length or the register. For example, a long peptide should not start or end with a Proline or Glutamine. Also, in general you want a longer peptide because it increases solubility on average. However, this must be balanced against an increased synthesis failure rate for longer peptides. Right now people are looking at these sequences and picking them manually to account for these factors. In this step we may incorporate a more sophisticated calculation of manufacturability (e.g. as in vaxrank (https://github.com/openvax/vaxrank/issues/2)).
Obviously we would need to refine the details more here. This is just to get the discussion rolling.
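To make the length/register idea concrete, here is a rough sketch of the search space. It is only an illustration: the function name `candidate_long_peptides`, its parameters, and the toy sequences in the usage lines are all made up, and the only filter applied is the "no Proline or Glutamine at either end" rule mentioned above. It enumerates every 22-25-mer window of the source protein that fully contains the 9-mer and drops windows with a disallowed terminal residue.

```python
def candidate_long_peptides(protein, epitope, min_len=22, max_len=25,
                            bad_termini=frozenset("PQ")):
    """Enumerate long peptides that contain `epitope`.

    Returns (start_offset, peptide) pairs for every window of
    min_len..max_len residues that fully covers the first occurrence
    of the epitope and does not start or end with a residue in
    `bad_termini`. Hypothetical helper, not part of any existing tool.
    """
    start = protein.find(epitope)
    if start < 0:
        raise ValueError("epitope not found in protein sequence")
    end = start + len(epitope)

    candidates = []
    for length in range(min_len, max_len + 1):
        # A window [s, s + length) covers the epitope when
        # s <= start and s + length >= end, and must fit in the protein.
        first = max(0, end - length)
        last = min(start, len(protein) - length)
        for s in range(first, last + 1):
            peptide = protein[s:s + length]
            if peptide[0] in bad_termini or peptide[-1] in bad_termini:
                continue
            candidates.append((s, peptide))
    return candidates


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    protein = ("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRV"
               "GDGTQDNLSGAEKAVQVKVKALPDAQFEVV")
    for offset, peptide in candidate_long_peptides(protein, "GLIEVQAPI"):
        print(offset, len(peptide), peptide)
```

The surviving candidates could then be scored for solubility or other manufacturability criteria and sorted, which is the part that still needs discussion.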