griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

Exclude epitopes with problematic amino acids #850

Closed susannasiebert closed 1 year ago

susannasiebert commented 2 years ago

This would implement a new step to remove epitopes in the all_epitopes file that contain a problematic amino acid. Problematic amino acids would be provided via a new input (name TBD).

@m-two @gschang Should we allow any amino acid to be excluded here or is there at least a short list of amino acids that are usually known to be problematic?

gschang commented 2 years ago

As Mike commented, cysteine residues are reported to cause peptide synthesis failures in Gillanders' peptide-vaccine clinical trials.

We're open to discussing this. In my opinion, a new option that accepts a custom list of amino acids would be best, because the list will be specific to each peptide vaccine vendor, as Mike mentioned. Rather than hard-filtering those epitopes out, we could add a new column to the final output that indicates the presence of excluded amino acids (and their positions as well?).
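To make the "new column" idea concrete, here is a minimal sketch of annotating an epitope instead of dropping it. The function names and the column names `has_excluded_aa` and `excluded_aa_positions` are hypothetical and only for illustration; they are not existing pVACtools fields.

```python
def find_excluded_residues(epitope, excluded):
    """Return (position, residue) pairs, 1-based, for residues found in `excluded`."""
    excluded = set(excluded.upper())
    return [(i, aa) for i, aa in enumerate(epitope.upper(), start=1) if aa in excluded]


def annotate(epitope, excluded):
    """Build the extra (hypothetical) columns for one epitope row."""
    hits = find_excluded_residues(epitope, excluded)
    return {
        "epitope": epitope,
        "has_excluded_aa": bool(hits),
        # e.g. "C2,C4" means cysteine at positions 2 and 4
        "excluded_aa_positions": ",".join(f"{aa}{i}" for i, aa in hits),
    }
```

Something like `annotate("ACDCK", "C")` would mark the epitope and record both cysteine positions, so reviewers can still see the candidate and decide for themselves.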

susannasiebert commented 2 years ago

I like the idea of not hard-filtering the all_epitopes file. What do you think of adding a filter step downstream to remove such epitopes from the filtered tsv and also exclude them from the aggregate report/pVACview?

gschang commented 2 years ago

I don't have a clear idea yet. Let's keep discussing how to implement this in pVACtools in practice.

I think that, for now, epitopes with problematic amino acids should ultimately be reviewed by the immunogenomics board via the pVACview interface. We're only starting to discuss cases such as: what if those epitopes are associated with a key driver mutation in the cancer patient?

Mike (@m-two) may also have ideas or suggestions on this issue.

chrisamiller commented 2 years ago

I agree with the idea of a parameter to exclude certain amino acids, e.g. --amino-acids-to-exclude CRG
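As a rough sketch of how such a parameter could behave, assuming the value is a string of one-letter codes and that an epitope is dropped if it contains any of them (the helper names are hypothetical, and this is a standalone example, not the pVACtools command line):

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Flag name taken from the suggestion above; the semantics are an assumption.
    parser.add_argument(
        "--amino-acids-to-exclude",
        default="",
        help="One-letter amino acid codes, e.g. CRG",
    )
    return parser.parse_args(argv)


def drop_excluded(epitopes, excluded):
    """Keep only epitopes that contain none of the excluded residues."""
    excluded = set(excluded.upper())
    return [e for e in epitopes if not excluded & set(e.upper())]


args = parse_args(["--amino-acids-to-exclude", "CRG"])
kept = drop_excluded(["ALDKV", "ACDKV", "SLYNT"], args.amino_acids_to_exclude)
```

Here `ACDKV` is dropped because it contains a cysteine, while the other two epitopes are kept.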

m-two commented 2 years ago

For the JLF project and the Gillanders projects moving forward, they want to filter out all epitopes with cysteine residues in the peptide sequence.

gschang commented 2 years ago

Mike provided useful information on this issue. I copied Mike's comments below from another channel.

The amino acids listed below are good examples for the new --amino-acids-to-exclude CRG option (which Chris suggests above). The exclusion set would be customized for each project and design protocol.

----- (Below source from Mike) -----

Publications:

(1) Factors affecting the physical stability (aggregation) of peptide therapeutics https://royalsocietypublishing.org/doi/10.1098/rsfs.2017.0030

(2) Peptide Stability and Degradation https://biopeptek.com/peptide-resources/peptide-handling/peptide-stability-and-degradation/

**Peptide Stability and Potential Degradation Pathways**

1. Hydrolysis
> Peptides containing Asp (D)
> Sequence contains Asp-Pro (D-P)
> Sequence contains Asp-Gly (D-G)
> Sequences containing Ser (S)

2. Deamidation sequences containing:
> Asn-Gly (N-G)
> Gln-Gly (Q-G)
> Asp-Gly (D-G)

3. Oxidation
> Sequences containing Cys (C) or Met (M)

4. Diketopiperazine and pyroglutamic acid formation
> Gly (G) is in the third position from the N-terminus
> Pro (P) or Gly (G) is in position 1 or 2
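
The pathways listed above could be encoded as a small rule table. This is only a sketch: the patterns are a literal transcription of the list into regular expressions, and the names `DEGRADATION_RULES` and `flag_degradation_risks` are hypothetical.

```python
import re

# Pathway -> regex patterns, transcribed from the list above.
# "^" anchors a pattern at the N-terminus (start of the sequence).
DEGRADATION_RULES = {
    "hydrolysis": ["D", "DP", "DG", "S"],
    "deamidation": ["NG", "QG", "DG"],
    "oxidation": ["C", "M"],
    # G at position 3; P or G at position 1; P or G at position 2
    "diketopiperazine/pyroglutamate": ["^..G", "^[PG]", "^.[PG]"],
}


def flag_degradation_risks(peptide):
    """Return {pathway: [matching patterns]} for the pathways that apply."""
    peptide = peptide.upper()
    flags = {}
    for pathway, patterns in DEGRADATION_RULES.items():
        hits = [p for p in patterns if re.search(p, peptide)]
        if hits:
            flags[pathway] = hits
    return flags
```

For example, `flag_degradation_risks("APNGM")` would report deamidation (N-G), oxidation (M), and the Pro at position 2. Note that taking the list literally flags every peptide containing D or S, so a real rule set would probably need to be narrower.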

m-two commented 2 years ago

I did a quick lit search. Specific positions in the peptide sequence and specific residue pairs are more prone to degradation. The input should probably let users define two attributes: "residue position + aa seq". The position could be "+" (counted from the left/N-terminus) or "-" (counted from the right/C-terminus). If the position is undefined, then all positions would be considered for filtering.

1. Diketopiperazine and pyroglutamic acid formation

Gly (G) is in the third position from the N-terminus

Input Suggestion:

Gly:3

Pro (P) or Gly (G) is in position 1 or 2

Input Suggestion:

Pro:1 Pro:2 Gly:1 Gly:2
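
A minimal sketch of how input like this might be parsed and applied. It assumes that positive positions are 1-based from the N-terminus, that negative positions count back from the C-terminus (-1 is the last residue), and that a rule with no position matches anywhere. The helpers `parse_rule` and `matches` are hypothetical.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_rule(token):
    """Parse 'Gly:3', 'Cys:-1' or 'Cys' into (one-letter residue, position or None)."""
    name, _, pos = token.partition(":")
    # Accept three-letter names; fall back to treating the input as a one-letter code.
    residue = THREE_TO_ONE.get(name.capitalize(), name.upper())
    return residue, (int(pos) if pos else None)


def matches(peptide, rule):
    """True if the peptide has the rule's residue at the rule's position."""
    residue, pos = rule
    if pos is None:
        return residue in peptide  # no position given: check every position
    index = pos - 1 if pos > 0 else len(peptide) + pos
    return 0 <= index < len(peptide) and peptide[index] == residue


rules = [parse_rule(t) for t in "Pro:1 Pro:2 Gly:1 Gly:2".split()]
flagged = any(matches("APKLV", r) for r in rules)  # Pro at position 2
```

Pairs such as Asp-Gly are not handled by this sketch and would need their own syntax.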

m-two commented 2 years ago

This method could also be used to identify and remove any toxic peptide sequences. I know the FDA has asked some specific questions about this in the past. I finally found a source for toxic peptides here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3772798/

Dataset Creation

> We extracted small toxins (proteins/peptides) from different databases and studies that include ATDB [15], Arachno-Server [19], Conoserver [20], DBETH [16], BTXpred [17], NTXpred [18], and SwissProt [21]. We removed all proteins/peptides having more than 35 residues or any non-natural amino acid. As a result, 1805 unique toxic proteins/peptides were obtained.

> By employing the similar criteria, toxic proteins/peptides were also searched in SwissProt database using keyword KW800 (keyword 800 stands for toxin as molecular functions). A total of 803 toxic proteins, having length less than 35 amino acids were obtained. It is possible that many toxic peptides obtained from various databases could also be present in SwissProt. Therefore, identical toxic proteins/peptides were removed and finally we got 303 unique toxic proteins/peptides from SwissProt.