uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

Does mpg support no cleavage? #609

Open lydiayliu opened 2 years ago

lydiayliu commented 2 years ago

But for the immunopeptidome, we know those peptides have affinity for HLA, so moPepGen would be a perfect fit for generating a custom DB to search it.

I've been wondering about what you said here. I think some tweaks are required for mpg to generate neoantigens. I just realized that I used trypsin as the enzyme for calling neoantigens. Even though MHCflurry can chop the peptides generated by mpg into arbitrary shorter peptides (which could be bad if the fragments don't actually contain the variant), it would probably be better to have a "no cleavage" or "native peptides" mode for mpg (and turn off the chopping function in MHCflurry if possible :P)

zhuchcn commented 2 years ago

Not impossible. Instead of creating the peptide cleavage graph, we can create a peptide k-mer graph. But this would need a lot of work, because we'll have to create a new model for this kind of graph.

lydiayliu commented 2 years ago

Why don't we just forget about the cleavage graph and go straight from the peptide variant graph? Traverse the graph once for each peptide length, or for each "frame" of the peptide.

zhuchcn commented 2 years ago

I think that should work! Then we don't need to worry about miscleavages. We just need to write a new traverser.
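To make the idea concrete, here is a minimal sketch of traversing a variant graph once per peptide length. It is not moPepGen code: the toy graph, `NODES`, `EDGES`, `kmers_from` and `variant_kmers` are all hypothetical names made up for illustration, and the real peptide variant graph is considerably more involved.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical toy graph (not moPepGen's data model):
# node id -> (amino acid sequence, whether the node carries a variant)
NODES: Dict[str, Tuple[str, bool]] = {
    "n1": ("MKTAY", False),
    "n2": ("I", False),   # reference allele
    "n3": ("V", True),    # variant allele
    "n4": ("AKQRL", False),
}
EDGES: Dict[str, List[str]] = {
    "n1": ["n2", "n3"],
    "n2": ["n4"],
    "n3": ["n4"],
    "n4": [],
}


def kmers_from(node: str, offset: int, k: int,
               nodes: Dict[str, Tuple[str, bool]],
               edges: Dict[str, List[str]]) -> Iterator[Tuple[str, bool]]:
    """Yield (peptide, has_variant) for every path spelling exactly k
    residues that starts at position `offset` of `node`."""
    seq, is_var = nodes[node]
    chunk = seq[offset:offset + k]
    if len(chunk) == k:
        # The whole k-mer fits inside this node.
        yield chunk, is_var
        return
    # Otherwise continue into each successor for the remaining residues.
    for nxt in edges[node]:
        for tail, tail_var in kmers_from(nxt, 0, k - len(chunk), nodes, edges):
            yield chunk + tail, is_var or tail_var


def variant_kmers(nodes: Dict[str, Tuple[str, bool]],
                  edges: Dict[str, List[str]],
                  lengths: Iterable[int]) -> Set[str]:
    """Collect every k-mer, for each k in `lengths`, whose path passes
    through at least one variant node."""
    found: Set[str] = set()
    for k in lengths:
        for node, (seq, _) in nodes.items():
            for offset in range(len(seq)):
                for peptide, has_var in kmers_from(node, offset, k, nodes, edges):
                    if has_var:
                        found.add(peptide)
    return found


if __name__ == "__main__":
    print(sorted(variant_kmers(NODES, EDGES, lengths=range(8, 12))))
```

Because no cleavage sites are involved, miscleavages never come up. A real traverser would still need to drop peptides that also occur in the canonical proteome, which this sketch does not attempt.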

hsiaoyi0504 commented 1 month ago

Is there any update on generating a customized database for the immunopeptidome?

zhuchcn commented 1 month ago

Thanks for your interest in moPepGen @hsiaoyi0504. My plan now is to generate non-canonical peptides with at least one variant event and with up to X consecutive reference amino acids, where X is the maximum peptide length you would usually expect from your MS data. This should capture all possible non-canonical peptides, but it will require some extensive work.
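One possible reading of this plan, sketched on a plain linear sequence rather than on the graph: for each variant residue, keep the variant plus at most X reference residues on either side. The function name `variant_segments`, its parameters, and the 0-based position convention are assumptions made for illustration; none of this is taken from the moPepGen code base.

```python
from typing import Iterable, List, Tuple


def variant_segments(seq: str,
                     variant_positions: Iterable[int],
                     max_ref: int) -> List[Tuple[int, str]]:
    """For each variant position (0-based, hypothetical convention), return
    (segment start, segment) where the segment holds the variant residue
    flanked by at most `max_ref` residues on each side."""
    segments: List[Tuple[int, str]] = []
    for pos in sorted(set(variant_positions)):
        start = max(0, pos - max_ref)               # clip at the N-terminus
        end = min(len(seq), pos + 1 + max_ref)      # clip at the C-terminus
        segments.append((start, seq[start:end]))
    return segments


if __name__ == "__main__":
    # Toy sequence with a single variant residue at index 5.
    print(variant_segments("MKTAYVAKQRL", [5], max_ref=2))
```

With this flanking rule, any peptide of length up to `max_ref + 1` that contains the variant residue is a substring of the corresponding segment, so a search engine run without enzyme specificity could in principle recover it.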

hsiaoyi0504 commented 2 weeks ago

Or, instead of getting peptides from callVariant, could we also have a function to get the protein FASTA directly? I think proteomics search software could generate a non-redundant database from that as well.

zhuchcn commented 2 weeks ago

Along with the FASTA file, callVariant generates a table containing per-peptide information, such as the amino acids carrying variant(s). So technically you can take the table and generate all peptides of length 8-11 harboring any variant. But note that, depending on how this is done, some variant combinations may be lost.

I have a (still active) branch czhu-feat-no-cleavage for generating non-canonical peptides with no enzyme. Feel free to try it, but I haven't run any fuzz tests on it, so it is not guaranteed to work.
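A rough sketch of that post-processing step, assuming the variant-carrying positions have already been parsed out of the table into 0-based indices per peptide. The table's actual column layout is not reproduced here, and `windows_with_variant` is a hypothetical helper, not part of moPepGen.

```python
from typing import Iterable, Set


def windows_with_variant(peptide: str,
                         variant_positions: Iterable[int],
                         min_len: int = 8,
                         max_len: int = 11) -> Set[str]:
    """Return every substring of `peptide` with length in
    [min_len, max_len] that overlaps at least one variant position
    (0-based indices, assumed to be parsed from the table beforehand)."""
    variants = set(variant_positions)
    windows: Set[str] = set()
    for k in range(min_len, max_len + 1):
        for start in range(len(peptide) - k + 1):
            # Keep the window only if a variant residue falls inside it.
            if any(start <= p < start + k for p in variants):
                windows.add(peptide[start:start + k])
    return windows


if __name__ == "__main__":
    # Toy 12-residue peptide with a variant at the first residue.
    print(sorted(windows_with_variant("ACDEFGHIKLMN", [0])))
```

Since the input peptides are already enzyme-cleaved, windows that would span a cleavage site cannot be rebuilt this way.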