snijderlab / rustyms

A Rust library for parsing ProForma peptides and matching them against MS spectra
Apache License 2.0

Question: mass shift modification parsing #40

Open singjc opened 2 days ago

singjc commented 2 days ago

Hi,

I am interested in using the rustyms library to extract information from modified peptide strings. Suppose I have the following modified peptide sequence (with a mass shift): "KDM[+15.9949]YGLQAEME". Would it be possible to read this string and then derive the type of modification and the amino acid it sits on, i.e. "Oxidation@M"?

Best,

Justin

douweschulte commented 2 days ago

Hi Justin. If you use the provided ProForma parsers, you will end up with a modification of the specified mass. Converting this mass modification to a database modification (Oxidation from Unimod) is something I have some internal code for (it is also used when reading identified peptides files), so I could make that function public. The conversion will only have a very minor effect on the peptide, though, because the given mass is close enough to the monoisotopic mass that most functions will already work fine.

If you are interested, here is the current function. I am thinking of adding the tolerance as a parameter and then making it public.

/// Look at the provided modifications and see if they match any modification on this peptide with
/// more information and replace those. Replaces any mass modification within 0.1 Da or any precise
/// matching formula with the provided modifications.
pub(crate) fn inject_modifications(&mut self, modifications: &[SimpleModification]) {}

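To make the idea of "replace a mass modification with a named one if it is within a tolerance" concrete, here is a minimal, self-contained sketch. It does not use rustyms and is not the internal implementation: the `Modification` enum, the `inject_named` function, and the `known` lookup table are hypothetical stand-ins, and the tolerance is passed as a parameter in Da.

```rust
/// Hypothetical stand-in for a modification: a bare mass shift or a named entry.
#[derive(Debug, Clone, PartialEq)]
enum Modification {
    Mass(f64),
    Named { name: String, mass: f64 },
}

/// Replace every bare mass shift that lies within `tol_da` (in Da) of an entry in
/// `known` with that named modification. If several entries qualify, the closest
/// one is used. Modifications that are already named are left untouched.
fn inject_named(mods: &mut [Modification], known: &[(&str, f64)], tol_da: f64) {
    for m in mods.iter_mut() {
        if let Modification::Mass(mass) = *m {
            // Keep only candidates inside the tolerance window, then pick the nearest.
            let best = known
                .iter()
                .filter(|(_, known_mass)| (known_mass - mass).abs() <= tol_da)
                .min_by(|a, b| (a.1 - mass).abs().total_cmp(&(b.1 - mass).abs()));
            if let Some((name, known_mass)) = best {
                *m = Modification::Named {
                    name: name.to_string(),
                    mass: *known_mass,
                };
            }
        }
    }
}
```

With a table such as `[("Oxidation", 15.994915), ("Phospho", 79.966331)]` (the Unimod monoisotopic masses) and a tolerance of 0.1 Da, a `Mass(15.9949)` entry would become the named Oxidation entry, while a shift that matches nothing in the table stays a plain mass.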
singjc commented 1 day ago

Hi Douwe,

Thank you for the info, the LinearPeptide ProForma parser works great!

It would be very useful if you added a tolerance parameter and made the inject_modifications method public.

I played around with the LinearPeptide parser to get the output I want. I am not sure whether the code is optimal or could be written better, but it seems to work.

/// Turn a mass search result into "Name@AA" for the first Unimod hit whose
/// placement rules allow the given amino acid.
fn get_modification_name_site(
    mod_search: ModificationSearchResult,
    modified_aa: AminoAcid,
) -> Option<String> {
    match mod_search {
        ModificationSearchResult::Mass(_, _, matches) => {
            matches
                .into_iter()
                .find_map(
                    |(ontology, _, _, modification)| match (ontology, modification) {
                        // Only consider predefined Unimod modifications.
                        (
                            Ontology::Unimod,
                            Modification::Predefined(_, specificities, _, psims_name, _),
                        ) => specificities.iter().find_map(|specificity| {
                            // Accept the hit only if a rule lists this amino acid.
                            if let PlacementRule::AminoAcid(amino_acids, _) = specificity {
                                if amino_acids.contains(&modified_aa) {
                                    Some(format!("{}@{}", psims_name, modified_aa.char()))
                                } else {
                                    None
                                }
                            } else {
                                None
                            }
                        }),
                        _ => None,
                    },
                )
        }
        _ => None,
    }
}

/// Collect "Name@AA" for every modified residue, joined with ';'.
/// Only the first modification on each residue is looked up.
fn get_all_modifications(peptide: &LinearPeptide, tol_ppm: f64) -> String {
    let tol = Tolerance::new_ppm(tol_ppm);

    peptide
        .sequence
        .iter()
        .filter_map(|sequence_element| {
            sequence_element.modifications.first().and_then(|mod_mass| {
                let modified_aa = sequence_element.aminoacid;
                let mod_search = Modification::search(mod_mass, tol);

                get_modification_name_site(mod_search, modified_aa)
            })
        })
        .collect::<Vec<String>>()
        .join(";")
}

Usage:

let peptide_str = "MSFNELT[79.9663]ESNKKSLM[+15.9949]E";
let peptide = LinearPeptide::pro_forma(peptide_str).unwrap();
let result = get_all_modifications(&peptide, 1.0);
println!(
    "Found the following modifications {} for {}",
    result, peptide_str
);

Output:

Found the following modifications Phospho@T;Oxidation@M for MSFNELT[79.9663]ESNKKSLM[+15.9949]E
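
As a rough check of why a 1 ppm tolerance is enough for masses written with four decimals, the deviation can be worked out by hand. The sketch below is self-contained and does not call rustyms: `ppm_error` and `within_ppm` are hypothetical helpers, and it assumes the ppm deviation is taken relative to the theoretical mass, which may not be exactly how the library's Tolerance compares values. The theoretical masses are the Unimod monoisotopic values for Oxidation (15.994915 Da) and Phospho (79.966331 Da).

```rust
/// Signed deviation of `observed` from `theoretical`, in parts per million.
/// Assumption: the deviation is expressed relative to the theoretical mass.
fn ppm_error(observed: f64, theoretical: f64) -> f64 {
    (observed - theoretical) / theoretical * 1e6
}

/// True when `observed` lies within `tol_ppm` of `theoretical`.
fn within_ppm(observed: f64, theoretical: f64, tol_ppm: f64) -> bool {
    ppm_error(observed, theoretical).abs() <= tol_ppm
}
```

Under that assumption, 15.9949 is about 0.94 ppm away from Oxidation and 79.9663 is about 0.39 ppm away from Phospho, so both fall inside the 1 ppm window. A mass given with fewer decimals, such as 15.995 (about 5.3 ppm off), would need a wider tolerance.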