lehtiolab / ddamsproteomics

A Nextflow MS DDA proteomics pipeline
MIT License
3 stars 5 forks

Create user-defined column(s) with flags for occurrence of particular protein type, contaminant etc #5

glormph closed 9 months ago

glormph commented 1 year ago

From Konstantin:

I am wondering if it would be possible to add an option for annotation columns in the results output in Kantele. The idea comes from Proteome Discoverer (PD), where I have been using this function quite often. In PD it is done through an annotation node where the user can specify a FASTA file for any list of proteins of interest - e.g., common contaminants, IP baits, biological pathway members, or STRING interactors. The node then checks whether the identified peptides and proteins are found in each of the user-provided FASTA databases and outputs a simple logical flag in a column per database, each with a user-defined name.

Having such columns directly in the search output is very convenient because it allows one to sort and filter the results by those columns very easily - for example, one could immediately check whether the bait and its known interactors have been successfully identified in an IPMS experiment, or filter out the known contaminants.

It could be achieved by providing a list of accession numbers, too. But using a FASTA is potentially more flexible because it allows one to take, e.g., a FASTA retrieved from STRING with Ensembl accession numbers and still match against the result of a search against SwissProt or UniProt.

Also, PD adds the marker columns to the peptide and PSM tables, which is sometimes quite useful for checking whether all peptides attributed to a given protein group have been identified in all the samples or only in some. For example, a tagged version of an endogenous protein is often used as a bait in IPMS, so the peptides attributed to the endogenous protein are a subset of those of the tagged version. In this case, at the protein group level, the bait is often found in both the IP and the negative control, but at the peptide level one can often see that the peptides mapped to the tag are identified exclusively in the IP samples.
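To make the request concrete, here is a minimal Python sketch of the kind of flagging described above. It is not how PD or this pipeline implements it. It assumes matching is done by checking whether a peptide sequence occurs as a substring of any protein sequence in a user-supplied FASTA, which is what would let a FASTA with different accession numbers still match. The function names (`read_fasta`, `flag_peptides`), the column names and the example sequences are all made up for illustration.

```python
import io


def read_fasta(handle):
    """Return {accession: sequence} from a FASTA file-like object."""
    records, header, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            # Use the first whitespace-delimited token as the record name.
            header = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def flag_peptides(peptides, annotation_sets):
    """Add one boolean column per user-defined annotation FASTA.

    annotation_sets maps a user-chosen column name to {accession: sequence}.
    A peptide is flagged True if it is a substring of any sequence in that set
    (an assumed matching rule, not necessarily what PD does).
    """
    rows = []
    for pep in peptides:
        row = {"peptide": pep}
        for column, records in annotation_sets.items():
            row[column] = any(pep in seq for seq in records.values())
        rows.append(row)
    return rows


# Hypothetical inputs: a contaminant FASTA and a FASTA holding a tagged bait.
contaminant_fasta = io.StringIO(">CONT1 example contaminant\nMKTAYIAKQRQISFVK\n")
bait_fasta = io.StringIO(">BAIT_TAGGED example bait\nDYKDDDDKMKTLLVAGGSR\n")

annotation_sets = {
    "is_contaminant": read_fasta(contaminant_fasta),
    "is_bait": read_fasta(bait_fasta),
}

rows = flag_peptides(["DYKDDDDK", "TAYIAK", "LLVAGGSR", "AAAAAK"], annotation_sets)
for row in rows:
    print(row)
```

In a peptide or PSM table the same flags could then be used for sorting and filtering, e.g. to see that the tag peptide (`DYKDDDDK` in this made-up example) only shows up in the IP samples. A real implementation would probably also have to decide how to treat I/L ambiguity and how to roll peptide flags up to protein groups, neither of which is covered here.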