hollenstein / maspy

An open-source python library for mass spectrometry-based proteomics data analysis
Apache License 2.0

Examples for protein inference #2

Closed jgriss closed 7 years ago

jgriss commented 7 years ago

Hi David,

Could you maybe give me a short example of how to get the minimal set of proteins that can explain a set of peptides?

Thanks for the help!

Cheers, Johannes

hollenstein commented 7 years ago

Hi Johannes, I just ran all unit tests under Python 3 and fixed a few issues, so you may want to pull the current version.

Since it is not completely clear to me what kind of data you have, I'll just post one possible example. Here you have a list of observed peptides and don't know how they map to the proteins in a protein database. A minimal set of proteins is given by the representative proteins of all groups. However, a group sometimes has multiple leading proteins with identical evidence. In that case the algorithm implemented in mappingBasedGrouping() chooses the first protein of an alphanumerical sort. For details about the grouping procedure you could have a look at the module documentation, or ask here.

from collections import defaultdict as ddict

import maspy.inference as INFERENCE
import maspy._proteindb_refactoring as PROTEINDB

#Import your proteindb file
fastaPath = 'path/filename.fasta'
importAttributes = {}
proteindb = PROTEINDB.importProteinDatabase(fastaPath, **importAttributes)

#Observed peptides that are used for protein inference / grouping (fill this set with your own peptide sequences)
observedPeptides = set()

#This could be automated by adding a function to the inference module
proteinToPeptides = ddict(set)
for peptide in observedPeptides:
    proteins = proteindb.peptides[peptide].proteins
    for protein in proteins:
        proteinToPeptides[protein].add(peptide)

#Generate the ProteinInference instance
inference = INFERENCE.mappingBasedGrouping(proteinToPeptides)

#Get a minimal protein set
minimalProteinSet = set()
for groupId in inference.groups:
    minimalProteinSet.add(inference.groups[groupId].representative)
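
If you want to try the peptide-to-protein inversion step without a FASTA file, here is a minimal sketch on toy data. It does not use maspy at all: the plain dict peptideToProteins is a hypothetical stand-in for the protein database lookup, invertMapping is a helper name made up for this illustration, and all peptide and protein names are invented.

```python
from collections import defaultdict

# Hypothetical toy data: peptide -> proteins that contain it.
# This dict only stands in for a real protein database lookup.
peptideToProteins = {
    'PEPTIDEA': ['P1', 'P2'],
    'PEPTIDEB': ['P1'],
    'PEPTIDEC': ['P3'],
}
observedPeptides = {'PEPTIDEA', 'PEPTIDEB', 'PEPTIDEC'}


def invertMapping(observedPeptides, peptideToProteins):
    """Build a protein -> set of observed peptides mapping."""
    proteinToPeptides = defaultdict(set)
    for peptide in observedPeptides:
        # Peptides missing from the lookup are skipped silently.
        for protein in peptideToProteins.get(peptide, ()):
            proteinToPeptides[protein].add(peptide)
    return dict(proteinToPeptides)


proteinToPeptides = invertMapping(observedPeptides, peptideToProteins)
```

In this toy case the only peptide of P2 is also explained by P1, so P1 and P3 alone are enough to explain all three observed peptides.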

To choose the representative yourself if multiple leading proteins are present you could do something like this:

minimalProteinSet = set()
for groupId in inference.groups:
    leadingProteins = inference.groups[groupId].leading
    if len(leadingProteins) > 1:
        representative = yourSelectionFunction(leadingProteins)
    else:
        representative = inference.groups[groupId].representative
    minimalProteinSet.add(representative)
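
As an illustration of what yourSelectionFunction could look like, here is one possible sketch. It is not part of maspy. It assumes the leading proteins are given as an iterable of string ids, and the preferred parameter is a hypothetical addition, for example a set of curated accessions you would like to favour. Without any preferred entries it falls back to the first id of a sorted list.

```python
def yourSelectionFunction(leadingProteins, preferred=frozenset()):
    """Pick one representative from several leading proteins.

    Assumes protein ids are strings. 'preferred' is a hypothetical
    set of ids that should win a tie if present.
    """
    candidates = sorted(leadingProteins)
    if not candidates:
        raise ValueError('leadingProteins must not be empty')
    for protein in candidates:
        if protein in preferred:
            return protein
    # No preferred id found: fall back to the first id in sort order.
    return candidates[0]
```

Sorting first keeps the choice deterministic even if the leading proteins are passed in as a set.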

Does this answer your question?

jgriss commented 7 years ago

Hi David!

Thanks a lot for the detailed response! That's actually more than I needed - I already figured out the mapping part based on your previous explanations.

Thanks a lot again!

I'll keep on testing your great package!