Closed jgriss closed 7 years ago
Hi Johannes, I just ran all unit tests under Python 3 and fixed a few issues, so you may want to pull the current version.
Since it is not completely clear to me what kind of data you have, I'll just post one possible example. Here you have a list of observed peptides and don't know how they map to a protein database. A minimal set of proteins is given by the representative protein of each group. However, sometimes a group has multiple leading proteins with identical evidence. In that case, the algorithm implemented in mappingBasedGrouping() chooses the first protein of an alphanumerical sort. For details about the grouping procedure you could have a look at the module documentation, or ask here.
from collections import defaultdict as ddict
import maspy.inference as INFERENCE
import maspy._proteindb_refactoring as PROTEINDB

# Import your proteindb file
fastaPath = 'path/filename.fasta'
importAttributes = {}
proteindb = PROTEINDB.importProteinDatabase(fastaPath, **importAttributes)

# Observed peptides that are used for protein inference / grouping
observedPeptides = set()

# This could be automated by adding a function to the inference module
proteinToPeptides = ddict(set)
for peptide in observedPeptides:
    proteins = proteindb.peptides[peptide].proteins
    for protein in proteins:
        proteinToPeptides[protein].add(peptide)

# Generate the ProteinInference instance
inference = INFERENCE.mappingBasedGrouping(proteinToPeptides)

# Get a minimal protein set
minimalProteinSet = set()
for groupId in inference.groups:
    minimalProteinSet.add(inference.groups[groupId].representative)
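As a toy illustration of the tie-breaking rule mentioned above (first protein of a sort), here is a minimal sketch. It is not maspy's actual code: the function name `pick_representative` and the protein IDs are made up, and it assumes plain Python string ordering, which may differ from what mappingBasedGrouping() does internally.

```python
def pick_representative(leading_proteins):
    """Toy tie-breaker: return the first protein ID in sorted order.

    Assumption: IDs are plain strings compared with Python's default
    string ordering; this is only an illustration, not maspy's code.
    """
    return sorted(leading_proteins)[0]


# Hypothetical leading proteins with identical peptide evidence
leading = {"P12345", "A0A024", "Q9XYZ1"}
representative = pick_representative(leading)
print(representative)
```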
To choose the representative yourself if multiple leading proteins are present, you could do something like this:
minimalProteinSet = set()
for groupId in inference.groups:
    leadingProteins = inference.groups[groupId].leading
    if len(leadingProteins) > 1:
        representative = yourSelectionFunction(leadingProteins)
    else:
        representative = inference.groups[groupId].representative
    minimalProteinSet.add(representative)
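One possible shape for yourSelectionFunction, purely as a sketch: it assumes your protein IDs are strings in a UniProt-style FASTA format where reviewed entries start with "sp|" and unreviewed ones with "tr|". If your database uses different identifiers, the preference rule would need to change accordingly.

```python
def yourSelectionFunction(leadingProteins):
    """Hypothetical selection rule: prefer IDs starting with 'sp|',
    then fall back to the first ID in sorted order.

    Assumption: leadingProteins is a non-empty iterable of string IDs.
    """
    # False sorts before True, so 'sp|' entries win; ties are broken
    # by the ID string itself.
    return min(
        leadingProteins,
        key=lambda proteinId: (not proteinId.startswith("sp|"), proteinId),
    )


# Made-up IDs for illustration
example = ["tr|A0A024|X", "sp|Q00001|Z", "sp|P12345|Y"]
chosen = yourSelectionFunction(example)
```

Because the fallback is a plain sort, the result stays deterministic even when none of the leading proteins match the preferred prefix.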
Does this answer your question?
Hi David!
Thanks a lot for the detailed response! That's actually more than I needed - I already figured out the mapping part based on your previous explanations.
Thanks a lot again!
I'll keep on testing your great package!
Hi David,
Could you maybe give me a short example of how to get the minimal set of proteins that can explain a set of peptides?
Thanks for the help!
Cheers, Johannes