Manipulating and exploring protein and proteomics data.
From GitHub using devtools::install_github:
library("devtools")
install_github("ComputationalProteomicsUnit/Pbase")
See the DESCRIPTION file for a complete list of dependencies.
Currently, the best way to get started is the ?Proteins manual page and the Pbase-data vignette. More documentation is on its way.
Pbase is under heavy development and is likely to change considerably in the near future. Suggestions and bug reports are welcome and can be filed as GitHub issues.
If you would like to contribute, please send pull requests directly for minor contributions and typos. For major contributions, we suggest first getting in touch with the package maintainers.
Given a protein FASTA file, what is the maximal sensitivity that can be expected from a mass spectrometry experiment allowing 0, 1, ... missed cleavages? This should probably also include a filtering step for peptide flyability.
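The sensitivity question can be explored with an in-silico digestion. The sketch below is a minimal base R illustration, not Pbase functionality: it assumes the simplified trypsin rule (cleave after K or R unless the next residue is P), the helpers `cleavageSites()`, `digestTrypsin()` and `coverage()` are hypothetical, and sequence coverage by peptides that pass a length filter is used as one possible proxy for maximal sensitivity.

```r
## Hypothetical helpers, not part of Pbase.
## Simplified trypsin rule (assumption): cleave after K/R unless followed by P.
cleavageSites <- function(protein) {
  aa <- strsplit(protein, "")[[1]]
  n <- length(aa)
  if (n < 2) return(integer(0))
  ## position i is a site if residue i is K/R and residue i + 1 is not P
  which(aa[-n] %in% c("K", "R") & aa[-1] != "P")
}

## All peptides spanning m consecutive cleavage sites, for each m in missedCleavages
digestTrypsin <- function(protein, missedCleavages = 0:2) {
  sites <- cleavageSites(protein)
  starts <- c(1L, sites + 1L)
  ends <- c(sites, nchar(protein))
  k <- length(starts)  # number of fully cleaved fragments
  out <- lapply(missedCleavages, function(m) {
    if (m >= k) return(NULL)  # not enough fragments to skip m sites
    i <- seq_len(k - m)
    data.frame(peptide = substring(protein, starts[i], ends[i + m]),
               start = starts[i], end = ends[i + m], missed = m,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

## Fraction of residues covered by at least one peptide
coverage <- function(protein, peptides) {
  if (is.null(peptides) || nrow(peptides) == 0) return(0)
  covered <- logical(nchar(protein))
  for (j in seq_len(nrow(peptides))) {
    covered[peptides$start[j]:peptides$end[j]] <- TRUE
  }
  mean(covered)
}

protein <- "GGKPAARLLKEE"  # toy sequence, not a real protein
peps <- digestTrypsin(protein, missedCleavages = 0:2)
peps$peptide
# returns "GGKPAAR" "LLK" "EE" "GGKPAARLLK" "LLKEE" "GGKPAARLLKEE"

## Assumed detectability filter: at least 6 residues
peps0 <- digestTrypsin(protein, missedCleavages = 0)
coverage(protein, peps0[nchar(peps0$peptide) >= 6, ])  # 7/12, about 0.583
coverage(protein, peps[nchar(peps$peptide) >= 6, ])    # 1
```

In this toy case, residues 8 to 12 only appear in peptides of at least 6 residues once missed cleavages are allowed, which is why the maximal coverage rises from 7/12 to 1.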
Some literature about estimating detectability:
Requirements for in-silico created peptides: missedCleavages = 0:2, length(peptides) >= 6, mass(peptides) < 6000 (Da)
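The length and mass requirements can be applied as a simple filter. This is a hedged base R sketch, not the Pbase API: it assumes monoisotopic residue masses plus one water for the intact peptide, and `peptideMass()` and `filterPeptides()` are hypothetical helpers. The missed-cleavage limit (0:2) is assumed to be enforced during digestion and is not repeated here.

```r
## Monoisotopic residue masses (Da) of the 20 standard amino acids
residueMass <- c(G = 57.02146, A = 71.03711, S = 87.03203, P = 97.05276,
                 V = 99.06841, T = 101.04768, C = 103.00919, L = 113.08406,
                 I = 113.08406, N = 114.04293, D = 115.02694, Q = 128.05858,
                 K = 128.09496, E = 129.04259, M = 131.04049, H = 137.05891,
                 F = 147.06841, R = 156.10111, Y = 163.06333, W = 186.07931)
water <- 18.01056  # added once per peptide for the free termini

## Monoisotopic mass of each peptide (NA if a non-standard residue occurs)
peptideMass <- function(peptides) {
  vapply(strsplit(peptides, ""),
         function(aa) sum(residueMass[aa]) + water, numeric(1))
}

## Keep peptides with length >= minLength and mass < maxMass (Da)
filterPeptides <- function(peptides, minLength = 6, maxMass = 6000) {
  keep <- nchar(peptides) >= minLength & peptideMass(peptides) < maxMass
  peptides[which(keep)]  # which() also drops NA, i.e. unknown residues
}

filterPeptides(c("GGKPAAR", "LLK", strrep("W", 40)))
# returns "GGKPAAR": "LLK" is too short, the 40-mer of W is above 6000 Da
```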
Logistic regression based on hydrophobicity, isoelectric point, length, molecular weight, average hydrophobicity and average isoelectric point
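A logistic regression of this kind can be fitted with base R's `glm()`. The sketch below uses a synthetic, randomly generated feature table purely to show the model call; the column names are hypothetical, only a subset of the listed features is included, and the data carry no biological meaning. In practice each row would be an in-silico peptide with its computed features and a 0/1 label indicating whether it was observed.

```r
set.seed(42)
n <- 200
## Synthetic feature table (NOT real data): one row per hypothetical peptide
features <- data.frame(
  hydrophobicity = rnorm(n),
  pI = runif(n, 3, 11),
  length = sample(6:40, n, replace = TRUE),
  mass = runif(n, 600, 4500)
)
## Synthetic 0/1 detectability labels from a made-up relationship
features$detected <- rbinom(n, size = 1,
                            prob = plogis(-0.5 + 1.2 * features$hydrophobicity))

## Logistic regression: log-odds of detection as a linear function of the features
fit <- glm(detected ~ hydrophobicity + pI + length + mass,
           data = features, family = binomial)
summary(fit)$coefficients

## Predicted detection probabilities (here for the same peptides)
pDetect <- predict(fit, newdata = features, type = "response")
```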
Requirements for in-silico created peptides: missedCleavages = 0:2, length(peptides) >= 6, mass(peptides) < 6000 (Da)
35 features: length, weight, # of (non-)polar, # of (un)charged and # of positively/negatively charged residues, hydrophobicity (different models), polarity (different models), bulkiness, AA singlet counts
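Several of these features are simple residue-class counts. A minimal sketch, assuming one common textbook grouping (K, R and H counted as positive; D and E as negative; S, T, N, Q, C and Y as polar uncharged; the rest as nonpolar). Published feature sets differ in how they classify residues such as H, G and C, so treat the grouping as an assumption; `residueClassCounts()` is a hypothetical helper.

```r
## Assumed residue classes (groupings vary between publications)
aaClasses <- list(
  positive = c("K", "R", "H"),
  negative = c("D", "E"),
  polar    = c("S", "T", "N", "Q", "C", "Y"),
  nonpolar = c("A", "V", "L", "I", "M", "F", "W", "P", "G")
)

residueClassCounts <- function(peptide) {
  aa <- strsplit(peptide, "")[[1]]
  ## number of residues falling into each class
  counts <- vapply(aaClasses, function(cl) sum(aa %in% cl), integer(1))
  charged <- counts[["positive"]] + counts[["negative"]]
  c(length = length(aa), counts,
    charged = charged, uncharged = length(aa) - charged)
}

residueClassCounts("KDEAS")
# length 5, positive 1, negative 2, polar 1, nonpolar 1, charged 3, uncharged 2
```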
Requirements for in-silico created peptides: length(peptides) >= 6
Features: length, charge, isoelectric point, molecular weight, hydropathicity, counts of each AA (20 features), percent composition of each AA (20 features), percent of polar, positive, negative and hydrophobic AA
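The 20 count features and 20 percent-composition features can be computed directly from the sequence. A minimal base R sketch; `compositionFeatures()` and the `count_`/`pct_` feature names are hypothetical.

```r
## The 20 standard amino acids, one-letter codes
aminoAcids <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

compositionFeatures <- function(peptide) {
  aa <- strsplit(peptide, "")[[1]]
  ## occurrences of each standard amino acid
  counts <- vapply(aminoAcids, function(a) sum(aa == a), integer(1))
  ## percent composition relative to the peptide length
  percent <- 100 * counts / length(aa)
  c(setNames(counts, paste0("count_", aminoAcids)),
    setNames(percent, paste0("pct_", aminoAcids)))
}

f <- compositionFeatures("AAKR")
f[c("count_A", "pct_A", "pct_K")]  # 2, 50, 25
```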
Take-home message: a model trained on one species/dataset could not be transferred to another dataset without dramatically decreasing performance.
~1000 features.
Some of the most discriminating properties: total/average net/positive charge, hydrophobic moment, isoelectric point, histidine composition
Take-home message: the model of one species is comparable to that of another if the evolutionary distance is small (e.g. yeast and human), but models cannot be compared across devices/datasets (e.g. MALDI vs ESI).
Mass (Da): 500:4500
http://www.nature.com/nbt/journal/v25/n1/extref/nbt1275-S5.pdf http://ieeexplore.ieee.org/ielx5/5779756/5779971/5780167/html/img/5780167-fig-1-large.gif
Length: 5:40
http://www.nature.com/nbt/journal/v25/n1/extref/nbt1275-S6.pdf http://ieeexplore.ieee.org/ielx5/5779756/5779971/5780167/html/img/5780167-fig-1-large.gif
95% of all peptides have a length of 5:30:
http://www.nature.com/nbt/journal/v25/n1/extref/nbt1275-S24.pdf
Average isoelectric point: 0 to 1.4
http://ieeexplore.ieee.org/ielx5/5779756/5779971/5780167/html/img/5780167-fig-1-large.gif
http://web.expasy.org/tools/protparam/protparam-doc.html http://web.expasy.org/compute_pi/pi_tool-doc.html Kyte, Jack, and Russell F. Doolittle. "A simple method for displaying the hydropathic character of a protein." Journal of Molecular Biology 157.1 (1982): 105-132.
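The ProtParam documentation cited above defines GRAVY as the sum of the hydropathy values of all residues divided by the number of residues, using the Kyte and Doolittle scale. A minimal base R sketch of that definition; the helper name `gravy()` is hypothetical, and residues outside the 20 standard amino acids yield NA.

```r
## Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982)
kyteDoolittle <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5,
                   Q = -3.5, E = -3.5, G = -0.4, H = -3.2, I = 4.5,
                   L = 3.8, K = -3.9, M = 1.9, F = 2.8, P = -1.6,
                   S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

## GRAVY: sum of hydropathy values divided by the number of residues
gravy <- function(peptides) {
  vapply(strsplit(peptides, ""),
         function(aa) sum(kyteDoolittle[aa]) / length(aa), numeric(1))
}

gravy(c("IV", "AR"))  # 4.35 and -1.35
```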
See Pavel's idea.
Available through the integration with the ensembldb package. See the Pbase-with-ensembldb vignette.
See the mapping vignette.
See also this document for additional examples and integration with RNA-seq data.
The package makes it easy to interact with AAString and AAStringSet instances, protein databases such as UniProt (and possibly biomaRt in the future) using protein identifiers, protein identification results (from the mzID or (devel) mzR packages) and possibly also MSnExp and MSnSet instances.