This changeset adds a built-in PTM pipeline which draws sites from multiple databases, maps them to alternative isoforms (using an exact match of the +/-7 amino acid sequence span around each site) and verifies data consistency.
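The isoform mapping step can be pictured with a minimal sketch. It assumes that a site is carried over to an alternative isoform wherever the canonical span of +/-7 residues around it occurs verbatim. The function names (site_span, map_site_to_isoforms) are hypothetical and positions are 1-based. Truncating spans near either terminus is also an assumption, not documented behavior of the pipeline.

```python
from typing import Dict, List, Tuple

FLANK = 7  # residues taken on each side of the modified position


def site_span(sequence: str, position: int, flank: int = FLANK) -> Tuple[str, int]:
    """Return the span of +/-flank residues around a 1-based site position
    and the site's 0-based offset within that span.

    Near either terminus the span is simply truncated (hypothetical choice)."""
    index = position - 1
    start = max(0, index - flank)
    end = min(len(sequence), index + flank + 1)
    return sequence[start:end], index - start


def map_site_to_isoforms(
    canonical: str, position: int, isoforms: Dict[str, str], flank: int = FLANK
) -> Dict[str, List[int]]:
    """Map a site onto alternative isoforms by exact span matching.

    Returns 1-based positions per isoform; isoforms without a match are omitted."""
    span, offset = site_span(canonical, position, flank)
    mapped: Dict[str, List[int]] = {}
    for name, sequence in isoforms.items():
        hits = []
        start = sequence.find(span)
        while start != -1:  # collect every (possibly overlapping) occurrence
            hits.append(start + offset + 1)
            start = sequence.find(span, start + 1)
        if hits:
            mapped[name] = hits
    return mapped


canonical = "ACDEFGHIKLMNPQRSTVWY"
isoforms = {
    "alt1": "MM" + canonical,            # N-terminal extension: span preserved
    "alt2": canonical.replace("H", ""),  # deletion inside the span: no match
}
print(map_site_to_isoforms(canonical, 10, isoforms))  # -> {'alt1': [12]}
```

Exact matching keeps the mapping conservative: any substitution, insertion or deletion within the 15-residue window prevents the site from being transferred.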
There are predefined site importers (descendants of the SiteImporter class) for the four supported databases: PhosphoSitePlus, HPRD, Phospho.ELM and UniProt. Files (or database dumps) for these databases have to be provided by the user.
All site importers are defined as classes in Python modules in the imports.sites package and all come with tests (test_imports.ptm_sites).
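The importer layout can be illustrated with a hypothetical skeleton. Here each subclass only parses its own database format, while a shared check lives in the base class. The sketch assumes that "verifying data consistency" includes checking that the reported residue actually occurs at the reported position. The Site record, the parse/is_consistent/load_sites methods and the tab-separated toy importer are all illustrative and are not the project's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class Site:
    protein: str   # protein identifier (hypothetical field name)
    position: int  # 1-based position in the protein sequence
    residue: str   # one-letter code of the modified residue


class SiteImporter(ABC):
    """Hypothetical skeleton: subclasses only parse their database's format,
    while the consistency check is shared in the base class."""

    def __init__(self, sequences: Dict[str, str]):
        self.sequences = sequences  # protein identifier -> sequence

    @abstractmethod
    def parse(self, lines: Iterable[str]) -> Iterator[Site]:
        """Yield sites from the lines of a user-provided file or dump."""

    def is_consistent(self, site: Site) -> bool:
        """True if the reported residue really occurs at the reported position."""
        sequence = self.sequences.get(site.protein)
        return (
            sequence is not None
            and 1 <= site.position <= len(sequence)
            and sequence[site.position - 1] == site.residue
        )

    def load_sites(self, lines: Iterable[str]) -> List[Site]:
        return [site for site in self.parse(lines) if self.is_consistent(site)]


class TabSeparatedImporter(SiteImporter):
    """Toy importer for 'protein<TAB>position<TAB>residue' lines."""

    def parse(self, lines: Iterable[str]) -> Iterator[Site]:
        for line in lines:
            protein, position, residue = line.rstrip("\n").split("\t")
            yield Site(protein, int(position), residue)


importer = TabSeparatedImporter({"NP_1": "MSKT"})
print(importer.load_sites(["NP_1\t2\tS", "NP_1\t3\tS", "NP_2\t1\tM"]))
# -> [Site(protein='NP_1', position=2, residue='S')]
```

Keeping validation in the base class means a new database only requires a parse method, and every importer drops inconsistent records the same way.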
To import sites from a given database, use the following command:
./manage.py load protein_related -i ImporterName
Names of the provided importers are: HPRDImporter, OthersUniprotImporter, GlycosylationUniprotImporter, PhosphoSitePlusImporter, PhosphoELMImporter.
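One way the `-i` option could restrict input to the exposed importer names is sketched below with the standard argparse module. This is a hypothetical parser written only to mirror the command shown above. It is not the actual manage.py implementation, and the dest name "importer" is an assumption.

```python
import argparse

# Importer names listed in this PR; PTMVarImporter is intentionally absent
# because it is deprecated and not exposed.
SITE_IMPORTERS = [
    "HPRDImporter",
    "OthersUniprotImporter",
    "GlycosylationUniprotImporter",
    "PhosphoSitePlusImporter",
    "PhosphoELMImporter",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `load protein_related -i ImporterName`."""
    parser = argparse.ArgumentParser(prog="manage.py load")
    parser.add_argument("category")  # e.g. protein_related
    parser.add_argument("-i", dest="importer", choices=SITE_IMPORTERS)
    return parser


args = build_parser().parse_args(["protein_related", "-i", "PhosphoELMImporter"])
print(args.category, args.importer)  # -> protein_related PhosphoELMImporter
```

With `choices`, passing an unlisted name such as PTMVarImporter makes argparse print an "invalid choice" error and exit, which matches the intent of keeping the deprecated importer unexposed.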
As UniProt divides PTM sites into four categories:
lipids (lipidations)
glycans (glycosylation, glycation)
cross-links (ubiquitination, sumoylation, etc.)
all others (phosphorylation, methylation, acetylation, etc.)
there are two importers for sites retrieved from UniProt, covering the second and the last category (GlycosylationUniprotImporter and OthersUniprotImporter, respectively).
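The split can be illustrated with the feature keys of the UniProt flat-file format: LIPID, CARBOHYD, CROSSLNK and MOD_RES correspond to the four categories listed. This is only a sketch. It assumes the importers route features by these keys, which this description does not state, and the helper importer_for_feature is hypothetical.

```python
from typing import Optional

# UniProt flat-file feature keys and the PTM category each one represents.
UNIPROT_PTM_CATEGORIES = {
    "LIPID": "lipids",
    "CARBOHYD": "glycans",      # glycosylation and glycation
    "CROSSLNK": "cross-links",  # ubiquitination, sumoylation, ...
    "MOD_RES": "all others",    # phosphorylation, methylation, acetylation, ...
}

# Only the second and the last category have a dedicated importer.
IMPORTER_FOR_CATEGORY = {
    "glycans": "GlycosylationUniprotImporter",
    "all others": "OthersUniprotImporter",
}


def importer_for_feature(feature_key: str) -> Optional[str]:
    """Return the importer name for a UniProt feature key, or None if the
    feature is not a PTM or its category has no importer (hypothetical helper)."""
    category = UNIPROT_PTM_CATEGORIES.get(feature_key)
    if category is None:
        return None
    return IMPORTER_FOR_CATEGORY.get(category)


for key in ["CARBOHYD", "MOD_RES", "LIPID", "CROSSLNK"]:
    print(key, importer_for_feature(key))
```

Lipidation and cross-link features fall through to None, i.e. they are not imported by either UniProt importer.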
For backward compatibility, it is still possible to import sites from a file generated with the reimandlab/PTMvar scripts using an additional importer, PTMVarImporter; it is not exposed, though, as it should be considered deprecated.
There are also several major maintenance-related changes in this pull request, including:
refactoring of two modules which had grown big over time, database.py and stats.py, into Python packages (which considerably reduced their complexity)
creation of a base class Importer which is now shared by all import scripts
creation of a new "analyses" package with enrichment and ActiveDriver analyses (partly executed in R through the rpy2 bridge)
The last group of commits in this PR adds generation of Venn diagrams, which enable a quick assessment of the "added value" of the various databases (how many entities are unique to, or shared among, the databases).
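The numbers behind such a diagram can be computed as in the following sketch. It assumes that each entity (here, a site identified by protein and position) can be compared across databases by simple equality. The function name venn_regions and the toy data are made up for illustration. The resulting counts would then be handed to whatever plotting library draws the diagram.

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, Set


def venn_regions(databases: Dict[str, Set[Hashable]]) -> "Counter[FrozenSet[str]]":
    """Count entities per Venn region: the key is the exact set of databases
    that report an entity, so every entity is counted in exactly one region."""
    membership: Dict[Hashable, Set[str]] = {}
    for name, entities in databases.items():
        for entity in entities:
            membership.setdefault(entity, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Made-up toy data: sites identified as (protein, position) pairs.
sites = {
    "PhosphoSitePlus": {("P1", 5), ("P1", 9), ("P2", 3)},
    "HPRD": {("P1", 5), ("P2", 7)},
    "Phospho.ELM": {("P1", 5), ("P1", 9)},
}
regions = venn_regions(sites)
unique = {name: regions[frozenset({name})] for name in sites}
print(unique)  # -> {'PhosphoSitePlus': 1, 'HPRD': 1, 'Phospho.ELM': 0}
```

The single-database regions express each database's "added value" directly, while the region keyed by all databases counts the entities they share.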