jorvis / biocode

Bioinformatics code libraries and scripts
MIT License

Add motif predictions to parse_ergatis_euk_functional_pipeline.py #11

Open mchibucos opened 10 years ago

mchibucos commented 10 years ago

The euk functional annotation script (sandbox/jorvis/parse_ergatis_euk_functional_pipeline.py) could be augmented with some additional evidence. I propose adding predictions from the following tools: SignalP, SecretomeP, TMHMM, and TargetP. (More information can be found at http://www.cbs.dtu.dk/services/, which lists additional prediction tools as well.)

As for the annotation name in column 9 of the GFF3 file, I propose adding information to names that would otherwise be "Hypothetical protein" for lack of significant matches to other evidence (e.g. no named BLAST hits from UniProt and no HMM results). For example, if a protein is putatively secreted but otherwise has no annotation, we might call it "Hypothetical secreted protein", and if a protein localizes to the membrane, it could be called "Hypothetical transmembrane protein".
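
To make the naming proposal concrete, here is a minimal sketch of how it could look. The function name and its boolean flags are hypothetical and are not part of the existing script; the flags stand in for whatever the SignalP/SecretomeP and TMHMM parsers would report. Letting the transmembrane call win when both flags are set is an assumption on my part, not something decided above.

```python
def augment_hypothetical_name(name, is_secreted=False, is_transmembrane=False):
    """Return a more informative name for otherwise unannotated proteins.

    Hypothetical helper: only names equal to "Hypothetical protein"
    (case-insensitive) are changed; every other name is returned untouched.
    """
    if name.strip().lower() != "hypothetical protein":
        return name

    # Assumption: a transmembrane prediction takes precedence over secretion.
    if is_transmembrane:
        return "Hypothetical transmembrane protein"
    if is_secreted:
        return "Hypothetical secreted protein"
    return name


print(augment_hypothetical_name("Hypothetical protein", is_secreted=True))
print(augment_hypothetical_name("Alcohol dehydrogenase", is_secreted=True))
```

Names that already carry evidence from BLAST or HMM hits pass through unchanged, so the motif predictions only fill in where nothing better exists.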

For database submissions, this might not be useful (as GenBank would reject annotations following such nomenclature), but we could rename those entries prior to submission to GenBank. (For example, all proteins called "Hypothetical" followed by any other text would be renamed "Hypothetical protein".)
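
The pre-submission cleanup could be as small as the sketch below. The function name and the regular expression are illustrative assumptions, not existing code in biocode, and the sketch assumes product names are single-line strings.

```python
import re

# Matches any name that starts with the word "Hypothetical" (case-insensitive).
_HYPOTHETICAL_RE = re.compile(r"^hypothetical\b", re.IGNORECASE)


def normalize_for_genbank(name):
    """Collapse any 'Hypothetical ...' name back to 'Hypothetical protein'.

    Hypothetical helper: names that do not start with "Hypothetical"
    are returned unchanged.
    """
    if _HYPOTHETICAL_RE.match(name.strip()):
        return "Hypothetical protein"
    return name


print(normalize_for_genbank("Hypothetical secreted protein"))
print(normalize_for_genbank("Alcohol dehydrogenase"))
```

Running this as a separate step at export time would keep the richer names in the working GFF3 while still producing GenBank-compatible output.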