Open kifeonu opened 7 years ago
I ran the functional annotation pipeline on protein sequences and then ran Attributor to incorporate the annotations into the headers of the pep FASTA file. I noticed issues 4, 5 and 7 (from Kemi's previous post) in my final output.
I had to edit the product names manually to correct issue 7.
E.g.:
filamentous hemagglutinin family N-terminal domain domain protein
Methyltransferase FkbM domain family protein
/local/projects/aengine/organisms/Herve_pep_reannotation/pep_reannotation.faa
Looking at this while I have some free time on vacation. Can you send the path to Herve's full polypeptide fasta file?
@kabolude, thanks for the link to the curation script. Do you still have your source polypeptide fasta file I could test?
Hi Josh,
Here is the path to Herve’s polypeptide file: /local/projects/aengine/organisms/Herve_pep_reannotation/ORFs_inTables_toReBlast.pep
Just letting you know: these protein sequences are from GenBank, so the header of each polypeptide already carries its NCBI annotation. Herve wanted a reannotation.
Thanks, Suvvi
As an update, most of the rules from curate_common_names.pl have now been integrated as a method in the biocode.annotation module.
https://github.com/jorvis/biocode/commit/c96bd818eb678735a0030705d2d6f8e5c67f3c87
1: All annotations that match rapsearch2uniref100trusted_full_partial are being annotated as 'hypothetical protein domain protein', with the gene symbol set to 'None', where rapsearch2uniref100trusted_full_partial is defined as:
2: class:trusted may not be pulling actual ‘trusted’ matches.
Example: Set product name to 'hypothetical protein domain protein' from rapsearch2uniref100trusted_full_partial hit to UniRef100_UPI00037D6DCF hypothetical protein n=1 Tax=Brevibacillus laterosporus RepID=UPI00037D6DCF
“hypothetical protein” shouldn’t be “trusted”.
3: Set default GO annotation to GO:0008150,GO:0003674,GO:0005575
4: It doesn’t seem like gene symbols are added
5: Lowercase the beginning of all names (except abbreviations)
6: No matches using ‘rapsearch2uniref100trusted_full_full’ and ‘rapsearch2uniref100trusted_full_partial’. Could it be that percent_identity_cutoff: 40% is limiting all hits?
7: Post-assignment name processing
…family protein family protein => …family protein
…family transporter protein family protein => …transporter family protein
…family family protein => …family protein
…domain family protein => …domain protein
…domain domain protein => …domain protein
…Protein family protein => …family protein
…protein domain protein => …domain protein
Domain of Unknown Function… => “conserved hypothetical protein”
possibly incorporate rules in /usr/local/projects/ergatis/package-latest/bin/curate_common_names.pl
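For anyone who wants to try the item 7 rewrites (plus the lowercasing from item 5) on their own output, here is a rough Python sketch. It is not the code from curate_common_names.pl or from biocode.annotation. The function names `clean_product_name` and `lowercase_first_word` are made up for illustration, and the "abbreviation" test (any uppercase letter or digit after the first character of the first word) is only an assumed heuristic.

```python
import re

# Suffix rewrites from item 7, most specific first. Each pattern is anchored
# at the end of the product name and matched case-insensitively.
SUFFIX_RULES = [
    (r"\bfamily transporter protein family protein$", "transporter family protein"),
    (r"\bfamily protein family protein$", "family protein"),
    (r"\bfamily family protein$", "family protein"),
    (r"\bdomain family protein$", "domain protein"),
    (r"\bdomain domain protein$", "domain protein"),
    (r"\bprotein family protein$", "family protein"),
    (r"\bprotein domain protein$", "domain protein"),
]
COMPILED_RULES = [(re.compile(p, re.IGNORECASE), r) for p, r in SUFFIX_RULES]
DUF_PATTERN = re.compile(r"^domain of unknown function", re.IGNORECASE)


def lowercase_first_word(name):
    """Item 5: lowercase the first letter unless the first word looks like
    an abbreviation (assumed heuristic: uppercase or digit after position 0)."""
    if not name:
        return name
    first_word = name.split(None, 1)[0]
    if any(ch.isupper() or ch.isdigit() for ch in first_word[1:]):
        return name
    return name[0].lower() + name[1:]


def clean_product_name(name):
    """Apply the item 7 rewrites until nothing changes, then item 5."""
    name = name.strip()
    if DUF_PATTERN.match(name):
        return "conserved hypothetical protein"
    changed = True
    while changed:  # every rewrite shortens the name, so this terminates
        changed = False
        for pattern, replacement in COMPILED_RULES:
            new_name, count = pattern.subn(replacement, name)
            if count:
                name, changed = new_name, True
    return lowercase_first_word(name)


if __name__ == "__main__":
    for raw in [
        "filamentous hemagglutinin family N-terminal domain domain protein",
        "Methyltransferase FkbM domain family protein",
    ]:
        print(raw, "=>", clean_product_name(raw))
```

Run on the two names reported earlier in this thread, it gives "filamentous hemagglutinin family N-terminal domain protein" and "methyltransferase FkbM domain protein". The rules are applied in a loop because one rewrite can expose another (e.g. a name ending in "family protein family protein").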