laurahspencer opened 1 year ago
Where can I find the transcriptome?
@ggoetznoaa can you get Sam the transcriptome so he can run it through his annotation pipeline?
@laurahspencer I thought it was in the GitHub repository? I see it in results/salmon/transcriptome. It's the fasta.gz file.
I don't see it in there (see screenshot). Perhaps you see it on your computer locally but couldn't upload it to GitHub because the file is too large (>100 MB)?
OK, I'm not sure exactly why it wasn't there; the file was only 35 MB. I did end up having to use Sedna to add the file, since my laptop's Git install is broken somehow (the laptop was just updated). I also got a warning saying the file was ignored because of .gitignore. Anyway, the file should be in that folder now.
@ggoetznoaa - @sr320 has run into the issue of a Mac update breaking Git (and has had to deal with it again since the most recent update). This might save you some pain:
https://github.com/RobertsLab/resources/issues/360#issuecomment-417395799
TransDecoder/BLASTx/Trinotate annotation complete:
Notebook:
Jump to the RESULTS to see the output files.
There's a dedicated GO annotations file and a full annotation file, which contains the results of all the tools used for annotation (e.g. BLASTp, RNAmmer, Pfam, BLASTx, hmmscan, etc.).
And, if you're savvy, there's also an SQLite database that has all the results.
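For anyone who wants to poke at that SQLite database, here is a minimal Python sketch that lists its tables using only the standard library's sqlite3 module. The file name Trinotate.sqlite is an assumption (use whatever file the results folder actually contains), and the sketch makes no claims about Trinotate's table layout; it only reads SQLite's own sqlite_master catalog.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the sorted names of all tables in an existing SQLite file."""
    # Open read-only via a URI so a mistyped path raises an error
    # instead of silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Usage would be something like `print(list_tables("Trinotate.sqlite"))` (hypothetical file name); from there, ordinary SQL queries can be run against whichever tables turn up.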
@kubu4 I'm now using the annotation report you generated for the Dungeness crab transcriptome to perform functional/enrichment analyses of differentially expressed genes in DAVID, using UniProt accessions. I want to use the most comprehensive set of UniProt accessions possible, so I want to make sure my approach makes sense.
I see that you ran both blastx and blastp. Genes with blastx hits have UniProt accessions in the annotation report, which I can use directly. Other genes without blastx hits do have blastp hits; the annotation report doesn't have UniProt accessions for these, but it does have UniProt entry names, which I can upload to UniProt (uniprot.org) to pull accession numbers and then add them to the annotation report (in R). Does this make sense? Do you have an easier way to get UniProt accessions for as many genes as possible?
Thanks!
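A minimal Python sketch of the proposed fallback (use the blastx accession when there is one, otherwise look up the blastp entry name) might look like the following. It is written in Python rather than R, and several details are assumptions rather than facts from this thread: the column names sprot_Top_BLASTX_hit and sprot_Top_BLASTP_hit, hit fields delimited by '^' with '`' between multiple hits and '.' for no hit, and a two-column (entry name, accession) TSV downloaded from UniProt. load_name_map and best_accession are hypothetical helper names.

```python
from typing import Dict, List, Optional

# Assumed column names, modeled on a Trinotate-style report; check them
# against the header row of the actual annotation report.
BLASTX_COL = "sprot_Top_BLASTX_hit"
BLASTP_COL = "sprot_Top_BLASTP_hit"


def top_hit_id(field: str) -> Optional[str]:
    """Return the first '^'-delimited token of the first hit, or None if empty.

    Assumes multiple hits are separated by '`' and empty fields are '.'.
    """
    if not field or field == ".":
        return None
    return field.split("`")[0].split("^")[0]


def load_name_map(tsv_lines: List[str]) -> Dict[str, str]:
    """Parse a two-column TSV (entry name, accession), skipping one header row."""
    mapping: Dict[str, str] = {}
    for line in tsv_lines[1:]:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[0] and parts[1]:
            mapping[parts[0]] = parts[1]
    return mapping


def best_accession(row: Dict[str, str], name_to_acc: Dict[str, str]) -> Optional[str]:
    """Prefer the blastx hit (assumed to already be an accession);
    otherwise translate the blastp entry name via name_to_acc."""
    blastx_id = top_hit_id(row.get(BLASTX_COL, "."))
    if blastx_id is not None:
        return blastx_id
    blastp_name = top_hit_id(row.get(BLASTP_COL, "."))
    if blastp_name is not None:
        return name_to_acc.get(blastp_name)
    return None
```

Genes whose blastp entry name is missing from the downloaded mapping come back as None, which makes them easy to count before running the enrichment.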
I'll look into this.
~The shortcoming lies in the default blastp output format 6.~
~For some reason, the default set of columns differs in blastp output from other common BLAST default outputs (e.g. blastn and blastx both include subject IDs in their default outputs for format 6).~
I'll dive a bit into the Trinotate documentation (this is the software that creates that annotation report) and see if I can figure out whether a "customized" blastp output can be incorporated into the final annotation table (I suspect the answer is "yes").
If that's the case, then I'll just re-run blastp and incorporate the customized output format into a new version of that annotation table.
EDITED: Added strikethrough to incorrect info.
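If blastp were re-run with a customized tabular output, the command could be assembled as below. This Python sketch only builds the argument list; -query, -db, -outfmt, -evalue, -max_target_seqs, -num_threads, and -out are standard BLAST+ options, while the file names, the e-value cutoff, and the extra column (stitle, the subject title) are illustrative assumptions. Whether Trinotate will load a table with extra columns is not checked here.

```python
from typing import List, Sequence

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
DEFAULT_OUTFMT6 = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastp_command(query: str, db: str, out: str,
                   extra_columns: Sequence[str] = (),
                   threads: int = 1, evalue: float = 1e-3) -> List[str]:
    """Build a blastp argv with the default format-6 columns plus any extras."""
    fmt = " ".join(["6", *DEFAULT_OUTFMT6, *extra_columns])
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-outfmt", fmt,          # passed as a single argument, no shell quoting needed
        "-evalue", str(evalue),
        "-max_target_seqs", "1",
        "-num_threads", str(threads),
        "-out", out,
    ]
```

On a machine with BLAST+ installed, the list can be passed straight to `subprocess.run(cmd, check=True)`; building it as a list avoids shell-quoting the multi-word -outfmt value.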
Cool, thanks for looking into it!
Sorry to take so long on this.
Anyway, I've figured out the issue and, possibly, how to solve it.
The issue is caused by the BLASTp database that Trinotate uses. The FastA headers in the source file for that peptide BLAST database do not contain SwissProt IDs. Thus, SwissProt IDs (SPIDs) can't be used when generating the final annotation report file, since they aren't present.
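One way to confirm this diagnosis is to count how many headers in a protein FastA carry a UniProt-style accession. The Python sketch below assumes the standard UniProtKB header layout (>sp|ACCESSION|ENTRY_NAME description); the exact header format of the database Trinotate was using isn't shown in this thread, and parse_uniprot_header and count_headers are hypothetical helpers.

```python
import re
from typing import Iterable, Optional, Tuple

# UniProtKB FASTA headers look like: >sp|ACCESSION|ENTRY_NAME description...
# ("sp" = Swiss-Prot, "tr" = TrEMBL; an optional "-N" suffix marks an isoform).
UNIPROT_HEADER = re.compile(r"^>(sp|tr)\|([A-Z0-9]+(?:-\d+)?)\|(\S+)")


def parse_uniprot_header(line: str) -> Optional[Tuple[str, str]]:
    """Return (accession, entry name), or None if the header lacks them."""
    m = UNIPROT_HEADER.match(line)
    if m is None:
        return None
    return m.group(2), m.group(3)


def count_headers(fasta_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (headers with a UniProt accession, total headers)."""
    with_acc = total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            total += 1
            if parse_uniprot_header(line) is not None:
                with_acc += 1
    return with_acc, total
```

A result like (0, N) on the database's source FastA would be consistent with headers that lack SwissProt IDs.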
I believe the solution will be to create the BLASTp database myself, using the full Uni/SwissProt protein FastA. I've done a quick BLASTp test against this "custom" BLASTp database, and the results contain the expected SwissProt IDs in column 2 of the output file.
I'll try to tackle this soon and report back.
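Once the custom database returns SwissProt IDs in column 2 (sseqid) of the tabular output, pulling accessions out is straightforward. This Python sketch assumes column 2 looks like sp|ACCESSION|ENTRY_NAME and that BLAST lists each query's hits in ranked order, so only the first row per query is kept; accessions_from_outfmt6 is a hypothetical helper name.

```python
import csv
from typing import Dict, Iterable


def accessions_from_outfmt6(lines: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to the accession in column 2 (sseqid) of its first hit.

    Assumes sseqid looks like 'sp|ACCESSION|ENTRY_NAME'; IDs in any other
    form are kept whole.
    """
    result: Dict[str, str] = {}
    # QUOTE_NONE: treat quote characters in descriptions as ordinary text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        if query in result:
            continue  # keep only the first (assumed best-ranked) hit per query
        parts = subject.split("|")
        result[query] = parts[1] if len(parts) >= 3 else subject
    return result
```

The resulting query-to-accession map could then be joined onto the annotation report by transcript or protein ID.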
Need Uniprot IDs in output for GO enrichment analysis