Closed by ignadb 6 years ago.
Hi @ignadb, that "error" isn't something to worry about. It is telling you that at least one exonerate alignment failed; this is usually caused by an incompatibility between exonerate and one of the UniProtKB/SwissProt proteins. Everything else looks okay to me. You might improve the annotation by passing some --transcript_evidence. If you don't have RNA-seq, these can also be ESTs from closely related organisms (for fungal species I typically download the clustered EST sequences from JGI MycoCosm for the entire family, cluster them, and then use those as transcript evidence).
Thanks so much for your quick reply and suggestion! :) At the moment, I am using RNA-seq data from one of the species I am working with to build gene models for Augustus, and I use those gene models for the other species (without the --transcript_evidence flag). The good thing is that all the species I am working with are in the same genus. What do you think about assembling the RNA-seq data de novo and using it with --transcript_evidence for the other species?
I would not use ab initio gene models (e.g. from Augustus or even a funannotate run) as evidence for another species, because they are predictions and you might end up over-fitting the training parameters. You should get better results by training Augustus for each species based on alignments of real evidence (i.e. well-curated proteins and/or transcripts). Even closely related species can have different Augustus parameters. If you don't have RNA-seq for a species, funannotate defaults to using BUSCO2 predictions plus any mapped evidence to train Augustus, and this typically works well. You can align the Trinity transcripts from one species to another: the threshold for mapping evidence is 80% identity, so some transcripts may not map, but the conserved ones likely should (and those are the ones you want for training anyway). You can also pass multiple transcript files at runtime by separating them with a space, e.g. --transcript_evidence trinity.fasta myESTs.fa
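For concreteness, here is a minimal sketch of a full funannotate predict call that passes both transcript files to a single --transcript_evidence argument. trinity.fasta and myESTs.fa come from the reply above; genome.fasta, predict_out and "Genus species" are hypothetical placeholders. The function only prints the command (a dry run), so compare it with the help output of your installed funannotate version before running it, since options can change between releases.

```shell
#!/bin/sh
# Dry-run sketch: print a funannotate predict command that passes two
# transcript evidence files, separated by a space, to --transcript_evidence.
# genome.fasta, predict_out and "Genus species" are hypothetical placeholders.
predict_cmd() {
    echo funannotate predict \
        -i genome.fasta \
        -o predict_out \
        -s \"Genus species\" \
        --transcript_evidence trinity.fasta myESTs.fa
}

# Print the command; paste the printed line into a shell to actually run it.
predict_cmd
```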
Sorry for asking so many questions; I just want to make sure I understand. The Trinity transcripts you mentioned are the de novo-assembled transcripts I have for one of the species I am working with, correct? If so, in my case it is better to pass the Trinity transcripts to --transcript_evidence and leave --augustus_species unset. Please correct me if I am wrong.
Yes. When --augustus_species is not defined, the name is generated by combining the --species parameter with --isolate or --strain. For example:
-s "Aspergillus fumigatus" --isolate AB1234
would result in the script training Augustus and using aspergillus_fumigatus_AB1234 as the new species name in the Augustus config folder.
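The naming pattern in that example can be sketched as follows. This is a hypothetical helper, not funannotate's own code: it only reproduces what the example shows (species lowercased, spaces turned into underscores, isolate appended unchanged), and it assumes an isolate or strain value is always given.

```shell
#!/bin/sh
# Hypothetical helper (not part of funannotate): reproduces the naming
# pattern from the example, i.e. species lowercased, spaces replaced by
# underscores, and the isolate/strain appended unchanged.
augustus_name() {
    species=$1
    isolate=$2
    # Lowercase the species name, then turn spaces into underscores.
    base=$(printf '%s' "$species" | tr '[:upper:]' '[:lower:]' | tr ' ' '_')
    printf '%s_%s\n' "$base" "$isolate"
}

augustus_name "Aspergillus fumigatus" AB1234   # prints aspergillus_fumigatus_AB1234
```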
This will then run BUSCO2-mediated training, supplemented with the protein/transcript alignments. You can also use your RNA-seq-trained species as a "seed species" for running BUSCO by passing its Augustus training species name to --busco_seed_species. BUSCO2 will then start from the --busco_seed_species parameters and update them for the new species it is training.
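Putting these pieces together, a minimal dry-run sketch might look like the following: --augustus_species is left out so a new species name is generated from -s and --isolate, and --busco_seed_species points at the RNA-seq-trained species. my_rnaseq_species, genome2.fasta and predict_out2 are hypothetical placeholders; the function only prints the command.

```shell
#!/bin/sh
# Dry-run sketch: train a new Augustus species (name generated from -s and
# --isolate because --augustus_species is omitted), seeding BUSCO2 with an
# existing RNA-seq-trained species.
# my_rnaseq_species, genome2.fasta and predict_out2 are hypothetical placeholders.
seeded_predict_cmd() {
    echo funannotate predict \
        -i genome2.fasta \
        -o predict_out2 \
        -s \"Aspergillus fumigatus\" \
        --isolate AB1234 \
        --transcript_evidence trinity.fasta \
        --busco_seed_species my_rnaseq_species
}

# Print the command; paste the printed line into a shell to actually run it.
seeded_predict_cmd
```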
Ah, okay. Thank you very much Jon!
Note that you can also use the RNA-seq modules in funannotate (funannotate train). You should probably upgrade to the newest version if you are able to, as it has bug fixes and some improved functionality.
I will talk with our IT support next week and ask them to upgrade to the newest version. :) Thanks so much again Jon! 👍
v1.4.1 released
Hi Jon,
Thanks for funannotate and for your continuing support!
I was running version 1.1.1 to annotate a fungal genome. It went well with beautiful results, but I got one warning on which I would like your opinion. I listed the command I used and the reports from funannotate below. At [08:05:27 AM], it reported that exonerate had finished and then immediately warned that failed exonerate alignments were found. Do you think the generated results are still okay? Or is there anything I should be aware of?
Thank you very much; I look forward to hearing from you.