Niohuruzh opened this issue 2 years ago
It seems that either you aren't specifying the path to that file correctly (the error says the file doesn't exist at the path you passed on the command line), or the file is not in FASTA format and cannot be opened by Biopython.
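The two possibilities above (a wrong path versus a file that isn't FASTA) can be told apart before rerunning funannotate. The following is a minimal stdlib-only sketch, not the check that funannotate or Biopython actually performs; the helper name `check_protein_fasta` is hypothetical. It reports a missing file first, then a few basic FASTA problems: text before the first `>` header, empty headers, and headers with no sequence.

```python
import os
import tempfile


def check_protein_fasta(path):
    """Return a list of problems; an empty list means the basic checks passed.

    Hypothetical pre-flight helper: a rough check, not Biopython's parser.
    """
    if not os.path.isfile(path):
        # Typical cause: a mistyped or wrongly relative path on the command line
        return [f"file not found: {os.path.abspath(path)}"]

    problems = []
    header = None
    seq_lines = 0
    with open(path) as handle:
        for lineno, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                # A new header closes the previous record; it must have had sequence
                if header is not None and seq_lines == 0:
                    problems.append(f"record '{header}' has no sequence")
                header = line[1:]
                seq_lines = 0
                if not header.strip():
                    problems.append(f"line {lineno}: empty header")
            elif header is None:
                # Text before the first '>' means the file is not FASTA
                problems.append(f"line {lineno}: sequence data before first header")
                break
            else:
                seq_lines += 1

    if header is None and not problems:
        problems.append("no FASTA headers found")
    elif header is not None and seq_lines == 0:
        problems.append(f"record '{header}' has no sequence")
    return problems


if __name__ == "__main__":
    # Demo on a small temporary file (contents are made up for illustration)
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        tmp.write(">prot1 example\nMKTAYIAKQR\n")
    print(check_protein_fasta(tmp.name))  # []
    print(check_protein_fasta("does_not_exist.fasta"))  # reports the missing file
    os.remove(tmp.name)
```

If the first check fails, fix the path (an absolute path removes any ambiguity); if only the format checks fail, the file itself needs cleaning.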
Adding protein evidence is unlikely to change a single gene prediction.
Hi Jon Palmer,
Finally, I solved the problem. An invalid header in the .fasta file was causing `ValueError: invalid literal for int() with base 10`.
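The thread doesn't say which header was invalid or which field ended up in `int()`, so this is only a sketch under an assumption: that reducing every header to a short, unique ID avoids the problem. The helper `sanitize_fasta_headers` and the `prot_<n>` fallback are hypothetical. It keeps the first whitespace-delimited token of each header, replaces characters outside `[A-Za-z0-9_.-]` with `_`, and appends a numeric suffix to repeated IDs.

```python
import re
import sys

# Anything outside this conservative character set is replaced with '_'
SAFE_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def sanitize_fasta_headers(lines):
    """Yield FASTA lines with each header reduced to a simple, unique ID.

    Hypothetical helper: keeps the first whitespace-delimited token of the
    header, replaces unsafe characters, and falls back to prot_<n> (n = the
    record's 1-based position) when the header is empty.
    """
    seen = set()
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            tokens = line[1:].split()
            new_id = SAFE_CHARS.sub("_", tokens[0]) if tokens else ""
            if not new_id:
                new_id = f"prot_{count}"
            # Make repeated IDs unique: ID, ID_2, ID_3, ...
            base, suffix = new_id, 1
            while new_id in seen:
                suffix += 1
                new_id = f"{base}_{suffix}"
            seen.add(new_id)
            yield f">{new_id}\n"
        else:
            yield line  # sequence lines pass through unchanged


if __name__ == "__main__":
    # usage: python sanitize.py input.fasta cleaned.fasta  (placeholder names)
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
            dst.writelines(sanitize_fasta_headers(src))
```

Keeping only the first token discards the description text, so check that nothing downstream relies on it before using the cleaned file.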
BTW, when I used only two protein .fasta files from NCBI to predict genes, the target gene I mentioned before was correctly predicted. So is running with a species-specific protein library, instead of only uniprot_sprot.fasta, a better way to predict?
Hi @Niohuruzh. As with most everything, it depends on what you are trying to do. If you are trying to lift over genes from, say, a public annotation of the same species onto a new isolate, then don't use funannotate; use something like Liftoff.
For de novo annotation with funannotate, you generally should not use protein models from existing annotations unless there is experimental evidence for those gene models (this is why the default is to use UniProt/Swiss-Prot). Those predictions were most likely made with similar gene prediction algorithms/software, so they are themselves predictions, not actual evidence. You should not reinforce the selection of gene models based on other ab initio predictions: just because a computer predicted a gene model 10 years ago in your organism of interest doesn't mean it is actually correct unless it has been experimentally validated.
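One way to apply the "experimental evidence" criterion when assembling a protein evidence set is UniProt's protein existence (PE) level, which UniProt writes into its FASTA headers as `PE=1` (evidence at protein level) through `PE=5` (uncertain). The sketch below keeps only records at or below a cutoff. The default of 2 (protein or transcript evidence), the helper name `filter_by_protein_existence`, and the idea of filtering this way at all are assumptions for illustration, not a funannotate feature. PE describes evidence that a protein exists, not that a particular gene structure is correct, so treat it as a rough proxy.

```python
import re
import sys

# UniProt FASTA headers carry e.g. "... GN=ABC PE=1 SV=2"
PE_FIELD = re.compile(r"\bPE=(\d)\b")


def filter_by_protein_existence(lines, max_level=2):
    """Yield only FASTA records whose header has PE=<n> with n <= max_level.

    Hypothetical helper. Records without a PE field (e.g. non-UniProt
    headers) are dropped, since their evidence level is unknown.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            match = PE_FIELD.search(line)
            keep = match is not None and int(match.group(1)) <= max_level
        if keep:
            yield line  # header and its sequence lines share the decision


if __name__ == "__main__":
    # usage: python filter_pe.py uniprot_sprot.fasta filtered.fasta  (placeholders)
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
            dst.writelines(filter_by_protein_existence(src))
```

Raising `max_level` to 3 would also admit proteins inferred from homology, which moves back toward the reinforcement problem described above.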
Hi Jon Palmer,
Thanks for your help. Now I totally understand what funannotate is good at, and I'll try Liftoff to annotate the gene.
Have a good day! Best regards!
Hi Jon Palmer,
When I run `funannotate predict` with `--protein_evidence A.protein.fasta`, I get an error saying that A.protein.fasta is not a valid, existing file. A.protein.fasta was downloaded from NCBI. So how should I prepare a correct closely_related.fasta, as in the example you described here: https://funannotate.readthedocs.io/en/lastest/evidence.html
BTW, the reason I want to use `--protein_evidence` is that I found one gene that was not predicted, even though it exists in the .gff on NCBI. Therefore, I suspect that uniprot_sprot.fasta alone is not enough.
Looking forward to your reply. Best regards!
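If several protein sets end up in one evidence file (for example a curated set plus proteins from a close relative), it helps to make sure sequence IDs stay unique across the combined file. This is a minimal sketch under that assumption; the helper `merge_protein_fastas` and the `<tag>_<id>` naming scheme are hypothetical. Whether `--protein_evidence` also accepts several files directly is worth confirming in the funannotate documentation before merging by hand, and the caveat in this thread about using predicted proteins as evidence still applies.

```python
import os
import re
import sys
from contextlib import ExitStack

# Characters outside this set are replaced with '_' in the new IDs
SAFE_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def merge_protein_fastas(sources):
    """Yield merged FASTA lines with IDs of the form <tag>_<first token>.

    `sources` is a list of (tag, iterable_of_lines) pairs. Blank lines are
    dropped and every sequence line is newline-terminated. Raises ValueError
    if two records still end up with the same ID.
    """
    seen = set()
    count = 0
    for tag, lines in sources:
        for line in lines:
            if line.startswith(">"):
                count += 1
                tokens = line[1:].split()
                first = tokens[0] if tokens else f"rec{count}"
                new_id = SAFE_CHARS.sub("_", f"{tag}_{first}")
                if new_id in seen:
                    raise ValueError(f"duplicate ID after merging: {new_id}")
                seen.add(new_id)
                yield f">{new_id}\n"
            elif line.strip():
                yield line if line.endswith("\n") else line + "\n"


if __name__ == "__main__":
    # usage: python merge.py out.fasta in1.fasta in2.fasta ...  (placeholders)
    if len(sys.argv) >= 3:
        with ExitStack() as stack, open(sys.argv[1], "w") as out:
            # Each input's file name without extension becomes its tag
            sources = [
                (os.path.splitext(os.path.basename(p))[0], stack.enter_context(open(p)))
                for p in sys.argv[2:]
            ]
            out.writelines(merge_protein_fastas(sources))
```

For example, `python merge.py evidence.fasta uniprot_sprot.fasta A.protein.fasta` would write records tagged `uniprot_sprot_...` and `A.protein_...` into one file (all paths are placeholders).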