DrNavi closed this issue 1 year ago.
Did you provide the --name option to the predict or annotate step? It sets the locus tag prefix. Typically you register your project at NCBI/EMBL to get a BioProject and a locus tag prefix for your organism, and that is what you provide here.
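To make the role of the prefix concrete, here is a minimal Python sketch of how a locus tag prefix turns into sequential gene IDs like the FUN_0001 names mentioned in the question. The helper name locus_tags, the six-digit zero padding and the prefix MYORG are assumptions for illustration only; this is not funannotate's own code, and the padding it actually uses may differ.

```python
def locus_tags(prefix: str, n_genes: int, width: int = 6) -> list[str]:
    """Build sequential, zero-padded locus tags from a prefix (illustrative only)."""
    # Normalise the prefix so it always ends in exactly one underscore.
    stem = prefix.rstrip("_") + "_"
    # Number genes from 1, padding the counter to a fixed width (assumed: 6).
    return [f"{stem}{i:0{width}d}" for i in range(1, n_genes + 1)]


if __name__ == "__main__":
    # Default-style prefix versus a hypothetical NCBI-assigned one.
    print(locus_tags("FUN", 3))     # ['FUN_000001', 'FUN_000002', 'FUN_000003']
    print(locus_tags("MYORG_", 2))  # ['MYORG_000001', 'MYORG_000002']
```

Only the prefix changes between a default run and a run with a registered prefix; the numbering scheme stays the same.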
For functional assignment, did you run the annotate step? That step assigns predicted function based on inferred homology.
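Homology-based assignment starts from finding the best match for each predicted protein. A minimal Python sketch, assuming BLAST tabular output (-outfmt 6 with its twelve default columns, bitscore last) and an arbitrary 50% identity cutoff; the function best_hits and the cutoff are hypothetical, and funannotate's own search tools and thresholds may differ.

```python
import csv
import io


def best_hits(tsv_text: str, min_identity: float = 50.0) -> dict[str, tuple[str, float]]:
    """Keep the highest-bitscore hit per query from BLAST tabular (-outfmt 6) text."""
    best: dict[str, tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, pident = row[0], row[1], float(row[2])
        bitscore = float(row[11])  # 12th default column
        if pident < min_identity:
            continue  # too dissimilar to infer homology under this cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    # Return subject ID and percent identity for each query's best hit.
    return {q: (s, p) for q, (s, p, _) in best.items()}
```

Finding a hit is only part of the story: whether the hit's description is then used as the product name is a separate decision.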
I didn't provide the locus tag, and I haven't registered my project with NCBI yet.
I did the functional annotation and got some GO terms, which I will eventually be able to decode into gene/protein names. But 99% of my proteins are still named hypothetical protein. If I take a sequence and BLAST it against NCBI, I get a hit to Magnaporthe grisea with 99% similarity, so why are those names not being assigned at the annotation step?
This has been covered many times in the issues. Generally this is expected behavior: funannotate is geared toward genome submission at NCBI, and to get a valid product defline (i.e. not "hypothetical protein") a protein needs a valid common gene name. Assigning one reliably is difficult, so the default is to be conservative. If you run eggNOG-mapper you will get a few more descriptions. Here is one comment from one of these threads: https://github.com/nextgenusfs/funannotate/issues/445#issuecomment-652018889
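The conservative policy described here can be sketched in a few lines of Python. This is a simplified illustration, not funannotate's implementation: the set VALID_GENE_NAMES, the vague-word pattern and the function product_name are all hypothetical. It shows why a strong BLAST hit on its own can still end up as "hypothetical protein": without a gene name the pipeline accepts, the hit's description is not used.

```python
import re
from typing import Optional

# Hypothetical stand-in for a curated list of valid common gene names.
VALID_GENE_NAMES = {"ACT1", "TUB2", "HSP70"}

# Words that make a description unsuitable as a product name
# (an illustrative list, not NCBI's actual rules).
VAGUE = re.compile(r"\b(hypothetical|uncharacterized|unknown)\b", re.IGNORECASE)


def product_name(gene_name: Optional[str], description: Optional[str]) -> str:
    """Return a product line, defaulting conservatively to 'hypothetical protein'."""
    if gene_name is None or gene_name.upper() not in VALID_GENE_NAMES:
        # No accepted gene name: ignore the homology description entirely.
        return "hypothetical protein"
    if not description or VAGUE.search(description):
        # Name is valid but the description is empty or uninformative.
        return "hypothetical protein"
    return description
```

For example, product_name(None, "actin") still returns "hypothetical protein", while product_name("ACT1", "actin") returns "actin".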
I have used funannotate to annotate my genome. My genome is Magnaporthe oryzae; its annotation is not available yet, but the very close species Magnaporthe grisea is present in the Augustus species list. I was able to run it without errors, but the problem is that all predicted genes are named FUN_0001 etc. and all proteins are hypothetical proteins. Why is that? My genome assembly stats are fine: the genome is around 43 Mb and the assembly comprises 11 contigs.